Am I the only one who knows what "strongly-typed" even means anymore???



  • My compiler translates x == y to x.equals(y), which has the fun side effect of crashing if the left-hand side of the comparison is null.



  • @blakeyrat said:

    There are operations that are valid on character types. For example, you can attempt to capitalize one. You can compare one to another to find which should sort first.

    You can't add or subtract them, though, that doesn't make any fucking sense.

    Is alphabetic ordering not a thing in your world?

    Characters are members of an alphabet. You might not like the fact that there is generally no explicit way to define which alphabet, so you can't tell up front whether the alphabet is Unicode or ASCII or EBCDIC, but there it is.

    Membership in an alphabet implies a positioning relative to other members of the same alphabet. So in fact it's completely natural to be able to subtract one character from another. The result of that should not, however, be another character; it should be the alphabetic distance between them, as an integer.

    Likewise, it should be possible to add an integer to a character, and the result should be a character.

    I can think of no other useful arithmetic operations that ought to be applicable to characters though. If you were to claim that allowing two characters to be added is pants-on-head retarded, I wouldn't disagree.
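
    In Go-ish notation (used here purely as shorthand, not as a claim about any existing language), those rules come out as:

    package main

    import "fmt"

    func main() {
        // character - character = an alphabetic distance (an integer)
        d := int('M' - 'E')
        fmt.Println(d) // 8

        // character + distance = another character
        fmt.Println(string('E' + rune(d))) // "M"
    }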



  • @xaade said:

    Do you think non-strongly-typed languages are bad?

    Not really. Depends on what you're using them for. It's probably not a good idea to write something like, say, TurboTax's calculation engine in a weakly-typed language, but it's fine for TurboTax's web UI to be in one.

    @flabdablet said:

    Is alphabetic ordering not a thing in your world?

    Of course it is. What the holy fuck does that have to do with adding or subtracting characters?

    @flabdablet said:

    Characters are members of an alphabet.

    Ok... where are you going with this...

    @flabdablet said:

    Membership in an alphabet implies a positioning relative to other members of the same alphabet.

    Ok...

    @flabdablet said:

    So in fact it's completely natural to be able to subtract one character from another.

    Nope.

    Look. US English. You have, what, 30 or so characters with explicit order, and something like 40,000 that do not have any order in this language. Those characters still exist, but they can't be ordered in any meaningful fashion without going outside the rules of US English.

    You started out talking about alphabetization, basically a sorting problem. Then you went on about "calculating distance between characters". Does one even lead to the other? You can sort a heck of a lot of things without them having a set distance. For example, the length of various routes to a destination. The route to C is longer than the route to B which is longer than the route to A-- does that mean the route to C is the route to A multiplied by three? Of course not, that's stupid. Yet they can still be sorted.

    @flabdablet said:

    Likewise, it should be possible to add an integer to a character, and the result should be a character.

    No, it should not.

    Characters are not numbers. It makes no sense to add or subtract them.

    @flabdablet said:

    If you were to claim that allowing two characters to be added is pants-on-head retarded, I wouldn't disagree.

    THAT IS WHAT THE CODE SAMPLE IS DOING JESUS FUCK WHAT IS FUCKING WRONG WITH YOU DUMBSHIT MORONS DID YOU NOT EVEN ATTEMPT TO READ THE FUCKING FIRST POST?!

    Why do I post here? It's just dealing with this kind of garbage all the time. Goddamned.



  • Okay, so Nothing in Cool is like void in C. You can't look at it and you can't store it in a variable.

    Diverging is more like C's __attribute__((noreturn)).


    What if you renamed your character type to rune and declared that it represents a Unicode codepoint? You can do range comparisons on Unicode codepoints, and you can sort Unicode codepoints, and it follows by way of ease of implementation that you should be able to do some limited math on runes.
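
    For what it's worth, that's roughly the position Go already takes: rune is just an alias for int32 holding a Unicode codepoint, so range checks and small offsets come along for free. A minimal sketch:

    package main

    import (
        "fmt"
        "unicode"
    )

    func main() {
        r := 'g' // a rune literal, i.e. a Unicode codepoint

        // Range comparison on codepoints:
        fmt.Println(r >= 'a' && r <= 'z') // true

        // Limited math: rune ± integer yields another rune.
        fmt.Println(string(r + 3)) // "j"

        // Capitalisation is a library call rather than arithmetic.
        fmt.Println(string(unicode.ToUpper(r))) // "G"
    }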



  • @riking said:

    Okay, so Nothing in Cool is like void in C.

    No, Unit in Cool is like void in C. C doesn't have a specific type for functions like abort, one that lets a call to them stand as the last statement of a function that returns any arbitrary type.

    You can make Nothing variables and function arguments, but as of Cool 2015, you can't make Nothing fields in a class.



  • @riking said:

    What if you renamed your character type to rune

    What if we renamed our int32 type to int32 and then autocorrected people who misspell that as rune?



  • @blakeyrat said:

    You started out talking about alphabetization, basically a sorting problem. Then you went on about "calculating distance between characters". Does one even lead to the other?

    Sure, as long as you understand that the distance is the distance in the ordering.

    @blakeyrat said:

    The route to C is longer than the route to B which is longer than the route to A-- does that mean the route to C is the route to A multiplied by three? Of course not, that's stupid.

    Because you changed the subject to multiplication, which nobody has claimed makes sense in this context. (You could give it a meaning in the way you describe, but this doesn't seem like it would be a particularly useful thing to do.) It does make sense to say that the route to C is two ranks further down in the list of routes ordered by size.

    @blakeyrat said:

    @flabdablet said:
    If you were to claim that allowing two characters to be added is pants-on-head retarded, I wouldn't disagree.

    THAT IS WHAT THE CODE SAMPLE IS DOING

    No, it isn't. It's subtracting two characters, not adding them; and yes, the distinction is important.

    The situation with characters is similar to the situation with dates. It makes sense to subtract two dates, with the result being not a date but an interval (which we can represent as a number of some smaller unit). This is essentially what a unix timestamp is, for one example. Similarly, you can take a date and add or subtract an interval from it and you'll get another date. But adding two dates makes no sense.
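
    To make the analogy concrete, here's how a type system that already enforces it behaves (a Go sketch, used only for illustration):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        a := time.Date(2015, 8, 28, 0, 0, 0, 0, time.UTC)
        b := time.Date(2015, 9, 2, 0, 0, 0, 0, time.UTC)

        // date - date = interval
        d := b.Sub(a)
        fmt.Println(d) // 120h0m0s

        // date + interval = date
        fmt.Println(a.Add(d)) // 2015-09-02 00:00:00 +0000 UTC

        // date + date doesn't exist: a.Add(b) won't even compile,
        // because Add takes a Duration, not a Time.
    }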

    In the same way, you can take two characters in some defined collation and subtract them to find a number representing how far apart they are in that collation. You can also take a character and add or subtract an integer to say things like "give me the character that's three later than this one in this collation". These are sometimes useful things to do (rot13 for instance, or any other simple cipher). But you can't add a character to another character in any meaningful sense.


  • Discourse touched me in a no-no place

    @flabdablet said:

    You might not like the fact that there is generally no explicit way to define which alphabet, so you can't tell up front whether the alphabet is Unicode or ASCII or EBCDIC, but there it is.

    It's very common now to define that characters are from Unicode. That deals with a vast amount of shit in this area; e.g., not all alphabets define an ordering of their characters, but Unicode does. Unicode itself is a steaming pile of WTF but it's much better than what it replaces (especially when serialised as UTF-8, though it is important to understand that that's just a serialisation of Unicode).



  • @blakeyrat said:

    Ok; and what does it mean to subtract a character from a character?

    Obviously, the answer to this question will be RTFM when they get around to writing TFM.

    Using existing operators to define new operations on non-numeric types is nothing new. Some languages even define what it means to add strings together, for example.



  • Sometimes I.....

    If you can do Date calculations, why not String? After all, a character is a physical (in one incarnation) representation of a number. If 2015-08-28 + 5 Days is returned as 2015-09-02, why can't f + 5 Chars return k? Or a + 11 Words return absolutely (for a given dictionary)?

    How else do you expect a sort, of any type, to be performed? By spreading them out on a table (preferably wooden for best results), and doing it by "eye"?

    Sheesh...

    PRE-EMPTIVE_PEDANT_EDIT: I didn't say that current sorting is perfect.



  • @ben_lubar said:

    So what's the implementation of the "alphabetical order"? Maybe we can define it to be int compare(char a, char b) { return (int)(a - b); }

    You can if you want your application to sort things in the wrong order. I wouldn't want to do that, but you might.

    It doesn't even work correctly for English ('first' and 'First' should both come before 'Second'), and it fails completely when you look at any other language.

    Using Unicode code points directly messes up the order of 'ä' and 'ö' in my native language, and that's only the first problem. Some of the letters with diacritics should be combined together and others not.

    For example, in the standard sorting order for it:

    A = a
    a = á
    a = â
    a < å
    å < ä

    Then there's the historical convention that's nowadays mostly ignored where V and W are both considered to be the same letter even though they are not interchangeable in text. This oddity sprung from the 19th century convention where texts printed in Roman fonts used V exclusively, Fraktur fonts used only W, and handwritten text used W in the initial position and V elsewhere.

    In short: when you compare characters with each other, you have to do that using the conventions of language that you are using. Otherwise you are wrong.
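
    A sketch of what "using the conventions of the language" looks like in practice, assuming Go's x/text collation package (treat the exact API names as from memory):

    package main

    import (
        "fmt"

        "golang.org/x/text/collate"
        "golang.org/x/text/language"
    )

    func main() {
        words := []string{"wiki", "äiti", "veli", "aamu", "öljy"}

        // A raw codepoint sort gets Finnish wrong (see above); a
        // locale-aware collator applies the language's own rules,
        // putting 'ä' and 'ö' after the ASCII letters.
        collate.New(language.Finnish).SortStrings(words)
        fmt.Println(words) // [aamu veli wiki äiti öljy]
    }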



  • How about the classic '4' - '0' = 4?

    Edit: In fact, thinking of it, arithmetic operations between char and int should be the same as between int and pointer: you can subtract two chars and get an int as a result, add/subtract an int to a char, but not add two chars together.

    Edit2: Whoops, it appears @flabdablet said it all before me.



  • Things I can agree with:

    • Removing implicit int-char casts
    • Removing comparison operators from chars and instead make the programmer consciously choose locale-dependent alphabetic comparison
    • .ToUnicodeCodepoint()
    • Subtracting characters from each other is a load of bullshit. Subtracting codepoints is not.


    From what I've read somewhere (probably some documentation of strxfrm()), locale-based comparisons actually transform the string first. For example, a comparison of "Møøse" to "moose" would actually compare something like "moose#Ulllll#_//__" to something like "moose#lllll#_____". Case-insensitive, accent-insensitive comparisons would then ignore the rest.

    Edit: Of course, the actual result is something much more arcane. Here is the result of my test on Windows: I called wcsxfrm() with the locales "French" and "English" on the three strings "Møøse", "moose" and "MOOSE", and wrote them on separate lines in a UTF-16 text file. The result is a mess of control characters, looking like this in hex:

    FF FE
    0E 00 51 00 0E 00 7C 00 0E 00 7C 00 0E 00 91 00 0E 00 21 00 01 00 02 00 02 00 21 00 21 00 01 00 12 00 01 00 01 00 
    0D 00 0A 00
    0E 00 51 00 0E 00 7C 00 0E 00 7C 00 0E 00 91 00 0E 00 21 00 01 00 01 00 01 00 01 00
    0D 00 0A 00
    0E 00 51 00 0E 00 7C 00 0E 00 7C 00 0E 00 91 00 0E 00 21 00 01 00 01 00 12 00 12 00 12 00 12 00 12 00 01 00 01 00
    0D 00 0A 00
    

    There isn't any character left untransformed!

    Note that in the default locale ("C"), the strings are not transformed at all.
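
    For comparison, here's the same sort-key idea outside libc, sketched with Go's x/text collate package (option and method names recalled from memory, so treat this as illustrative):

    package main

    import (
        "fmt"

        "golang.org/x/text/collate"
        "golang.org/x/text/language"
    )

    func main() {
        // The moral equivalent of wcsxfrm(): build locale-dependent sort
        // keys, then compare the keys as plain byte strings.
        c := collate.New(language.French, collate.IgnoreCase, collate.IgnoreDiacritics)

        var buf collate.Buffer
        for _, s := range []string{"Møøse", "moose", "MOOSE"} {
            fmt.Printf("%-6s -> % x\n", s, c.KeyFromString(&buf, s))
            buf.Reset()
        }
    }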


  • Banned

    Has anyone noticed that @blakeyrat's complaint doesn't have anything to do with strong typing and he's simply discontent with the set of operations defined for char type?



  • Oh cool, sort keys.



  • [quote="blakeyrat, post:53, topic:50820"]

    @flabdablet said:

    So in fact it's completely natural to be able to subtract one character from another.

    Nope.

    Look. US English. You have, what, 30 or so characters with explicit order, and something like 40,000 that do not have any order in this language. Those characters still exist, but they can't be ordered in any meaningful fashion without going outside the rules of US English.
    [/quote]

    I'm guessing that the 30 or so characters with explicit order that you're talking about, in the context of US English, would be the letters? If not, please ignore all reasoning based on that guess.

    US English letters are members of an alphabet which contains 26 of them in a well defined order, and that ordering gives rise to the concept of alphabetic distance. The letter 'E' is fifth in this alphabet, and the letter 'M' is thirteenth, so the alphabetic distance from 'E' to 'M' is 8.

    The notion of alphabetic distance has enough in common with numeric subtraction to make the use of the - operator a natural way to express it: 'M' - 'E' == 8. The use of the + operator for applying an alphabetic distance to a letter, thereby calculating another letter, is equally natural: 'E' + 8 == 'M'.

    Note particularly that the rules of alphabetic distance arithmetic allow for only the following operations:

    • adding an alphabetic distance (which is an integer) to a letter, resulting in another letter
    • subtracting an alphabetic distance (integer) from a letter, resulting in another letter
    • subtracting one letter from another, resulting in an alphabetic distance (integer).

    Also note that not all combinations of letters and distances are meaningful. For example, though 'M' + 1 == 'N', 'Z' + 1 is not well defined. One reasonable and convenient extension to the arithmetic would be to treat the alphabet as a ring rather than a list: do that and 'Z' + 1 would become 'A', allowing function rot13(a: letter): letter to be implemented simply as return a + 13.
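
    (A sketch of that ring version, in Go syntax purely for illustration; the rune/int split stands in for the letter/distance one:)

    // rot13 as ring arithmetic over the 26-letter alphabet: letter - letter
    // gives a distance, the distance is shifted modulo 26, and adding it
    // back to the base letter gives a letter again.
    func rot13(r rune) rune {
        switch {
        case 'a' <= r && r <= 'z':
            return 'a' + (r-'a'+13)%26
        case 'A' <= r && r <= 'Z':
            return 'A' + (r-'A'+13)%26
        default:
            return r
        }
    }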

    In the context of a general purpose programming language as opposed to US English, it's completely reasonable to generalize the concept of letters to that of characters, and to treat those as members of an extended alphabet that does completely specify their ordering (perhaps even in a ring), thereby yielding a well-defined alphabetic distance between any pair.

    One way to implement this is to define a bitwise encoding for each character, then interpret those codes as numeric when performing character arithmetic. This works quite tidily for encodings like ASCII that are laid out in such a way that subset alphabets such as the letters and the numeric digits occupy contiguous positions within the larger alphabet. Using it with other encodings like EBCDIC yields odd results like the character alphabetic distance 'J' - 'I' being 8 though the corresponding letter alphabetic distance is 1.

    Unicode is largely unproblematic in this respect, as the various subset alphabets it includes are, by and large, allocated contiguous ranges of code points. It also contains large ranges of code points representing punctuation and ideograms that are not members of any subset alphabet, and it's completely reasonable to observe that the concept of alphabetic distance as applied to those is not likely to be enormously useful. But that doesn't alter the simple fact that those characters, simply by virtue of having been chosen as representable in Unicode, become parts of the huge Unicode alphabet and thereby have a corresponding character ordering assigned to them.

    Note that simply re-using the bitwise integer representations of Unicode code points as the bitwise representations of Unicode characters does not automatically make overflow-ignored unsigned integer arithmetic on those encodings equivalent to ring-alphabetic distance arithmetic, as the size of the Unicode alphabet is not a power of 2.

    [quote="blakeyrat, post:53, topic:50820"]
    You started out talking about alphabetization, basically a sorting problem. Then you went on about "calculating distance between characters". Does one even lead to the other? You can sort a heck of a lot of things without them having a set distance. For example, the length of various routes to a destination. The route to C is longer than the route to B which is longer than the route to A-- does that mean the route to C is the route to A multiplied by three? Of course not, that's stupid. Yet they can still be sorted.
    [/quote]

    Of course they can, because distances are measures, and measures can be approximated with numbers, and numbers have well-defined orderings. You don't even need any special addition or subtraction typing rules for that job: the difference between any two distances is itself a distance.

    Alphabetic distance is an abstraction related to, but not identical to, physical distance.

    [quote="blakeyrat, post:53, topic:50820"]
    @flabdablet said:

    Likewise, it should be possible to add an integer to a character, and the result should be a character.

    No, it should not.

    Characters are not numbers. It makes no sense to add or subtract them.
    [/quote]

    You keep saying that, but you provide no supporting reasoning.

    It certainly makes no sense to add one character to another, and it makes no sense to expect the result of subtracting one character from another to be a character. But there is an easily defined arithmetic involving characters and integer alphabetic distances, as sketched out above, that makes perfect sense.

    [quote="blakeyrat, post:53, topic:50820"]
    @flabdablet said:

    If you were to claim that allowing two characters to be added is pants-on-head retarded, I wouldn't disagree.

    THAT IS WHAT THE CODE SAMPLE IS DOING JESUS FUCK WHAT IS FUCKING WRONG WITH YOU DUMBSHIT MORONS DID YOU NOT EVEN ATTEMPT TO READ THE FUCKING FIRST POST?!
    [/quote]

    No, the code sample is subtracting one character from another, not adding two characters.

    [quote="blakeyrat, post:53, topic:50820"]
    Why do I post here? It's just dealing with this kind of garbage all the time. Goddamned.
    [/quote]

    My current belief is that you derive some kind of masochistic pleasure from being endlessly shown to be wrong.


  • ♿ (Parody)

    @flabdablet said:

    abstraction

    Man...I didn't want to bring something like that into this discussion, because some people just cannot handle it. Good luck.



  • It takes all types.



  • @flabdablet said:

    US English letters are members of an alphabet which contains 26 of them in a well defined order, and that ordering gives rise to the concept of alphabetic distance. The letter 'E' is fifth in this alphabet, and the letter 'M' is thirteenth, so the alphabetic distance from 'E' to 'M' is 8.

    What is the result of 'B' - 'a'?

    27?
    1?
    -31?

    Abusing operator overloads has been a thing for a long time. It's almost always a bad idea to overload an operator unless you have a very robust set of rules. Since math on characters requires information about what language and sensitivity (case, kana, etc) you want, it would only make sense on a type that has the ability to define that information (which char does not), or by using functions which require that information to be passed.



  • @Scarlet_Manuka said:

    No, it isn't. It's subtracting two characters, not adding them; and yes, the distinction is important.

    If you implement a type and you end up with a - b not equal to a + (-b), then you probably shouldn't have used operator overloading. And, yes, the common implementation of string concatenation using the addition operator is poor design. I hate to say it, but PHP and VB got it right by defining a string concatenation operator.



  • @Jaime said:

    What is the result of 'B' - 'a'?

    If those single-quoted literals are intended to represent Unicode characters: 33.

    If English letters: 1.

    There are two subset English letters alphabets embedded in Unicode.



  • @flabdablet said:

    There are two subset English letters alphabets embedded in Unicode.

    what about this AND THIS?


  • Considered Harmful

    If using subtraction, yes, the comparison function needs to guard its return value, else it is vaguely possible that overflow could confuse the results. No client should ever depend on this having happened; clients should look only at the sign of the result.
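
    A hypothetical sign-only version, sketched in Go (names invented; it's the shape that matters):

    // Callers may rely on the sign of the result but never on its magnitude,
    // so an overflowing subtraction shortcut can't leak out as a wrong answer.
    func compareRunes(a, b rune) int {
        switch {
        case a < b:
            return -1
        case a > b:
            return 1
        default:
            return 0
        }
    }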


  • Considered Harmful

    What, you never heard of duck typing?



  • Wow, thanks. In fact, it's probably in one of these posts (Edit: Most likely this one) that I read it in the first place.



  • @Jaime said:

    If you implement a type and you end up with a - b not equal to a + (-b), then you probably shouldn't have used operator overloading.

    Alphabetic distance arithmetic does not define subtracting characters from anything but other characters, so it doesn't define subtracting a character from zero, so it can't define a unary negation operation for characters. Therefore, the second expression is permissible only if b is a distance, not a character; and in that case the two expressions do yield identical results.


  • FoxDev

    @Gribnit said:

    What, you never heard of duck typing?

    If it looks like a duck, and quacks like a duck, it's probably just a cast member of Duck Dynasty trying out a new duckblind/duckcall combination.



  • @blakeyrat said:

    @ben_lubar said:
    c is a character and they subtracted a character

    Ok; and what does it mean to subtract a character from a character?

    I want to subtract @ from }. What's the result? Why?

    You get the result 0x3D (61 decimal), because you're subtracting Unicode code points: specifically, 0x007D - 0x0040.

    In fact, what language is there where char isn't a number type? Hint: If you immediately thought of C#, you're wrong.


  • Grade A Premium Asshole

    I couldn't look at this thread title anymore without making this meme:


  • Banned

    @powerlord said:

    In fact, what language is there where char isn't a number type?

    Rust.



  • C# gets a lot of this stuff wrong, starting with defining a character as 16 bits.

    A character, a code point, and an encoding are three different things. Any language that implicitly casts from a character to an encoding is doing it wrong.



  • @overpaid_consultant said:

    In short: when you compare characters with each other, you have to do that using the conventions of language that you are using. Otherwise you are wrong.

    Quite so. And when defining a character type for use in a general purpose computer language, you want those conventions to strike a nice balance between simplicity and generality. That way, when you need to deal with stuff like locales and accents and multiple cases you can implement all of that on top of the underlying character type without needing to fight with it too much.



  • @flabdablet said:

    If those single-quoted literals are intended to represent Unicode characters: 33.

    Really... my Unicode chart says it's -31.

    Using Unicode character codes for ordering puts lowercase after uppercase in US English, puts all accented characters after all non-accented ASCII-representable characters, and does not allow for two characters to be zero distance from each other.

    In other words, it defines a semi-arbitrary order that is of minimal usefulness.



  • @NedFodder said:

    Any language that implicitly casts from a character to an encoding is doing it wrong.

    I completely agree.

    I also claim that requiring such casts to be made explicitly when working with alphabetic distances is both ugly and unnecessary.



  • @powerlord said:

    In fact, what language is there where char isn't a number type?

    This discussion is about a new language. Are you suggesting that char being a number is such an untouchable principle that discussing any alternative is not allowed? Or maybe that you already know that char will be implemented as a number in all future languages and you are just arguing backwards from that pre-ordained conclusion?


  • 🚽 Regular

    @Gaska said:

    @powerlord said:
    In fact, what language is there where char isn't a number type?

    Rust.

    JavaScript!

    > typeof ('a'.charAt(0))
    "string"
    

    🐠



  • @Jaime said:

    my Unicode chart says it's -31.

    Quite right. I misread the expression as 'b' - 'A'.

    @Jaime said:

    semi-arbitrary order that is of minimal usefulness

    The Unicode alphabetic distances between characters that do not share a common sub-alphabet are indeed not often of much use. Neither are those between ideograms or other punctuation (I believe I touched on this above). However, if the language allows alphabetic distance arithmetic to be applied clearly and concisely to the fundamental character type, it can be cleanly used as a basis for all the other manipulations you might care to do.

    The most useful of the cross-sub-alphabet distance calculations will be things like 'a' - 'A' for establishing mapping rules between sub-alphabets. Note that the sum or difference of alphabetic distances is itself an alphabetic distance.

    As noted above, alphabetic distance arithmetic works a lot like C pointer arithmetic and timestamp/interval arithmetic, so it's not fundamentally different from stuff that any competent programmer should already understand. In the context of a strongly typed language, it makes sense to me that alphabetic distances should get their own type. Requiring alphabetic distance calculations to be done using explicit casting of code points to general integers looks like a missed opportunity to me.
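
    A minimal sketch of that "distance gets its own type" idea, in Go (every name here is invented, not taken from any real language in the thread):

    package alphadist

    // Letter and AlphaDist are deliberately distinct types:
    // a code point versus a distance between code points.
    type Letter rune
    type AlphaDist int

    // Dist is the only way to obtain a distance: letter - letter.
    func Dist(a, b Letter) AlphaDist { return AlphaDist(a - b) }

    // Shift is the only way to spend one: letter + distance = letter.
    func Shift(a Letter, d AlphaDist) Letter { return a + Letter(d) }

    // There is deliberately no Letter + Letter, so the nonsensical
    // operation can't even be written down.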



  • @flabdablet said:

    In the context of a strongly typed language, it makes sense to me that alphabetic distances should get their own type. Requiring alphabetic distance calculations to be done using explicit casting of code points to general integers looks like a missed opportunity to me.

    So would it make sense to require an explicit cast between the alphabetic distance itself and integers?

    Of course, in a very strongly-typed language, I'm not even sure "raw" integers would exist at all; every integer would be some unit (or unitless factor) or other.


  • FoxDev

    @NedFodder said:

    C# gets a lot of this stuff wrong, starting with defining a character as 16 bits.

    My understanding is there's two reasons for that:

    1. C#'s char representation was chosen before characters above 0xFFFF were part of the Unicode standard (I might be wrong about this, but it's kinda irrelevant anyway, because of reason 2)
    2. The type of Unicode used internally by Windows is UTF-16 (or UCS-2, I forget exactly which), and having .NET use the same makes the managed/unmanaged transitions easier

    @NedFodder said:

    A character, a code point, and an encoding are three different things.

    True, but you can't store a character without having some form of encoding; otherwise, how do you interpret the bits and bytes?

    @NedFodder said:

    Any language that implicitly casts from a character to an encoding is doing it wrong.

    Also true... and I don't really have anything to add to this bit, so I'm just going to type until I get bored.


  • Banned

    @RaceProUK said:

    The type of Unicode used internally by Windows is UTF-16 (or UCS-2, I forget exactly which)

    It depends.

    🚎

    Edit: @discoursebot, look at whom I'm replying to!



  • @Medinoc said:

    So would it make sense to require an explicit cast between the alphabetic distance itself and integers?

    If by "integers" you mean "other integer types" then yes.

    The point of strong typing is to make genuine mistakes more likely to be caught at compile time, not to force the programmer to write endless boilerplate to get useful work done. Requiring casts and pseudo-casts like .toUnicodeCodePoint() just adds noise. The type system should reflect what's actually being worked with.



  • @RaceProUK said:

    @NedFodder said:
    C# gets a lot of this stuff wrong, starting with defining a character as 16 bits.

    My understanding is there's two reasons for that:

    1. C#'s char representation was chosen before characters above 0xFFFF were part of the Unicode standard (I might be wrong about this, but it's kinda irrelevant anyway, because of reason 2)
    2. The type of Unicode used internally by Windows is UTF-16 (or UCS-2, I forget exactly which), and having .NET use the same makes the managed/unmanaged transitions easier

    Windows made the decision first, C# repeated the mistake. Oh well, we're stuck with it. To echo blakey: languages designed in 2015 should know better.

    @RaceProUK said:

    @NedFodder said:
    A character, a code point, and an encoding are three different things.

    True, but you can't store a character without having some form of encoding, otherwise, how do you interpret the bits and bytes?

    To echo blakey: implementation detail. Provide a library that does all that crap behind the scenes.



  • Windows has used UCS-2 since Windows NT was being developed in 1990 (based on early drafts of Unicode). It's recently been amended to be UTF-16 LE, but without any special treatment for surrogate pairs; if they split, they split, and turn into replacement characters.
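
    For anyone who hasn't watched a surrogate pair split, a small Go demonstration of that failure mode:

    package main

    import (
        "fmt"
        "unicode/utf16"
    )

    func main() {
        // One codepoint above U+FFFF becomes a surrogate pair in UTF-16.
        units := utf16.Encode([]rune{'😀'}) // U+1F600
        fmt.Println(len(units)) // 2

        // Split the pair and decode half of it on its own, and you get
        // the replacement character.
        fmt.Println(string(utf16.Decode(units[:1]))) // "�"
    }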


  • FoxDev

    @NedFodder said:

    To echo blakey: languages designed in 2015 should know better.

    C# arrived in 2000 [/pendant]

    @NedFodder said:

    Provide a library that does all that crap behind the scenes.

    For the most part, character encoding issues only really become significant when it comes to files and streams. OK, there's a lot of files and streams, but still. And any decent framework should make it as easy as specifying an encoding as a method parameter.
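
    Something along these lines, say, using Go's x/text charmap package (a rough sketch; the exact API names are from memory):

    package main

    import (
        "fmt"
        "io"
        "strings"

        "golang.org/x/text/encoding/charmap"
    )

    func main() {
        // "Specify the encoding as a parameter": wrap the byte stream in a
        // decoder for whatever the file claims to be, and work with ordinary
        // (UTF-8) text from then on.
        latin1 := strings.NewReader("M\xf8\xf8se") // "Møøse" in ISO 8859-1 bytes
        text, err := io.ReadAll(charmap.ISO8859_1.NewDecoder().Reader(latin1))
        if err != nil {
            panic(err)
        }
        fmt.Println(string(text)) // Møøse
    }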



  • In this case, char is a code point. If you want something with an associated character set, you have string.



  • If you are manipulating characters as numbers, you are almost certainly doing something that will not globalize well. There also is almost certainly an alternative method to do what you are trying to do that is both easier to use and globalizes better. Providing an implicit cast to int conveys the idea that this is something that should be done more than once in a blue moon. The boilerplate you are referring to will very rarely be used.



  • @NedFodder said:

    C# gets a lot of this stuff wrong, starting with defining a character as 16 bits.

    It is kind of amusing how Javascript, which has the slackest typing system on the planet, manages not to cause trouble with surrogates.



  • I agree with everything you say, and note that supporting alphabetic distance arithmetic is not the same thing as providing an implicit cast to int.



  • @powerlord said:

    char is a code point

    Yes. Don't manipulate it as if it's a number or a collection of bits.

