Am I the only one who knows what "strongly-typed" even means anymore???



  • Again, you're missing the point that not all sorting is done for human consumption.

    Having an unambiguous sort order for the language's fundamental character type does not preclude, and may even aid, implementing the kind of locale-specific sorting rules you're talking about. But that's not its only use.



  • @flabdablet said:

    Again, you're missing the point that not all sorting is done for human consumption.

    They're characters; the only reason they exist is to represent things that are relevant to humans. If you want to sort the encoded value, convert it to something other than a character where that makes sense.



  • You could make exactly the same claim about numbers, and it would be equally vacuous.


  • Banned

    @Jaime said:

    They're characters; the only reason they exist is to represent things that are relevant to humans. If you want to sort the encoded value, convert it to something other than a character where that makes sense.

    You forget that sorting is often done on human-consumable data for non-human-consumable purposes. For example, dictionary data structure.
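
    A minimal sketch of that point, in Python (chosen here for illustration; nothing in the thread uses it). The keys are human-readable words, but the lookup structure only needs some fixed total order, and Python's default `str` comparison is by code point. The `contains` helper is a made-up name.

```python
import bisect

# Keys are words, but the index only needs *some* consistent total order.
# Python compares str values code point by code point, with no locale involved.
keys = sorted(["zebra", "Émile", "apple", "Zulu", "eel"])

def contains(sorted_keys, key):
    """Binary search; correct as long as the same ordering built the list."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key
```

    No human would file "Zulu" before "apple", yet the lookup works, because the search uses the same ordering that built the list.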


  • FoxDev

    @Jaime said:

    They're characters; the only reason they exist is to represent things that are relevant to humans.

    What about text-based protocols?

    GET /index.html HTTP/1.1
    Host: www.example.com
    


  • @Gaska said:

    You forget that sorting is often done on human-consumable data for non-human-consumable purposes. For example, dictionary data structure.

    We can argue that they are not ordered by any human-consumable properties but as numbers, so that the character codepoints are a kind of hash value, which per definitionem is numerical without any significance but making elements easier sortable and trackable.


  • Banned

    @RaceProUK said:

    What about text-based protocols?

    They were designed as being comfortable for humans to read and write by hand. It's stupid and no one does this, but it's still in line with what @Jaime's said.

    @PWolff said:

    We can argue that they are not ordered by any human-consumable properties but as numbers, so that the character codepoints are a kind of hash value, which per definitionem is numerical without any significance but making elements easier sortable and trackable.

    Or we can argue that inhumane collation rules for humane data makes sense sometimes and we don't need another data type for it.



  • For sake of clarity (type safety), I'd prefer to explicitly cast them to numbers, though, even if the compiler simply doesn't insert any operation there.


    @Gaska said:

    inhumane collation rules for humane data
    QFT (scnr)


    Edit: The remark under the quote was in normal size, in spite of being wrapped with levels of <small> tags, just as Discoursistency and WYSIWTF require.


  • Banned

    I love how putting quote in <small> tags makes the quote, avatar and the buttons on the right small too.


  • Banned

    @PWolff said:

    For sake of clarity (type safety)

    This has nothing to do with type safety. Clarity, maybe. But type safety is completely orthogonal to the question of whether there should be access to a character's numerical representation.

    Also - C had a reason to allow arbitrary math on char types: it doubles as designated type for byte-sized integer. Dunno if it's relevant to any of the 200 posts above, but whatever.



  • @Gaska said:

    This has nothing to do with type safety

    You're right - being comparable is sufficient, independent of the way of implementation.

    @Gaska said:

    Also - C had a reason to allow arbitrary math on char types: it doubles as designated type for byte-sized integer. Dunno if it's relevant to any of the 200 posts above, but whatever.

    I'm not sure how you mean this. I don't want to start a blakeyrant1 here about relevance of the first post of a topic and all that follows from that in general and Koka's C-like way of treating characters in particular. And maybe saying something about Kernighan and Ritchie still pondering on their first book about C so they can publish it in 2016.


    Edit: note added

    1 my personal impression is that @blakeyrat is actually the person I agree most with in this topic, in spite of refraining from blakeybashing being against the unwritten codex of this forum.


  • Banned

    @PWolff said:

    in spite of refraining from blakeybashing being against the unwritten codex of this forum

    Blakeybashing isn't mandatory - it's just that 99% of times, 99% of people find blakey wrong.



  • He's a dessert wax and a floor topping!


  • Discourse touched me in a no-no place

    @Gaska said:

    It's stupid and no one does this, but it's still in line with what @Jaime's said.

    There are pure binary protocols too, but they tend to be a lot harder to debug.


  • Java Dev

    I hate bugs in the SSL code.



  • OK, I think I'm gonna change my mind here. It's not treating a character as a number that's the biggest wtf here. It's the whole idea of character types. Starting with character literals: is ö one character or two? What if you upcase it? Which character is it (as I understand there are several ways to render some characters in Unicode)? Having a special syntax just for specifying character literals was only ever necessary in the old C-style characters-as-numbers paradigm anyway. Much better to make the easier-to-type single quotes available for string literals, and allow people to access codepoints through a property of that string.


  • FoxDev

    @Buddy said:

    Starting with character literals: is ö one character or two?

    Yes
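
    The "Yes" is less flippant than it looks. A small Python sketch (standard library only) showing that ö can be one code point or two, that normalization reconciles the two spellings, and that upcasing can change the length of a string:

```python
import unicodedata

composed = "\u00f6"        # ö as a single precomposed code point
decomposed = "o\u0308"     # o followed by COMBINING DIAERESIS

lengths = (len(composed), len(decomposed))
equal_raw = composed == decomposed
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Upcasing is not always one-to-one: ß becomes two letters.
upper_sharp_s = "\u00df".upper()
```

    Both spellings render the same, but they only compare equal after normalization.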


  • 🚽 Regular

    @asdf said:

    @another_sam said:
    > s.encode(3) does not select the encode method from the string object, but it is simply syntactic sugar for the function call encode(s,3) where s becomes the first argument.

    GTFO.

    Uniform call syntax. It's going to make future C++ even more unreadable.


    Seen it before. Namely, in Lua. Doesn't bother me.

    Come to think of it, Python sort of has this with its need to have the self parameter declared in methods.
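
    A quick illustration of the Python half of that remark. The `Message` class is invented for the example. Note that this is only "sort of" uniform call syntax: the function still has to be looked up on the class.

```python
class Message:
    def __init__(self, text):
        self.text = text

    def repeat(self, times):      # 'self' is just the first parameter
        return self.text * times

m = Message("ab")
method_style = m.repeat(3)              # method-call syntax
function_style = Message.repeat(m, 3)   # same call, receiver passed explicitly
```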


  • Banned

    @Zecc said:

    Seen it before. Namely, in Lua. Doesn't bother me.

    Maybe because Lua does it with syntax distinct from regular field access?


  • 🚽 Regular

    IIRC syntax for operator overloading in C++ is also similar.
    Not that that's a good point in favour, mind you.



  • I agree with you.

    Everything about a character should be implementation details.
    If you want character math, then make a class that does it for you.

    We are way past ASCII, and even then we had glorious EBCDIC that fucked up character math too.



  • Didn't the UTF-8 bit patterns that couldn't be represented with a single 16-bit UTF-16 sequence get deprecated?

    Edit: I was slightly off. It's the UTF-8 bit patterns that can't be represented by UTF-16 that are deprecated. For example, U+10FFFF is allowed in UTF-8, but U+110000 is disallowed because some morons chose UTF-16 to represent things. Which means that Unicode can't ever have more than 1114111 characters. Unicode 8 has 183983, which means we're about a sixth of the way to running out. We'll probably run out well before they add anything as useful as :arrows:.
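
    For anyone wondering where the U+10FFFF ceiling comes from, here is a sketch of the UTF-16 surrogate-pair arithmetic in Python. The function name `utf16_units` is made up for this example. A pair carries 20 bits on top of the 0x10000 base, so the largest addressable code point is 0x10FFFF (code points 0 through 1,114,111).

```python
def utf16_units(code_point):
    """Encode one code point as a list of UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can address")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable")
    if code_point < 0x10000:
        return [code_point]                 # fits in a single 16-bit unit
    offset = code_point - 0x10000           # at most 20 bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return [high, low]
```

    The very last pair, `[0xDBFF, 0xDFFF]`, is U+10FFFF; there is no pair left over for U+110000.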



  • @blakeyrat said:

    SOMEONE EXPLAIN TO ME WHAT GOOD IS WORKING WITH ALPHABETIC DISTANCES IN THE FIRST PLACE!

    A quick google for "alphabetic distance" turns up a few references to one paper claiming that they're relevant to humans sorting alphabetically. Nothing relevant to computers manipulating chars or strings. The only thing I can think of is simple substitution ciphers like rot13; whether this is useful is debatable.
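
    For reference, a rot13 sketch in Python. It is written against an explicit alphabet instead of doing arithmetic on code points, which sidesteps the question of what "distance" means for characters outside a-z and A-Z.

```python
import string

def rot13(text):
    """Rotate ASCII letters by 13 places; leave everything else alone."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)
```

    Applying it twice gives back the original text, since 13 is half of 26.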



  • @loose said:

    It cannot be a nice experience to always be moaned at, but sometimes.....

    It depends on who's doing the moaning. :giggity:



  • I am just about at the point where I want to join @blakeyrat and mute this topic. You need to answer this simple question:

    How

    Does

    Any

    Computer

    Language

    Sort

    Strings

    ?

    There are other questions, but they all "normalise" to this one, because to sort an array of strings requires each string element (in that array) to be reduced to an array of characters.



    1. did you know that your points are sometimes a bit hard to follow?
    2. in the UK you have to pay to use shopping carts?!!!


  • @Gribnit said:

    If I should like to do some kinds of searching or sorting, it is useful to know how far off target I think I may be, in which case I would want actual distance.

    True, but you need the distance in some specified or assumed collating sequence. This may happen to match the Unicode code point distance for some characters in some collating sequences, but this is certainly not true in general. A function that takes two chars and a collating sequence and returns a distance may make sense, as does one that takes a character, a distance and a collating sequence and returns a character. Simply converting characters to integers, implicitly or explicitly, does not.
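
    A rough sketch of those two functions in Python. Everything here is hypothetical: the names `distance` and `shift`, and the toy `ENGLISH_CI` table, which stands in for a real collation by giving 'e' and 'E' the same rank.

```python
def distance(a, b, collation):
    """Signed distance from b to a within an explicit collating sequence."""
    return collation[a] - collation[b]

def shift(ch, steps, collation):
    """First character found `steps` ranks after `ch`, or None if none exists."""
    target = collation[ch] + steps
    for candidate, rank in collation.items():
        if rank == target:
            return candidate
    return None

# Toy case-insensitive English collation: upper and lower case share a rank.
ENGLISH_CI = {}
for rank, letter in enumerate("abcdefghijklmnopqrstuvwxyz"):
    ENGLISH_CI[letter.upper()] = rank
    ENGLISH_CI[letter] = rank
```

    With this table 'M' minus 'E' and 'M' minus 'e' are both 8, whereas the raw code point difference `ord('M') - ord('e')` is -24.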



  • No, you place a refundable deposit on them, while you use them. And to be fair, not always. To illustrate, with a

    @Buddy said:

    hard to follow?

    reason, some shops physically modify "their" trolleys so that they cannot be removed from the store.



  • @flabdablet said:

    wouldn't it be nice if the result of multiplying an int32 by an int32 was automatically an int64

    The CPU that is executing your code is very likely generating a 64-bit result. It's just a matter of the high-level language making use of the extended result.



  • @PWolff said:

    the product of an int_n and an int_n is int_n, too.

    As I said above, the CPU is probably generating an int_2n result; it's just a question of whether the language allows you to use the upper half of the result.
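
    To see the difference between keeping the low half and keeping the whole product, here is a simulation in Python. Python integers are arbitrary precision, so the fixed widths are emulated by masking; the helper names are invented for the example, and nothing here says what any particular CPU or language does.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mul_truncating(a, b):
    """int32 * int32 -> int32: only the low half survives (wraps on overflow)."""
    return to_signed(a * b, 32)

def mul_widening(a, b):
    """int32 * int32 -> int64: the full product is kept."""
    return to_signed(a * b, 64)
```

    The product of any two int32 values fits in an int64, so the widening version never wraps.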



  • @Buddy said:

    What if you upcase it?

    Don't know what happens these days, but in the old days if bit 6 was set, then it was lower case (in the case of [a-z,A-Z]) and there was a definite bit mask relationship between the "shifted" and "un-shifted" keyboard key and the assigned numerical value i.e '1' and '!', '{' and '['.

    A lot of this was due to raisons that most computer stuff was hardware. The numerical value of a keyboard key was determined by which X and Y circuits were short circuited. This was, in turn, determined by the physical position of the key within the keyboard, and the problematical issue of constructing a circuit board to replicate the 'X' and 'Y' grid without crossing any lines. The shift effect would be noticed by getting two 'X's for your 'Y' (in essence, for all you pedants out there)

    Added Note: Heck, if you got your circuit board design right, you did not need any "logic", you could feed your key code directly onto the data bus (don't even start pedants, or I'll write a fucking essay about debouncing keys, latching flip flops, gating and all that other fucking hardware shit that are absolute prerequisites for computers to exist).
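
    The bit-mask relationships described above can be checked in a few lines of Python. "Bit 6" counted from one is the bit with value 0x20. The constant and function names are made up, and the digit/symbol pairing only holds for some keys on old bit-paired layouts, so treat it as an illustration, not a rule.

```python
CASE_BIT = 0x20    # 'A' is 0x41, 'a' is 0x61: they differ in this one bit

def toggle_case(ch):
    """Flip the case of an ASCII letter by flipping a single bit."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ CASE_BIT)
    return ch

# The "shifted" pairs mentioned in the post differ by a single bit as well.
shifted_one = chr(ord("1") ^ 0x10)       # 0x31 -> 0x21
shifted_bracket = chr(ord("[") ^ 0x20)   # 0x5B -> 0x7B
```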


  • Winner of the 2016 Presidential Election

    @Zecc said:

    Seen it before. Namely, in Lua. Doesn't bother me.

    In general, the idea doesn't bother me either. The problem is that it's one more C++ feature that will make it really hard for people to figure out what a line of code actually does. As if that weren't hard enough already, with operator overloading (including the most basic ones), copy&move constructors, const overloading, macros, templates etc.


  • Discourse touched me in a no-no place

    @loose said:

    reduced to an array of characters

    Strictly, to a sequence of characters. You hardly ever need the fact that you can go to an array of characters. The array-ness is more useful as it is an efficient representation. (I've seen other representations for strings; they tend to suck compared to using arrays.)
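
    A sketch of why a sequence is enough: lexicographic comparison only ever walks both inputs front to back. This Python version (the name `compare` is invented here) accepts any iterables, including one-shot iterators, and orders elements by whatever `<` means for them, which for Python characters is code point order.

```python
from itertools import zip_longest

def compare(a, b):
    """Return -1, 0 or 1, comparing two character sequences item by item."""
    missing = object()                     # sentinel for "ran out of items"
    for x, y in zip_longest(a, b, fillvalue=missing):
        if x is missing:
            return -1                      # a is a proper prefix of b
        if y is missing:
            return 1                       # b is a proper prefix of a
        if x != y:
            return -1 if x < y else 1      # first difference decides
    return 0
```

    Plugged into a sort via `functools.cmp_to_key`, it gives the same order as the built-in string comparison.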



  • @loose said:

    and there was a definite bit mask relationship between the "shifted" and "un-shifted" keyboard key and the assigned numerical value i.e '1' and '!', '{' and '['.

    And since Times of Yore I wonder why they didn't do the logical step to apply this to non-Latin characters as well, especially umlauts (as is most interesting for me as a German). Neither in MS-DOS codepages 437 / 850, nor on Macintosh (the genuine MacOS, not that BSD stuff). It wouldn't have cost more than a few seconds of sensible thought, and wouldn't have blocked any additional code point.

    I think they did it in some ANSI codepage that Windows 95 adapted, and I thought why TF didn't Apple do that long ago?


  • Discourse touched me in a no-no place

    @PWolff said:

    And since Times of Yore I wonder why they didn't do the logical step to apply this to non-Latin characters as well, especially umlauts (as is most interesting for me as a German).

    It wouldn't work for French though, which needs é, è, ê and ë. It also runs into problems with Scandinavian languages, for example Swedish which needs ä and å (and indeed regards them as being completely separate letters from a). Once you don't have your nice rule universally, you're effectively stuck with the current situation.

    Having a German-only encoding is a different sort of problem. (Having dealt with the mess that the various Russian encodings used to be in, I'm very happy we've gone to a Unicode-based world!)



  • I didn't mean to insert umlauts between their base and the following letter, but arranging them in a way that lowercase and uppercase differ by 32 codepoints, as in bit 5 flipped.

    These are the characters at codepoints 0xA0 (160) to 0xFF (255) in Windows-1252 encoding:

     ¡¢£¤¥¦§¨©ª«¬­®¯
    °±²³´µ¶·¸¹º»¼½¾¿
    ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
    ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
    àáâãäåæçèéêëìíîï
    ðñòóôõö÷øùúûüýþÿ
    

    The correspondence between uppercase and lowercase is evident. (Except for ß and ÿ). Compare this to MS-DOS and old Mac encodings.

    Why didn't they do that right away in the 1980s?
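
    The offset-by-32 claim is easy to check with Python's built-in `cp1252` codec. This sketch walks the row pairs 0xC0-0xDF and 0xE0-0xFF and collects the positions where the byte 32 places up is not the lowercase form:

```python
def cp1252(byte):
    """Decode a single Windows-1252 byte to its character."""
    return bytes([byte]).decode("cp1252")

exceptions = []
for upper_byte in range(0xC0, 0xE0):
    upper = cp1252(upper_byte)
    lower = cp1252(upper_byte + 0x20)      # same column, two rows down
    if upper.lower() != lower:
        exceptions.append((upper, lower))
```

    Besides ß and ÿ, the check also flags × and ÷, which sit in the same columns but are not letters at all.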


  • FoxDev

    The concept of thinking things through properly was invented in 1990?



  • Or the concept of thinking things through enterprisey™ was invented 1985?



  • @loose said:

    reason, some shops physically modify "their" trolleys so that they cannot be removed from the store.

    That's discrimination against poor people.



  • That is an implementation detail.

    There is no question you can come up with that requires allowing the programmer access to the internal storage mechanism of a character.



  • @xaade said:

    That's discrimination against poor people.

    No, not really. The apparent discrimination may be an unforeseen consequence of the policy to not allow their trolleys to be removed from the store. It is almost certainly discrimination against unfit or "in-complete" people as everybody has to carry their purchases to their car / bus / home. I don't think the store minds people using their trolleys to do that, it's just that they mind people not bringing the trolley back afterwards. Incidentally, you still have to put your pound in to unchain the trolley.

    @xaade said:

    That is an implementation detail.

    There is no question you can come up with that requires allowing the programmer access to the internal storage mechanism of a character.

    Maybe it's me or the (relative) time of day but you are going to have to explain the joke here as I don't have a clue. I even checked the post that discourse thinks you are replying to in order to remind myself what I was thinking at the time.



  • @blakeyrat said:

    But Koka isn't asking for a localization (or "collation" if we're going to use that term) when it does the subtraction, so, again, you're agreeing with me that Koka is wrong here.
    So... why are you posting as if I'm an idiot when you fundamentally agree with me entirely?

    I actually wasn't taking a position on Koka, having not looked into it deeply enough to see if it was asking for a collation. (It wouldn't have to in order for things to work, if it said "I'm always going to use my favourite collation, you don't get a vote." This just wouldn't be as useful.)
    I was simply pointing out that the sample was subtracting characters, which can be a reasonable thing to do, and not adding them as you forcefully stated.

    @blakeyrat said:

    The point is, Unicode contains hundreds of thousands of characters, of which 99.5%, when sorted in US English, have no meaningful sort order whatsoever.

    This is certainly true, and if you're working with this sort of distance notion, you have to take this into account or you'll get GIGO.

    @blakeyrat said:

    @flabdablet said:
    The notion of alphabetic distance has enough in common with numeric subtraction to make the use of the - operator a natural way to express it: 'M' - 'E' == 8.

    Right; but since 'M' - 'e' != 8, what good is that? It can't be used for sorting.

    If you were doing it properly (and I'm quite happy to concede that Koka isn't, and that in general it would be difficult to do), 'M' - 'e' should also be 8. If the collation considers 'E' and 'e' to be equivalent, then 'M' - 'E' and 'M' - 'e' should be the same.

    @blakeyrat said:

    I concede that that is an operation that can be performed; I still don't know what good it is.

    I gave an example in my earlier post: implementing rot13, or a simple Caesar-type cipher for use in puzzles etc.

    @blakeyrat said:

    the distance between two characters isn't an int, it's a brand new type.

    Arguable. If you have something that can be sorted, you have something that can be implicitly mapped to and from some set of integers (that's what a sort is, mathematically), so you can use a plain old integer with the understanding that it's really operating on the corresponding integers rather than the base type. But in a strongly typed language, yes, an explicit type would be preferable.

    For the record: I do agree with your overall point that this is a bad feature. I'm just trying to make the point that it is not necessarily an intrinsically wrong thing to do; it can be made meaningful, just with extreme difficulty if you want to be locale-sensitive, and limited use even if you don't (and I'm certain Koka hasn't bothered). That's why it's a bad feature :)

    @PWolff said:

    These are the characters at codepoints 0xA0 (160) to 0xFF (255) in Windows-1252 encoding:

    TIL that ß is the uppercase form of ÿ. :trollface:



  • Who's that gonna stop though? If someone wants to steal a cart, is one dollar really enough to stop them? And a shopping cart that I can't take out to the car with me is basically worthless. I'm super glad we don't have any such tight-assed supermarkets where I'm at.



  • This is a post holder. It means I have an interesting (my opinion, mind) response but would prefer to use a proper UI (not my mobile) and that will be in a few hours :)

    :hanzo: by @xaade



  • It would stop someone who would steal a cart but doesn't have a dollar.

    A lot of these carts get left out in the lot, at which point a person not inclined to even visit can take a cart for nothing.

    The money is really just to ensure they're returned to the inside of the store.

    But this just all shows that the store doesn't want to pay someone JUST to push carts.

    Stores that pay cartpushers, don't seem to have this problem.


  • Discourse touched me in a no-no place

    @Buddy said:

    And a shopping cart that I can't take out to the car with me is basically worthless.

    Why wouldn't they at least allow the cart to go as far as that? Yes, I could see trying to prevent the cart being taken off the overall site, but that would include the parking lot. Sounds like that particular store is run by a cheap-ass idiot of a manager.



  • It's not common, IME, but I've seen carts with metal poles that are too tall to go through the doors, mostly in cheap stores that cater to lower socioeconomic customers. More common, at least in recent years, are carts whose wheels lock if removed from the parking lot. (These aren't limited to cheap stores.)



  • @xaade said:

    Stores that pay cartpushers, don't seem to have this problem.

    I know a big shop with a big parking lot with cart sheds all over it. You have to put in a Euro (or a 2 Euro coin or a 50 cent coin) to get it loose, and most people take their cart near the entrance but put it back near their car, so they have a cart pusher that puts the carts back to the entrance. (IMHO he looks similar to Mario from the Nintendo games.)

    (Nevertheless, you can't just steal a cart without investing a coin.)



  • @Maciejasjmj said:

    Hm.

    A novel part about Koka is that it automatically infers all the side effects that occur in a function. The absence of any effect is denoted as total (or <>) and corresponds to pure mathematical functions. If a function can raise an exception the effect is exn, and if a function may not terminate the effect is div (for divergence).

    Someone call that Turing guy...

    That's why they have the word "may". If the compiler can't prove that it always terminates, then it may terminate. findCollatzConjectureCounterexample may terminate.


  • Discourse touched me in a no-no place

    @immibis_ said:

    If the compiler can't prove that it always terminates, then it may terminate. findCollatzConjectureCounterexample may terminate.

    Compilers typically operate with three modes in this regard:

    1. Definitely terminates (or rather terminates by returning). Provable for trivial cases, such as where there are no loops and no calls to functions in the other two categories.
    2. Never terminates. Provable for cases where we hit abort() or enter a trivial infinite loop, or where we definitely call a non-terminating function.
    3. Might terminate. Everything else. The logical equivalent of “i dunno, lol”.

    You can strive to keep things out of the third category, but you can't eliminate it and keep things decidable.
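
    A toy version of that three-way split, using Python's `ast` module. This is only a sketch under loud assumptions: `classify` is a made-up helper, it treats operators and attribute access as primitive (real Python can run arbitrary code behind them), and it is not how Koka or any real compiler does it.

```python
import ast

def classify(source):
    """Very rough three-way termination verdict for one function definition."""
    func = ast.parse(source).body[0]
    nodes = list(ast.walk(func))
    has_loop = any(isinstance(n, (ast.For, ast.While)) for n in nodes)
    has_call = any(isinstance(n, ast.Call) for n in nodes)

    # Category 1: straight-line code with no loops and no calls.
    # Assumption: operators and attribute access never run user code.
    if not has_loop and not has_call:
        return "terminates"

    # Category 2: the body starts with `while True:` containing only `pass`.
    first = func.body[0]
    if (isinstance(first, ast.While)
            and isinstance(first.test, ast.Constant)
            and first.test.value is True
            and all(isinstance(stmt, ast.Pass) for stmt in first.body)):
        return "never terminates"

    # Category 3: everything else, i.e. "i dunno, lol".
    return "might terminate"
```

    A Collatz-style loop lands in the third bucket, as it should: the sketch gives up on anything it cannot trivially prove either way.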

