More WTFery from SorryIAsked.com (i.e. StackOverflow)



  • LINK


    The larger WTF, of course, is Unicode, a representation scheme that has about as much to do with reality as Ronnie Crane had to do with the space program (and considerably less charm).



  • What the hell is wrong with Unicode? I'll admit it has some stupidity: the pile of poo and snowman come to mind, as well as the "decomposed" representations of glyphs which have to be normalized, but all-in-all it's not awful.
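
    A concrete illustration of the "decomposed" problem he mentions: the same visible character can be two different codepoint sequences, and a byte-wise comparison won't treat them as equal. A minimal C++ sketch (my own illustration, not from the post above; real normalization needs a library such as ICU):

        #include <iostream>
        #include <string>

        int main() {
            // U+00E9, the precomposed "e with acute", encoded as UTF-8
            std::string precomposed = "\xC3\xA9";
            // U+0065 "e" followed by U+0301 COMBINING ACUTE ACCENT, also UTF-8
            std::string decomposed  = "e\xCC\x81";

            // Both render as "é", yet a naive comparison says they differ.
            std::cout << "byte-equal: " << (precomposed == decomposed) << "\n";            // 0
            std::cout << precomposed.size() << " vs " << decomposed.size() << " bytes\n";  // 2 vs 3
            // Treating them as "the same character" requires normalizing both
            // (e.g. to NFC or NFD) first, which the standard library won't do for you.
            return 0;
        }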



  • @bridget99 said:

    LINK


    The larger WTF, of course, is Unicode, a representation scheme that has about as much to do with reality as Ronnie Crane had to do with the space program (and considerably less charm).

    From a market share point of view, most writing systems in the world have characters that do not fit in 7-bit ASCII. And to make things funnier, some languages even have more than one writing system (such as Japanese). You may not be exposed to those "foreign" writing systems yet, but the problem is not Unicode...



  • Hey it's Bridget99! Worst Programmer in the World!

    The only issue with that question is that C and C++ are shitty languages that have never had unicode added to their libraries. That WTF has nothing to do with unicode, and everything to do with the C standardization committees who have been fucking around with C++0x instead of getting work done.



  • @blakeyrat said:

    Hey it's Bridget99! Worst Programmer in the World!

    For those keeping score:

    • Unicode: BAD
    • Message pumping: BAD
    • Maintainability: BAD
    • OOP: BAD
    • FOSS: GREAT


  • @morbiuswilters said:

    For those keeping score:

    • Unicode: BAD
    • Message pumping: BAD
    • Maintainability: BAD
    • OOP: BAD
    • FOSS: GREAT

    The developer equivalent of "git 'r done".



  • @morbiuswilters said:

    What the hell is wrong with Unicode? I'll admit it has some stupidity: the pile of poo and snowman come to mind, as well as the "decomposed" representations of glyphs which have to be normalized, but all-in-all it's not awful.

    It's the best thing since ASCII. Yeah, there are different representations, so what? It's a lot easier than asking the user "could you please tell me the encoding of these two strings, prettyplease?". I think I'll never forget CP1252.



  • IIRC the problems Unicode had were caused by historical and political issues, not its (current) scheme or implementation.

    Here is an example of what I mean: http://blogs.msdn.com/b/oldnewthing/archive/2012/05/04/10300670.aspx



  • I'd argue that not deciding on a single data format for a character is a pretty big implementation detail.



  • Oops, that wasn't the link I thought it was.

     



  • @TGV said:

    Yeah, there are different representations, so what?

    It makes implementations needlessly complex? There's no good reason for it? I'm not saying it makes Unicode unusable, but it is stupid.

    @TGV said:

    It's a lot easier than asking the user "could you please tell me the encoding of these two strings, prettyplease?"

    Unicode doesn't completely fix that. You still have to know which character encoding Unicode is using. The biggest benefits are that it eliminates the need for entirely separate encodings (allowing characters from different languages to coexist in the same document) and that it is ubiquitous.
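
    To make the "you still have to know which encoding" point concrete: the same Unicode text is a different byte sequence in UTF-8, UTF-16 and UTF-32. A small C++11 sketch (sizes only; my own illustration, not part of the original exchange):

        #include <iostream>
        #include <string>

        int main() {
            // The same three characters -- 'A', 'é' (U+00E9), '日' (U+65E5) --
            // stored in three different Unicode encodings.
            std::u32string utf32 = U"A\u00E9\u65E5";         // one 32-bit unit per codepoint
            std::u16string utf16 = u"A\u00E9\u65E5";         // one 16-bit unit each (all in the BMP)
            std::string    utf8  = "A\xC3\xA9\xE6\x97\xA5";  // 1 + 2 + 3 bytes

            std::cout << "UTF-32: " << utf32.size() * sizeof(char32_t) << " bytes\n"; // 12
            std::cout << "UTF-16: " << utf16.size() * sizeof(char16_t) << " bytes\n"; // 6
            std::cout << "UTF-8:  " << utf8.size()                     << " bytes\n"; // 6
            // Same text, three different byte layouts -- whoever consumes the raw
            // bytes still has to know (or be told) which encoding they are in.
            return 0;
        }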



  • @morbiuswilters said:

    @TGV said:
    Yeah, there are different representations, so what?

    It makes implementations needlessly complex? There's no good reason for it? I'm not saying it makes Unicode unusable, but it is stupid.

    Despite Unicode being new, it couldn't really break legacy stuff, so it has its own historical reasons (and hysterical raisins) for various quirks. Usually it's caused by the requirement that any string in any of the legacy encodings has to have a lossless round-trip conversion to Unicode and back.

    @morbiuswilters said:

    @TGV said:
    It's a lot easier than asking the user "could you please tell me the encoding of these two strings, prettyplease?"

    Unicode doesn't completely fix that. You still have to know which character encoding Unicode is using. The biggest benefits are that it eliminates the need for entirely separate encodings (allowing characters from different languages to coexist in the same document) and that it is ubiquitous.

    Yeah, UTF-16 was a mistake from the time when they thought they could get away with 16 bits per codepoint. And they can't get rid of it, because some (Microsoft most notably) jumped up and started introducing 16-bit chars all over the place. And then it turned out that it would have to be a variable-length encoding anyway, which punched this ugly hole in the codepoints, but they couldn't drop it, because it was already used all over the place and Windows is stuck with 16-bit wchar_t, so supporting Unicode beyond 2.0 is a pain.
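
    The "ugly hole" is the surrogate range U+D800..U+DFFF, reserved so that codepoints beyond U+FFFF can be squeezed through 16-bit units as pairs. A minimal sketch of the arithmetic (my own illustration, no error handling):

        #include <cstdint>
        #include <iostream>

        // Encode one codepoint above U+FFFF as a UTF-16 surrogate pair.
        void to_surrogates(char32_t cp, char16_t& high, char16_t& low) {
            const std::uint32_t v = cp - 0x10000;                      // 20 bits remain
            high = static_cast<char16_t>(0xD800 + (v >> 10));          // top 10 bits
            low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));        // bottom 10 bits
        }

        int main() {
            char16_t hi = 0, lo = 0;
            to_surrogates(U'\U0001F4A9', hi, lo);                      // PILE OF POO, fittingly
            std::cout << std::hex << static_cast<unsigned>(hi) << " "
                      << static_cast<unsigned>(lo) << "\n";            // d83d dca9
            // Because 0xD800-0xDFFF are burned on this trick, they can never be
            // assigned to real characters -- that's the hole in the codepoint space.
            return 0;
        }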



  • @Bulb said:

    Despite Unicode being new, it couldn't really break legacy stuff, so it has its own historical reasons (and hysterical raisins) for various quirks. Usually it's caused by the requirement that any string in any of the legacy encodings has to have a lossless round-trip conversion to Unicode and back.

    I don't see why they couldn't have specified that everything is just always precomposed, instead of allowing for decomposed characters that then have to be normalized.

    @Bulb said:

    Yeah, UTF-16 was a mistake from the time when they thought they could get away with 16 bits per codepoint. And they can't get rid of it, because some (Microsoft most notably) jumped up and started introducing 16-bit chars all over the place. And then it turned out that it would have to be a variable-length encoding anyway, which punched this ugly hole in the codepoints, but they couldn't drop it, because it was already used all over the place and Windows is stuck with 16-bit wchar_t, so supporting Unicode beyond 2.0 is a pain.

    I understand why that happened and I don't think it's awful that UTF-16 is still around (although I would prefer to see it deprecated eventually). The thing is, people seem to think Unicode is this clean, modern system, when in reality it has a lot of cruft. Of course it's better than having hundreds of code pages that extend ASCII, but still..



  • ‮nuıɔopǝ ıs ɟnll oɟ ɟnu sʇnɟɟ

    čč



  • @morbiuswilters said:

    I don't see why they couldn't have specified that everything is just always precomposed, instead of allowing for decomposed characters that then have to be normalized.

    Combinatorial explosion? The Unicode Consortium would have preferred everything to be decomposed (and thus avoid needing to allocate codepoints for, say, the entire Vietnamese alphabet), but given the number of mappings already out there with separate codepoints for things like å and έ they didn't really have much of a hope of that if they wanted any adoption.



  • @blakeyrat said:

    The only issue with that question is that C and C++ are shitty languages that have never had unicode added to their libraries. That WTF has nothing to do with unicode, and everything to do with the C standardization committees who have been fucking around with C++0x instead of getting work done.

    C & C++ sort of support Unicode, and by "sort of" I mean they don't really:

    ANSI/ISO C leaves the semantics of the wide character set to the specific implementation but requires that the characters from the portable C execution set correspond to their wide character equivalents by zero extension. The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.
    It's like they thought Unicode support might be a good idea but then realised doing it right would be hard, so they made wchar_t completely useless then sat back and ate donuts.
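
    The practical upshot of the quoted paragraph: wchar_t is whatever the compiler says it is, which is why C++11 eventually added the fixed-width char16_t and char32_t types. A trivial probe, assuming nothing beyond the standard library:

        #include <iostream>

        int main() {
            // 2 bytes on Windows/MSVC, 4 bytes on most Unix compilers -- which is
            // exactly why "portable Unicode via wchar_t" never worked.
            std::cout << "sizeof(wchar_t)  = " << sizeof(wchar_t)  << "\n";
            // Fixed-width code units since C++11: UTF-16 and UTF-32 respectively
            // (2 and 4 bytes on mainstream platforms).
            std::cout << "sizeof(char16_t) = " << sizeof(char16_t) << "\n";
            std::cout << "sizeof(char32_t) = " << sizeof(char32_t) << "\n";
            return 0;
        }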


  • @Vanders said:

    It's like they thought Unicode support might be a good idea but then realised doing it right would be hard, so they made wchar_t completely useless then sat back and ate donuts.
    That's a good way to describe the entire string handling feature set in C.



  • @Bulb said:

    And they can't get rid of it, because some (Microsoft most notably) jumped up and started introducing 16-bit chars all over the place.

    Before you get too foaming-at-the-mouth over Microsoft, the reason they joined the 16-bit Unicode bandwagon is because at the time they did it UTF8 did not yet exist. In other words, they're early adopters who got fucked by rapidly-changing standards. (Hmmm... reminds me of CSS support in IE4-7...)

    In fact I wager the invention of UTF8 was at least somewhat inspired by Microsoft's experience putting 16-bit into actual day-to-day use.



  • @blakeyrat said:

    @Bulb said:
    And they can't get rid of it, because some (Microsoft most notably) jumped up and started introducing 16-bit chars all over the place.

    Before you get too foaming-at-the-mouth over Microsoft, the reason they joined the 16-bit Unicode bandwagon is because at the time they did it UTF8 did not yet exist. In other words, they're early adopters who got fucked by rapidly-changing standards. (Hmmm... reminds me of CSS support in IE4-7...)

    In fact I wager the invention of UTF8 was at least somewhat inspired by Microsoft's experience putting 16-bit into actual day-to-day use.

    To talk about the "invention of UTF-8" is crap. UTF-8 isn't so much an encoding as it is a behaviour pattern. It consists of the Unicode dorks of the world deciding that they needed to surrender to reality and find a way to do their dorky $%^! without screwing over the rest of us (e.g. with a bunch of wasteful, rage-inducing zero bytes). Why didn't they just do things this way from the beginning? Because they planned to just shove their 16-bit Unicode contraption down our collective throats, together with a healthy dose of self-importance.



    The truly ironic thing is that 16 bits doesn't actually end up being enough space for what they wanted to do! These people were going to need something like UTF-8 anyway, and they should have known it. What was ever the point of trying to shove 16-bit characters down our throats? There was none. It was just passive-aggressive B.S. from a bunch of very sick people who ought to be no better than server administrators... if that.



    I suppose that what the OP on that StackOverflow post should have asked was this:



    "How should I do a case-insensitive string comparison in C++ assuming the strings use only ASCII characters 0-127?"



    That's a legitimate question, and it's a shame that the social failure that happens whenever more than one computer dork gathers in His name makes it so damned difficult to get a straight answer.
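
    For what it's worth, the ASCII-only question does have a short, portable answer. One possible sketch in standard C++ (my own illustration; note the unsigned char cast, the classic <cctype> trap):

        #include <cctype>
        #include <cstddef>
        #include <string>

        // Case-insensitive comparison, valid only for characters 0-127.
        bool iequals_ascii(const std::string& a, const std::string& b) {
            if (a.size() != b.size()) return false;
            for (std::size_t i = 0; i < a.size(); ++i) {
                // Cast to unsigned char first: passing a negative char value
                // to tolower() is undefined behaviour.
                const int ca = std::tolower(static_cast<unsigned char>(a[i]));
                const int cb = std::tolower(static_cast<unsigned char>(b[i]));
                if (ca != cb) return false;
            }
            return true;
        }

    With that helper, iequals_ascii("Hello", "HELLO") is true, and in the default "C" locale anything outside 0-127 is simply compared byte-for-byte.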



  • @bridget99 said:

    @blakeyrat said:
    @Bulb said:
    And they can't get rid of it, because some (Microsoft most notably) jumped up and started introducing 16-bit chars all over the place.

    Before you get too foaming-at-the-mouth over Microsoft, the reason they joined the 16-bit Unicode bandwagon is because at the time they did it UTF8 did not yet exist. In other words, they're early adopters who got fucked by rapidly-changing standards. (Hmmm... reminds me of CSS support in IE4-7...)

    In fact I wager the invention of UTF8 was at least somewhat inspired by Microsoft's experience putting 16-bit into actual day-to-day use.

    To talk about the "invention of UTF-8" is crap. UTF-8 isn't so much an encoding as it is a behaviour pattern. It consists of the Unicode dorks of the world deciding that they needed to surrender to reality and find a way to do their dorky $%^! without screwing over the rest of us (e.g. with a bunch of wasteful, rage-inducing zero bytes). Why didn't they just do things this way from the beginning? Because they planned to just shove their 16-bit Unicode contraption down our collective throats, together with a healthy dose of self-importance.



    The truly ironic thing is that 16 bits doesn't actually end up being enough space for what they wanted to do! These people were going to need something like UTF-8 anyway, and they should have known it. What was ever the point of trying to shove 16-bit characters down our throats? There was none. It was just passive-aggressive B.S. from a bunch of very sick people who ought to be no better than server administrators... if that.



    I suppose that what the OP on that StackOverflow post should have asked was this:



    "How should I do a case-insensitive string comparison in C++ assuming the strings use only ASCII characters 0-127?"



    That's a legitimate question, and it's a shame that the social failure that happens whenever more than one computer dork gathers in His name makes it so damned difficult to get a straight answer.

    There is NEVER any reason to use a fixed-width encoding for strings, period. Variable-length encoding is ALWAYS the way to go, without exception



  • @bridget99 said:

    To talk about the "invention of UTF-8" is crap. UTF-8 isn't so much an encoding as it is a behaviour pattern. It consists of the Unicode dorks of the world deciding that they needed to surrender to reality and find a way to do their dorky $%^! without screwing over the rest of us (e.g. with a bunch of wasteful, rage-inducing zero bytes).

    Oh great, a history lesson from The Expert, Bridget99. I've been looking forward to-- oh wait you're a dumbshit and you don't know anything at all.



  • @Watson said:

    @morbiuswilters said:

    I don't see why they couldn't have specified that everything is just always precomposed, instead of allowing for decomposed characters that then have to be normalized.

    Combinatorial explosion? The Unicode Consortium would have preferred everything to be decomposed (and thus avoid needing to allocate codepoints for, say, the entire Vietnamese alphabet), but given the number of mappings already out there with separate codepoints for things like å and έ they didn't really have much of a hope of that if they wanted any adoption.

    But then they went for precomposed as the canonical representation, so the problem is even worse. I agree that decomposed makes more sense in the grand scheme of things, but that boat sailed long ago.

    Also, I disagree re: adoption. I don't think it would have retarded adoption to require charsets with precomposed characters to transform into decomposed forms. They just would have had to map from one codepoint to one-or-more codepoints; it's not rocket science.



  • @pkmnfrk said:

    There is NEVER any reason to use a fixed-width encoding for strings, period. Variable-length encoding is ALWAYS the way to go, without exception

    Lots of internal representations use UTF-32; it makes memory allocation and rendering easier. It also comes up in on-disk database formats, where you want each row to be a fixed width. So, there are cases.

    And to play devil's advocate: UTF-8 may be more efficient for Westerners (1 byte per char vs. 2) but it's less efficient for most Asians (3 bytes per char vs. 2).
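
    The fixed-width convenience in concrete terms: with UTF-32 the n-th codepoint is just index n, while with UTF-8 you have to walk the bytes. A rough sketch of counting codepoints in a UTF-8 buffer (illustration only):

        #include <cstddef>
        #include <iostream>
        #include <string>

        // Count codepoints in (valid) UTF-8 by skipping continuation bytes,
        // which all have the bit pattern 10xxxxxx.
        std::size_t utf8_length(const std::string& s) {
            std::size_t n = 0;
            for (unsigned char c : s)
                if ((c & 0xC0) != 0x80) ++n;
            return n;
        }

        int main() {
            const std::string s = "caf\xC3\xA9";             // "café" in UTF-8
            std::cout << s.size() << " bytes, "              // 5
                      << utf8_length(s) << " codepoints\n";  // 4
            // With char32_t storage, .size() would already be the codepoint count --
            // that's the convenience a fixed-width internal representation buys.
            return 0;
        }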



  • @blakeyrat said:

    @bridget99 said:
    To talk about the "invention of UTF-8" is crap. UTF-8 isn't so much an encoding as it is a behaviour pattern. It consists of the Unicode dorks of the world deciding that they needed to surrender to reality and find a way to do their dorky $%^! without screwing over the rest of us (e.g. with a bunch of wasteful, rage-inducing zero bytes).

    Oh great, a history lesson from The Expert, Bridget99. I've been looking forward to-- oh wait you're a dumbshit and you don't know anything at all.

    No shit. The time between publishing of Unicode and the release of UTF-8 was 13 months. And they knew all along they would need an ASCII-compatible, variable-length encoding. UTF-8 was modeled on ISO 10646's UTF-1, which predates Unicode. So, no, the Unicode folks did not have to be dragooned by reality into accepting a non-ideological solution.



  • @morbiuswilters said:

    But then they went for precomposed as the canonical representation, so the problem is even worse. I agree that decomposed makes more sense in the grand scheme of things, but that boat sailed long ago.
    Well, they gave four normal forms, of which "canonical decomposition followed by canonical composition" is one - the one W3C adopted for everything XML, natch - along with a slew of special cases for things like two distinct characters having the same canonical decomposition. Keeping everything decomposed is one of the others.

    @morbiuswilters said:
    Also, I disagree re: adoption. I don't think it would have retarded adoption to require charsets with precomposed characters to transform into decomposed forms. They just would have had to map from one codepoint to one-or-more codepoints; it's not rocket science.
    Oh, I agree with you: anyone already having to deal with multiple scripts would have been doing such mappings already, only in a case-by-case, ad hoc way. It's not like it would have broken ASCII; I suspect doing that would have been unpopular.

     



  • @morbiuswilters said:

    @pkmnfrk said:
    There is NEVER any reason to use a fixed-width encoding for strings, period. Variable-length encoding is ALWAYS the way to go, without exception

    Lots of internal representations use UTF-32; it makes memory allocation and rendering easier. It also comes up in on-disk database formats, where you want each row to be a fixed width. So, there are cases.

    And to play devil's advocate: UTF-8 may be more efficient for Westerners (1 byte per char vs. 2) but it's less efficient for most Asians (3 bytes per char vs. 2).

    Read the tags.



  • @morbiuswilters said:

    @blakeyrat said:
    Hey it's Bridget99! Worst Programmer in the World!

    For those keeping score:

    • Unicode: BAD
    • Message pumping: BAD
    • Maintainability: BAD
    • OOP: BAD
    • FOSS: GREAT

    IE: GREAT



  • @blakeyrat said:

    @bridget99 said:
    To talk about the "invention of UTF-8" is crap. UTF-8 isn't so much an encoding as it is a behaviour pattern. It consists of the Unicode dorks of the world deciding that they needed to surrender to reality and find a way to do their dorky $%^! without screwing over the rest of us (e.g. with a bunch of wasteful, rage-inducing zero bytes).

    Oh great, a history lesson from The Expert, Bridget99. I've been looking forward to-- oh wait you're a dumbshit and you don't know anything at all.

    I'm not sure what part of my "history lesson" you take issue with. As usual, I got only ad hominem attacks in response to what I posted here, which really says something about the quality of this site and the people who use it regularly. Maybe you'd prefer a history lesson from the good folks at utf8everywhere.org:



    "In 1988, Joseph D. Becker published the first Unicode draft proposal. At the basis of his design was the naïve assumption that 16 bits per character would suffice. In 1991, the first version of the Unicode standard was published, with code points limited to 16 bits. In the following years many systems added support for Unicode and switched to the UCS-2 encoding. It was especially attractive for new technologies, like Qt framework (1992), Windows NT 3.1 (1993) and Java (1995)."



    "However, it was soon discovered that 16 bits per character will not do for Unicode. In 1996, the UTF-16 encoding was created so existing systems would be able to work with non-16-bit characters. This effectively nullified the rationale behind choosing 16-bit encoding in the first place, namely being a fixed-width encoding..."



    I had never been to this site before today... but it's eerie how closely their history matches mine.



    Further down, we get this:



    "Q: Isn’t UTF-8 merely an attempt to be compatible with ASCII? Why keep this old fossil?



    A: Maybe it was. Today, it is a better and more popular encoding of Unicode than any other."



    The quoted sections above basically just reiterate what I tried to tell you: Unicode was naive in its construction and heavy-handed in its implementation. UTF-8, regardless of timeline, came to popularity as a compromise on the part of Unicode proponents who were forced to deal with users of ASCII.



    As for my larger point, i.e. that StackOverflow.com is a crappy website for getting real questions answered, I think the link I originally posted speaks for itself. Case-insensitive string comparison should be a straightforward thing. Don't muddy the waters with other alphabets and such. Doing so doesn't change the fact that people need to do case-insensitive string comparisons. They don't have to be raging Anglophiles or Bible-thumpers from Alabama to have this need. Pretty much every programmer in every country will need to do a case-insensitive string comparison of the sort I'm suggesting fairly often. Any encoding scheme that moves us away from this goal is crap. Any website that can't respond intelligently to questions about such a comparison is crap.



  • @bridget99 said:

    @blakeyrat said:
    Oh great, a history lesson from The Expert, Bridget99. I've been looking forward to-- oh wait you're a dumbshit and you don't know anything at all.

    I'm not sure what part of my "history lesson" you take issue with. As usual, I got only ad hominem attacks in response to what I posted here, which really says something about the quality of this site and the people who use it regularly. Maybe you'd prefer a history lesson from the good folks at utf8everywhere.org:



    "In 1988, Joseph D. Becker published the first Unicode draft proposal. At the basis of his design was the naïve assumption that 16 bits per character would suffice. In 1991, the first version of the Unicode standard was published, with code points limited to 16 bits. In the following years many systems added support for Unicode and switched to the UCS-2 encoding. It was especially attractive for new technologies, like Qt framework (1992), Windows NT 3.1 (1993) and Java (1995)."



    "However, it was soon discovered that 16 bits per character will not do for Unicode. In 1996, the UTF-16 encoding was created so existing systems would be able to work with non-16-bit characters. This effectively nullified the rationale behind choosing 16-bit encoding in the first place, namely being a fixed-width encoding..."



    I had never been to this site before today... but it's eerie how closely their history matches mine.



    Further down, we get this:



    "Q: Isn’t UTF-8 merely an attempt to be compatible with ASCII? Why keep this old fossil?



    A: Maybe it was. Today, it is a better and more popular encoding of Unicode than any other."



    The quoted sections above basically just reiterate what I tried to tell you: Unicode was naive in its construction and heavy-handed in its implementation. UTF-8, regardless of timeline, came to popularity as a compromise on the part of Unicode proponents who were forced to deal with users of ASCII.

    That site doesn't agree with you at all, dipshit. The highlighted parts don't even have anything to do with what you said. Yes, limiting Unicode to 16-bits was short-sighted, but your complaint HAD NOTHING TO DO WITH THAT. You complained that UTF-16 had too many zero bytes, not that Unicode didn't support enough characters.

    You have shown no proof that UTF-8 was a compromise that was forced on the "Unicode geeks". It was planned from the beginning. It was long-considered a necessity simply because there were so many legacy applications that could not deal with null bytes.

    @bridget99 said:

    As for my larger point, i.e. that StackOverflow.com is a crappy website for getting real questions answered, I think the link I originally posted speaks for itself. Case-insensitive string comparison should be a straightforward thing. Don't muddy the waters with other alphabets and such. Doing so doesn't change the fact that people need to do case-insensitive string comparisons. They don't have to be raging Anglophiles or Bible-thumpers from Alabama to have this need. Pretty much every programmer in every country will need to do a case-insensitive string comparison of the sort I'm suggesting fairly often. Any encoding scheme that moves us away from this goal is crap. Any website that can't respond intelligently to questions about such a comparison is crap.

    I agree that SO can be quite stupid (it's like a slightly stupider version of TDWTF). However, your point doesn't even stand: if you need to do ASCII-only case insensitive comparisons, you can, even with Unicode. If you need to do multi-language-aware case insensitive comparisons, then Unicode is obviously your best bet and there are libraries to do that in C++ (some even mentioned on the same fucking SO page you linked, you fucking retard.)
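
    For the multi-language-aware case, ICU is the usual library people reach for. A hedged sketch of what that looks like, assuming ICU4C is installed and linked against icuuc; treat it as an illustration rather than a recipe:

        #include <unicode/unistr.h>   // icu::UnicodeString
        #include <unicode/uchar.h>    // U_FOLD_CASE_DEFAULT
        #include <iostream>

        int main() {
            // Case-insensitive, language-independent comparison via case folding.
            icu::UnicodeString a = icu::UnicodeString::fromUTF8("r\xC3\xA9sum\xC3\xA9"); // "résumé"
            icu::UnicodeString b = icu::UnicodeString::fromUTF8("R\xC3\x89SUM\xC3\x89"); // "RÉSUMÉ"

            if (a.caseCompare(b, U_FOLD_CASE_DEFAULT) == 0)
                std::cout << "equal ignoring case\n";
            return 0;
        }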



  • @bridget99 said:

    StackOverflow.com is a crappy website for getting real questions answered

    I love Yahoo Answers, where a lot of people will happily reply with "I don't know".



  • @bridget99 said:

    As for my larger point, i.e. that StackOverflow.com is a crappy website for getting real questions answered, I think the link I originally posted speaks for itself. Case-insensitive string comparison should be a straightforward thing. Don't muddy the waters with other alphabets and such. Doing so doesn't change the fact that people need to do case-insensitive string comparisons.

    The poster of that question asked for a case-insensitive compare, and whether or not it is unicode-friendly. It's not that big of a leap to assume "oh, he's looking for a case-insensitive unicode compare method".
    If he was only looking for a case-insensitive compare for the English alphabet, he should have said so.


  • @bridget99 said:

    As usual, I got only ad hominem attacks in response to what I posted here, which really says something about the quality of this site and the people who use it regularly.
    There's your problem. You expect quality on a site dedicated to WTFs.



  • @morbiuswilters said:

    it's like a slightly stupider version of TDWTF

    And less funny, but only by a tiny bit



  • @Salamander said:

    @bridget99 said:

    As for my larger point, i.e. that StackOverflow.com is a crappy website for getting real questions answered, I think the link I originally posted speaks for itself. Case-insensitive string comparison should be a straightforward thing. Don't muddy the waters with other alphabets and such. Doing so doesn't change the fact that people need to do case-insensitive string comparisons.

    The poster of that question asked for a case-insensitive compare, and whether or not it is unicode-friendly. It's not that big of a leap to assume "oh, he's looking for a case-insensitive unicode compare method".
    If he was only looking for a case-insensitive compare for the English alphabet, he should have said so.



    Or, the smart asses that answered, who clearly know that there is no reasonable "case-insensitive unicode compare method," could have just assumed he wanted "a case-insensitive compare for the English alphabet." I don't think it's unreasonable to assume this, especially since the question was clearly asked by a native English speaker, most of the Web and most UIs are in English, case-insensitive comparisons in languages that use the other parts of Unicode are a complete bastard, and so on.

    StackOverflow.com is just full of pedantic dickweeds.


  • @bridget99 said:

    Or, the smart asses that answered, who clearly know that there is no reasonable "case-insensitive unicode compare method," could have just assumed he wanted "a case-insensitive compare for the English alphabet." I don't think it's unreasonable to assume this, especially since the question was clearly asked by a native English speaker, most of the Web and most UIs are in English, case-insensitive comparisons in languages that use the other parts of Unicode are a complete bastard, and so on.

    That is a very unreasonable assumption. If I go into Barnes and Noble and ask for a Spanish dictionary and the guy hands me an English dictionary because "You speak English, as do most of our customers" somebody is getting cockpunched.

    @bridget99 said:

    StackOverflow.com is just full of pedantic dickweeds.

    I agree; it's just like every place computer nerds gather. That doesn't change the fact that in this particular case, SO worked. And, of course, you're complaining because it worked (and then going off on an idiotic tangent about how Unicode is so broken, despite having nothing to support this claim.)


  • @morbiuswilters said:

    @bridget99 said:
    Or, the smart asses that answered, who clearly know that there is no reasonable "case-insensitive unicode compare method," could have just assumed he wanted "a case-insensitive compare for the English alphabet." I don't think it's unreasonable to assume this, especially since the question was clearly asked by a native English speaker, most of the Web and most UIs are in English, case-insensitive comparisons in languages that use the other parts of Unicode are a complete bastard, and so on.

    That is a very unreasonable assumption. If I go into Barnes and Noble and ask for a Spanish dictionary and the guy hands me an English dictionary because "You speak English, as do most of our customers" somebody is getting cockpunched.

    Yeah, but your analogy doesn't apply to the linked question. The correct analogous question would have been, "I need to look up some English words in a dictionary. Can you show me such a book? Also, do you have any that also have Spanish words?" The guy was asking for a non-unicode friendly version, but was interested to see if one that was portable and unicode friendly existed. So there's room for all of that discussion. TRWTF is this thread.



  • It is naïve to think that all words in the English language can be represented in ASCII.



  • @ender said:

    It is naïve to think that all words in the English language can be represented in ASCII.

    Oh, please enlighten me then. Which words in my native language cannot be rendered in the unadorned Latin alphabet? And don't come back with "naïve." Nobody actually writes it that way, any more than they write "coöperate" or "reënact." We're not forming a hair band here.



  • @boomzilla said:

    @morbiuswilters said:
    @bridget99 said:
    Or, the smart asses that answered, who clearly know that there is no reasonable "case-insensitive unicode compare method," could have just assumed he wanted "a case-insensitive compare for the English alphabet." I don't think it's unreasonable to assume this, especially since the question was clearly asked by a native English speaker, most of the Web and most UIs are in English, case-insensitive comparisons in languages that use the other parts of Unicode are a complete bastard, and so on.

    That is a very unreasonable assumption. If I go into Barnes and Noble and ask for a Spanish dictionary and the guy hands me an English dictionary because "You speak English, as do most of our customers" somebody is getting cockpunched.

    Yeah, but your analogy doesn't apply to the linked question. The correct analogous question would have been, "I need to look up some English words in a dictionary. Can you show me such a book? Also, do you have any that also have Spanish words?" The guy was asking for a non-unicode friendly version, but was interested to see if one that was portable and unicode friendly existed. So there's room for all of that discussion. TRWTF is this thread.

    Actually, it would be analogous if someone walked into Barnes and Noble, asked for a dictionary, and instead of being directed to an English language one, they got a 45-minute lecture about linguistics. That is actually a very good analogy. It's OK to treat English as a default language for many purposes. No one - the Welsh included - wants the fire escapes in Wales to be labeled "Pyfydd n'Chwylydde."



  • @bridget99 said:

    Actually, it would be analogous if someone walked into Barnes and Noble, asked for a dictionary, and instead of being directed to an English language one, they got a 45-minute lecture about linguistics. That is actually a very good analogy.

    No it isn't, you worthless sack of retarded shit. The OP explicitly asked for information on Unicode. Shut the fuck up and go back to living in fear of Microsoft you paranoid lunatic.



  • @morbiuswilters said:

    @bridget99 said:
    Actually, it would be analogous if someone walked into Barnes and Noble, asked for a dictionary, and instead of being directed to an English language one, they got a 45-minute lecture about linguistics. That is actually a very good analogy.

    No it isn't, you worthless sack of retarded shit. The OP explicitly asked for information on Unicode. Shut the fuck up and go back to living in fear of Microsoft you paranoid lunatic.

    He also specifically asked for a case-insensitive string comparison, which tends to hint at what he really wanted. But I understand that dorks can't /won't read between the lines... that's also why you have such poor luck with women.



  • @bridget99 said:

    No one - the Welsh included - wants the fire escapes in Wales to be labeled "Pyfydd n'Chwylydde."

    I do. I think that would be fucking hilarious.



  • @bridget99 said:

    ..that's also why you have such poor luck with women.

    Wrong again, jerk! I have trouble with women because of my looks, odor, personality, empty bank account, odor, morbid obesity, acne, odor and below-average-sized, malformed, uncooperative genitals. So there!



  • @blakeyrat said:

    @bridget99 said:
    No one - the Welsh included - wants the fire escapes in Wales to be labeled "Pyfydd n'Chwylydde."

    I do. I think that would be fucking hilarious.

    I'm shocked to find that Wales has fire escapes. I figured we'd kept their existence a secret.



  • @morbiuswilters said:

    I figured we'd kept their existence a secret.

    Have you seen the shape of Wales? Keeping its existence a secret is rather like trying to conceal a large angry facial boil.

    At present Ireland seems to be distracting attention so perhaps it's working...



  • @morbiuswilters said:

    @blakeyrat said:
    @bridget99 said:
    No one - the Welsh included - wants the fire escapes in Wales to be labeled "Pyfydd n'Chwylydde."

    I do. I think that would be fucking hilarious.

    I'm shocked to find that Wales has fire escapes. I figured we'd kept their existence a secret.

    We were trying to, but in the end someone couldn't restrain their sniggers; hence the "Pyfydd".



  • UTF-8 was invented by Rob Pike and Ken Thompson. Probably since they are the only guys who weren't making lazy impls and eating donuts. Come to think of it, those probably form the majority of the select few programming guys who really know what they are talking about (this group includes DaveK and Faxmachinen of course :) ). In the meantime everyone else was busy trying to get the glory. Same story with many innovations in this field.



  • @Obfuscator said:

    UTF-8 was invented by Rob Pike and Ken Thompson.

    Yes, but they got the idea from UTF-1, which was an optional part of the draft ISO 10646 spec. UTF-1 was by no means ideal, but it did birth the idea of a byte-stream encoding that was backwards-compatible with ASCII. When UCS was essentially merged with Unicode, it became obvious that a better byte-stream encoding was needed. Plan 9 unveiled UTF-8 (which is a very elegant solution, BTW) and the rest is history. Trying to claim that the Unicode folks were out-of-touch with reality because they first focused on a fixed-width, 2-byte encoding (as bridget is trying to do) is pure bullshit.

    @Obfuscator said:

    the select few programming guys who really know what they are talking about (this group includes DaveK and Faxmachinen of course :) )

    I notice that you misspelled my name. Twice.
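
    Since the thread keeps circling the design, here is the trick that makes UTF-8 elegant, in about a dozen lines: a minimal single-codepoint encoder (my own sketch, with no error handling for surrogates or out-of-range values):

        #include <cstdint>
        #include <string>

        // Append one Unicode codepoint to a byte string as UTF-8.
        // ASCII stays a single byte; everything else gets a length-marking lead
        // byte followed by 10xxxxxx continuation bytes.
        void append_utf8(std::string& out, char32_t cp) {
            const std::uint32_t c = cp;
            if (c < 0x80) {
                out += static_cast<char>(c);                    // 0xxxxxxx
            } else if (c < 0x800) {
                out += static_cast<char>(0xC0 | (c >> 6));      // 110xxxxx
                out += static_cast<char>(0x80 | (c & 0x3F));    // 10xxxxxx
            } else if (c < 0x10000) {
                out += static_cast<char>(0xE0 | (c >> 12));     // 1110xxxx
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (c >> 18));     // 11110xxx
                out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }

    Feed it 0x41 and you get the single byte ASCII always used; feed it anything bigger and the lead byte's high bits announce how many continuation bytes follow, which is why the encoding is self-synchronizing and never emits a zero byte for anything other than NUL itself.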



  • Cool, I didn't know that.

    @DaveK Faxmachinen said:

    I notice that you misspelled my name. Twice.

    That's the first thing you said that actually made me laugh out loud. Either you're getting more accomplished, or you have succeeded in twisting my humour far enough; I'll let you be the judge :)

    Oh, and the wchar_t donut joke was the best all year. Two laughs in one thread, incredible start of the weekend!

    In hindsight I think I gave you waay too much bait now, but I'll live with that :)



  • The best part about Unicode character casing is that it's locale-dependent.

    For example, in a Turkish locale, the uppercase of i is İ and the lowercase of I is ı.
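
    A hedged sketch of how that bites in C++, assuming a system where the tr_TR.UTF-8 locale is installed (the exact result depends on the platform's locale data):

        #include <iostream>
        #include <locale>
        #include <stdexcept>

        int main() {
            try {
                // Assumes the tr_TR.UTF-8 locale exists (typical on Linux;
                // Windows spells its locale names differently).
                std::locale turkish("tr_TR.UTF-8");

                wchar_t c = L'i';
                // In the default "C" locale, 'i' uppercases to 'I' (U+0049).
                // In a Turkish locale it should become 'İ' (U+0130) instead.
                std::wcout << std::hex
                           << static_cast<unsigned>(std::toupper(c, std::locale::classic()))
                           << L" vs "
                           << static_cast<unsigned>(std::toupper(c, turkish)) << L"\n";
            } catch (const std::runtime_error&) {
                std::cout << "tr_TR.UTF-8 locale not available here\n";
            }
            return 0;
        }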

     

