Case (in)?sensitive filesystems are :doing_it_wrong:



  • @ixvedeusi It's not a case of case sensitivity. Jesus.



  • @Rhywden said in WTF Bites:

    Nope. You were saying the exact opposite, namely that you were expecting them to be treated as the same character.

    My point is that it's ambiguous.



  • @ixvedeusi said in WTF Bites:

    @Rhywden said in WTF Bites:

    Nope. You were saying the exact opposite, namely that you were expecting them to be treated as the same character.

    My point is that it's ambiguous.

    It's ambiguous only due to the fact that you're mixing up character sets and cases. You're currently saying that "Character A from ASCII is not equivalent to character B from LATIN-1 thus there must be a problem with case-sensitivity!"


  • @Rhywden said in WTF Bites:

    Well, then show me the editor which does what he says

    It's possible -- you may want to sit down for this -- for all programs to be wrong.



  • @Yamikuronue Riiiiiight. The "I'm a genius!" argument.



  • @ixvedeusi said in WTF Bites:

    I have to admit I haven't been confronted with it for some time, but that may be because I haven't used my written German very often recently

    One of the people I've been in touch with recently by email has an 'ö' in his name, which in his email address is replaced by an 'oe'. In this case, it would be useful if searching for the version with 'ö' would also find all versions where the 'ö' is one of {'Ö', 'OE', 'ö', and 'oe'}.

    (In this case, the name is approximately 2km long, so just searching for the first 25% still yields mostly unique results.)
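    A minimal Python sketch of that kind of search, assuming a hypothetical `german_search_key` helper that casefolds and then expands ä/ö/ü/ß into ae/oe/ue/ss. It is a search convenience, not a case rule:

```python
import unicodedata

# Hypothetical transliteration table: umlauts and ß to their two-letter spellings.
# (casefold() already turns ß into "ss"; the entry is kept for readability.)
GERMAN_TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def german_search_key(text: str) -> str:
    """NFC-normalize and casefold, then expand umlauts and ß."""
    folded = unicodedata.normalize("NFC", text).casefold()
    return "".join(GERMAN_TRANSLIT.get(ch, ch) for ch in folded)

def matches(query: str, candidate: str) -> bool:
    """True if the query occurs in the candidate under the search key."""
    return german_search_key(query) in german_search_key(candidate)

if __name__ == "__main__":
    for name in ["Schröder", "SCHRÖDER", "Schroeder", "SCHROEDER", "Schrader"]:
        print(name, matches("schröder", name))
```

    The first four names match and "Schrader" does not. The trade-off is false positives: any genuine "oe" sequence (as in "Poet") now also matches an "ö".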



  • @Rhywden said in WTF Bites:

    Riiiiiight. The "I'm a genius!I have learned German at some point" argument.

    FTFY



  • @ixvedeusi said in WTF Bites:

    @Rhywden said in WTF Bites:

    Riiiiiight. The "I'm a genius!I have learned German at some point" argument.

    FTFY

    Pretty much any German teacher I know (and pretty much all of the ones I don't know) will laugh at you with your notion that "AE" should be considered equivalent to "Ä".

    It is my mother language, after all. Maybe I know a bit more about it than you do.

    That notion of equivalency only exists in your head due to the limitations of the standard ASCII set.



  • @ixvedeusi said in WTF Bites:

    Anyway, this is an example of why case insensitivity is far from trivial and entirely locale-dependent.

    Also, I'd like to remind all the "case is easy" guys here that no one has addressed the case of the ß as of yet. Please note that the uppercase ß letter is AFAIK a pure fabrication and not actually used in practice.
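    For reference, here is what Python's built-in, locale-independent Unicode case mappings do with ß and the capital ẞ (U+1E9E). This is only a sketch of the Unicode defaults, not a ruling on which spelling is correct German:

```python
# Python's str methods use Unicode's default (locale-independent) mappings.
sharp_s = "\u00df"          # ß  LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # ẞ  LATIN CAPITAL LETTER SHARP S

upper_strasse = "Straße".upper()         # ß has a full uppercase mapping to "SS"
lower_capital = capital_sharp_s.lower()  # ẞ lowercases to ß
folded = (sharp_s.casefold(), capital_sharp_s.casefold())  # both fold to "ss"

# Round-tripping is lossy: after upper() the information that it was ß is gone.
round_trip = upper_strasse.lower()       # "strasse", not "straße"

# Casefolding makes all three spellings equal for caseless comparison.
all_equal = ("Straße".casefold() == "STRASSE".casefold()
             == ("STRA" + capital_sharp_s + "E").casefold())

print(upper_strasse, lower_capital, folded, round_trip, all_equal)
```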



  • @ixvedeusi said in WTF Bites:

    @ixvedeusi said in WTF Bites:

    Anyway, this is an example of why case insensitivity is far from trivial and entirely locale-dependent.

    Also, I'd like to remind all the "case is easy" guys here that no one has addressed the case of the ß as of yet. Please note that the uppercase ß letter is AFAIK a pure fabrication and not actually used in practice.

    You obviously have never seen a Personalausweis (German ID card) for someone named "Margarete Weiß".



  • @Rhywden said in WTF Bites:

    You obviously have never seen a Personalausweis for someone named "Margarete Weiß".

    No I haven't, so what? I'd expect that to be a lowercase ß, and in uppercase that would be MARGARETE WEISS. I have certainly never seen anything spelled STRAßE for example...


  • @cheong said in WTF Bites:

    And there are "token" types that could be counted as permanent.

    Yeah, but the tokens are one of those types as well.

    I knew I shouldn't have said that second sentence, someone would :pendant: me :)


  • @ixvedeusi said in WTF Bites:

    I have learned German at some point

    @ixvedeusi Yes, well, so have I (kind of hard not to, being German and all). And I still say that the notion of treating "ae" equivalent to "ä" is wrong, and shouldn't be used that way. Not anymore, at least. I do agree that ß could be a problem, but I don't know why you keep insisting that ae/ä/AE/Ä is any kind of problem...



  • @ixvedeusi said in WTF Bites:

    And yes, Ä is a perfectly valid uppercase ä, but so is AE. If your algorithm doesn't get that, it's wrong.

    No, it isn't. AE is alternate spelling, not an upper-case variant. Yes, it is absolutely reasonable for search to find either, but it is not a matter of case sensitivity. By including this irrelevant example you've unfortunately sidetracked the discussion away from the actual issues (mainly with Turkic languages and the dotless-ı and İ-with-dot-above they use and that mix with normal I/i).



  • @Bulb said in WTF Bites:

    No, it isn't. AE is alternate spelling, not an upper-case variant.

    You're right about that. Happy to finally hear someone actually making a valid argument to refute me rather than just going "nanana not a problem because my text editor thinks you're wrong".

    Probably mixed it up because for me this is a similar kind of issue as case sensitivity: different sequences of letters may or may not represent the same word, and the question of whether they do is non-trivial and entirely dependent on locale.



  • @ixvedeusi said in WTF Bites:

    @Rhywden said in WTF Bites:

    You obviously have never seen a Personalausweis for someone named "Margarete Weiß".

    No I haven't, so what? I'd expect that to be a lowercase ß, and in uppercase that would be MARGARETE WEISS. I have certainly never seen anything spelled STRAßE for example...

    It's actually "MARGARETE WEIß". And, yes, there are street signs using "STRAßE":



  • @ixvedeusi said in WTF Bites:

    @Bulb said in WTF Bites:

    No, it isn't. AE is alternate spelling, not an upper-case variant.

    You're right about that. Happy to finally hear someone actually making a valid argument to refute me rather than just going "nanana not a problem because my text editor thinks you're wrong".

    Jesus Christ, I told you that repeatedly.


  • So, have we at least established this stuff is complicated?

    Right. Now, given that it is complicated, why are we expecting the operating system kernel to deal with all those gnarly bits for you?
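    For what it's worth, the matching itself does not have to live in the kernel. A minimal userspace sketch in Python of the Unicode Standard's "canonical caseless match" (NFD, then casefold, then NFD again), using only default, locale-independent folding; `canonical_caseless_key` is a hypothetical helper name:

```python
import unicodedata

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

def canonical_caseless_key(s: str) -> str:
    """NFD(casefold(NFD(s))): default caseless key, no locale tailoring."""
    return nfd(nfd(s).casefold())

def canonical_caseless_equal(a: str, b: str) -> bool:
    return canonical_caseless_key(a) == canonical_caseless_key(b)

if __name__ == "__main__":
    precomposed = "M\u00dcHLE"   # Ü as a single code point (U+00DC)
    decomposed = "mu\u0308hle"   # u followed by COMBINING DIAERESIS
    print(precomposed == decomposed.upper())                # False: different code points
    print(canonical_caseless_equal(precomposed, decomposed))  # True
    print(canonical_caseless_equal(precomposed, "MUEHLE"))    # False: ü vs ue is spelling
```

    This only settles case and precomposed vs. decomposed encoding; it deliberately says nothing about ü vs. ue or the Turkish dotless ı, which need locale-specific rules on top.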


  • @dkf I think the fact alone that we could have a long argument about it shows that it is complicated. Then again, this is WTDWTF... Scratch that XD



  • Just tossing this in here:
    From Wikipedia: German orthography: Umlaut diacritic usage (emphasis mine):

    The diacritic letters ä, ö and ü are used to indicate the presence of umlauts (frontalizations of back vowels). Before the introduction of the printing press, frontalization was indicated by placing an e after the back vowel to be modified, but German printers developed the space-saving typographical convention of replacing the full e with a small version placed above the vowel to be modified. In German Kurrent writing, the superscripted e was simplified to two vertical dashes, which have further been reduced to dots in both handwriting and German typesetting. Although the two dots of umlaut look like those in the diaeresis (trema), the two have different functions.


  • @dkf said in WTF Bites:

    There might be cases where there are significant locale changes within a country without changing the language, but I can't think of any off the top of my head.

    Belgium?

    @dkf said in WTF Bites:

    The OS has to figure out exactly what language and writing system a string is in. While being aware that strings can be in several different languages at once. (Yes, that's very very possible.)

    I propose a new Unicode block (-set?): language codes (which are, of course, non-printing) to specify what language the string is in up to either (the end) or (the next language code). Easy! Now just to convert all the legacy stuff. And we should probably do some variations or modifiers for linguistic time period and dialect.

    @Sumireko said in WTF Bites:

    Also, strings as array indexes... for reasons.

    JavaScript support? (Or is it that JavaScript contracted MUMPS?)

    @Rhywden said in WTF Bites:

    @Yamikuronue Riiiiiight. The "I'm a genius!" argument.

    It may also be the "they're all idiots" argument. Or possibly "X is hard, it's not surprising a lot of people made the same mistake." Or even other variations that don't amount to "I'm a genius!" but rather "this mistake is common or not surprising". "I'm a little teapot❄" isn't really the only reasonable interpretation.



  • @Dreikin said in WTF Bites:

    I propose a new Unicode block (-set?): language codes (which are, of course, non-printing) to specify what language the string is in up to either (the end) or (the next language code). Easy! Now just to convert all the legacy stuff. And we should probably do some variations or modifiers for linguistic time period and dialect.


  • @dkf said in WTF Bites:

    So, have we at least established this stuff is complicated?

    Right. Now, given that it is complicated, why are we expecting the operating system kernel to deal with all those gnarly bits for you?

    It's easy, just implement it for English and German and we can i18n the rest


  • @dkf said in WTF Bites:

    So, have we at least established this stuff is complicated?

    Right. Now, given that it is complicated, why are we expecting the operating system kernel to deal with all those gnarly bits for you?

    It should not! If only all these tiny European countries could learn to use proper ASCII. </blackey>



  • @dkf said in WTF Bites:

    There might be cases where there are significant locale changes within a country without changing the language, but I can't think of any off the top of my head.

    As far as I can tell, they are always considered a change of language. IIRC RFC5646 only has region tags for countries and some larger areas (so you can specify Spanish, Latin American), but no smaller areas.

    @Dreikin said in WTF Bites:

    Belgium?

    Belgium simply has two languages.

    In a sense Yugoslavia used to be the case, but only if you consider Serbo-Croatian a single language that is written in two different scripts. But the locale encoding has two different language tags for it, sr for Serbian (written in Cyrillic) and hr for Croatian (written in Latin).

    The only other case I can think of is Chinese. Unless you follow the obviously inconsistent-with-reality "One China Policy" and consider Taiwan a part of PRC, the two scripts are still only used in different countries, but in this case the default RFC5646 encodings are zh-Hant (Chinese, traditional) and zh-Hans (Chinese, simplified) rather than zh-TW (Chinese, Taiwan) and zh-CN (Chinese, China) (possibly exactly to avoid upsetting the "One China Policy" proponents).
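    The difference between the script-based and the region-based Chinese tags is visible in the tag structure itself. A simplified Python sketch (hypothetical `parse_simple_tag`) that splits only the language[-Script][-REGION] subset of RFC 5646 and ignores extlang, variant, extension and private-use subtags:

```python
from typing import NamedTuple, Optional

class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]

def parse_simple_tag(tag: str) -> LangTag:
    """Split language[-Script][-REGION]; anything after that is ignored."""
    parts = tag.split("-")
    language = parts[0].lower()
    script = region = None
    rest = parts[1:]
    # A script subtag is exactly four letters (e.g. Hant, Hans, Cyrl).
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        script = rest[0].title()
        rest = rest[1:]
    # A region subtag is two letters (TW, CN) or three digits (419).
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        region = rest[0].upper()
    return LangTag(language, script, region)

if __name__ == "__main__":
    for t in ["zh-Hant", "zh-Hans", "zh-TW", "zh-CN", "es-419", "sr-Cyrl-RS"]:
        print(t, parse_simple_tag(t))
```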

    It should also be noted that CJK script(s) don't have case distinction, so case is not relevant to the discussion at hand. Chinese does have the distinction between traditional and simplified characters, though (which were not unified in the big Unicode CJK unification), so perhaps a case-insensitive system should also be Chinese-simplification-insensitive. If that is even possible...


  • @Bulb said in WTF Bites:

    Belgium simply has two languages.

    Wrong.

    Belgium has officially three recognized language communities: Dutch, French & German.
    Yes, the third one is small and not completely on a par, but the German-speaking community is defined in the constitution.

    Also 'Simply' does in no way describe the Belgian language situation.


  • @Dreikin said in WTF Bites:

    @dkf said in WTF Bites:

    There might be cases where there are significant locale changes within a country without changing the language, but I can't think of any off the top of my head.

    Belgium?

    That's a language change.



  • Replacing umlauts by E's in German is like dropping diacritics in uppercase in French (a workaround to compensate for a technical limitation). The difference is that we French are still stuck with crappy keyboards that don't cover all the characters needed to write our language (insert rant about how crappy the French layout is).
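    The French workaround is easy to mechanize, which is part of why it is tempting. A minimal Python sketch (hypothetical `strip_diacritics`) that decomposes to NFD and drops combining marks; applied to German it gives the wrong answer (ü becomes u, not ue):

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose, drop combining marks, recompose (e.g. É -> E, ü -> u)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)

if __name__ == "__main__":
    print(strip_diacritics("Élève").upper())  # ELEVE
    print(strip_diacritics("Mühle"))          # Muhle, not Muehle
```

    Letters without a canonical decomposition, such as ß or ø, pass through unchanged.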


  • Maybe instead remove the uppercase characters from English? They don't convey enough information to justify the effort.
    The graphical effect can be kept with formatting, like bold/italics etc.



  • @Rhywden said in Case (in)?sensitive filesystems are :doing_it_wrong::

    @ixvedeusi said in WTF Bites:

    @Rhywden said in WTF Bites:

    You obviously have never seen a Personalausweis for someone named "Margarete Weiß".

    No I haven't, so what? I'd expect that to be a lowercase ß, and in uppercase that would be MARGARETE WEISS. I have certainly never seen anything spelled STRAßE for example...

    It's actually "MARGARETE WEIß". And, yes, there are street signs using "STRAßE":

    That sign is just wrong, using a fantasy letter.

    Moreover, it doesn't matter. If it is too confusing if you can have two files, Hans.jpg and HANS.jpg, in the same directory, then it is also just as confusing if you can have MÜHLFELDSTRASSE.jpg and MUEHLFELDSTRAßE.jpg in the same directory, because they are just as much only a varying representation of the same text. In German.

    And Turkish has been mentioned already too, where the uppercase of i is İ and the lower case of I is ı, and where, thus, i and I are two completely different letters. So in Turkish, ISTANBUL and istanbul are not just upper- and lowercase variants of the same.
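    Python's str.lower(), str.upper() and str.casefold() all use the Unicode default mappings, which ignore the Turkish rules (casefold does not apply the Turkic-specific entries in CaseFolding.txt). A minimal sketch of Turkish-aware casing; `turkish_lower` and `turkish_upper` are hypothetical helpers that handle only the four I-letters:

```python
def turkish_lower(text: str) -> str:
    # Turkish: I -> ı (dotless), İ -> i (dotted); everything else uses defaults.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()

def turkish_upper(text: str) -> str:
    # Turkish: i -> İ (dotted), ı -> I (dotless); everything else uses defaults.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()

if __name__ == "__main__":
    print("ISTANBUL".lower(), turkish_lower("ISTANBUL"))  # istanbul ıstanbul
    print(turkish_upper("istanbul"))                      # İSTANBUL
    # The default lowercase of İ is i + COMBINING DOT ABOVE: two code points.
    print(len("\u0130".lower()))                          # 2
```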



  • @Bulb said in Case (in)?sensitive filesystems are :doing_it_wrong::

    @dkf said in WTF Bites:
    It should also be noted that CJK script(s) don't have case distinction, so the case is not relevant to the discussion at hand. It does have the distinction between traditional and simplified though (which were not unified in the big Unicode CJK unification), so perhaps a case-insensitive system should also be Chinese-simplification-insensitive. If it is even possible, that is...

    AFAIK Wikipedia provides on-the-fly translation between traditional and simplified (and HK, Taiwan and Macau variants) so I think it should be possible.

    But I think this whole case-insensitive thing is a Pandora's box.



  • @Luhmann said in Case (in)?sensitive filesystems are :doing_it_wrong::

    Also 'Simply' does in no way describe the Belgian language situation.

    The only "simple" thing about Belgium is its people.

    Filed under: In the pejorative sense :trollface:


  • @HardwareGeek said in Case (in)?sensitive filesystems are :doing_it_wrong::

    > make thing
    Nothing happens ...
    > make clean
    rm -rf some/path/xyz
    > make thing
    Nothing happens ...
    > ls some/path
    XYZ
    :facepalm:
    > rm -rf some/path/XYZ
    > make thing
    Thing gets made.

    Edit: No, the broken rule in the Makefile is not my doing.

    hmm... looks like someone has been too long coddled by the world of "do what i mean, not what i say" philosophy that permeates the microsloth conspiracle operating cistern that is known as "win-duhs". Rejoice therefore that you have now been raised from your slumber to experience the full glory of case sensitive file systems and all their multitudinously awesome splendor! Rise up and take on the mantle of humanity, be no longer a sheeple, but a Man/Woman/Other[delete whichever are inappropriate]!
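    For the make session quoted above, a small Python sketch (hypothetical `case_variants` helper) that lists directory entries matching an expected name only when case is ignored, i.e. exactly the stale `XYZ` that a case-sensitive `rm -rf some/path/xyz` misses:

```python
import os
from typing import List

def case_variants(directory: str, expected: str) -> List[str]:
    """Entries that equal `expected` under casefold() but not exactly,
    i.e. what a case-insensitive lookup would find and a case-sensitive one misses."""
    key = expected.casefold()
    return sorted(
        entry for entry in os.listdir(directory)
        if entry.casefold() == key and entry != expected
    )

if __name__ == "__main__":
    print(case_variants(".", "makefile"))
```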


  • @Grunnen said in Case (in)?sensitive filesystems are :doing_it_wrong::

    If it is too confusing if you can have two files, Hans.jpg and HANS.jpg, in the same directory, then it is also just as confusing if you can have MÜHLFELDSTRASSE.jpg and MUEHLFELDSTRAßE.jpg in the same directory, because they are just as much only a varying representation of the same text. In German.

    It's not just as confusing. It's just as confusing as having COLOR.jpg and COLOUR.jpg in the same directory, and no one cares about that confusion.



  • @accalia said in Case (in)?sensitive filesystems are :doing_it_wrong::

    looks like someone has been too long coddled... Rejoice therefore that you have now been raised from your slumber to experience the full glory of case sensitive file systems

    I would just like to point out here that I am not that "someone," and I have not just been raised from my slumber. That error has been lurking in the bowels of our build system since time immemorial.

    Edit: That broke ... interestingly. No :thing: in <abbr> titles...



  • @heterodox Okay. Depending on your region, either "color" or "colour" is correct and the other one is a misspelling.

    But your nitpicking doesn't solve anything. Using that principle, "Straße" and "STRASSE" should be equivalent to each other in a case-insensitive comparison, but both of them should then not be equal to "Strasse" (which is wrong, except for Swiss German) or "STRAßE" (plain wrong).


  • @Tsaukpaetra said in Case (in)?sensitive filesystems are :doing_it_wrong::

    Yeah, and very difficult to tell when Windows hides the file "extensions".

    Anyone who knows enough about that, and doesn't immediately fix it the first time they open Windows Explorer on a new computer, is :doing_it_wrong:

    @dkf said in Case (in)?sensitive filesystems are :doing_it_wrong::

    So, have we at least established this stuff is complicated?

    Right. Now, given that it is complicated, why are we expecting the operating system kernel to deal with all those gnarly bits for you?

    Maybe because dealing with complicated stuff for us, in an automated, reliable, repeatable manner, is literally the entire point of using a computer at all?


  • @HardwareGeek said in Case (in)?sensitive filesystems are :doing_it_wrong::

    That error has been lurking in the bowels of our build system since time immemorial.

    aaah. yesssssss. build systems.... no one person can fully comprehend the horror that goes into a fully functional build system, at least not while remaining sane.


  • @Grunnen said in Case (in)?sensitive filesystems are :doing_it_wrong::

    @heterodox Okay. Depending on your region, either "color" or "colour" is correct and the other one is a misspelling.

    "Color" is preferred in American English but "colour" is not a misspelling. But if you disagree with me, I'm simply going to agree to disagree as there are a million voices on both sides of that debate and there's no single authority over the English language. I think you understand my point.

    Using that principle, "Straße" and "STRASSE" should be equivalent to each other in a case-insensitive comparison

    Never mind. You clearly don't; how did you get that from what I said?!



  • @Grunnen said in Case (in)?sensitive filesystems are :doing_it_wrong::

    That sign is just wrong, using a fantasy letter.

    You're welcome to tell our government that they're defining the German language and lettering wrong. What do they know about it, after all, they're merely speaking it since childhood.


  • @HardwareGeek said in Case (in)?sensitive filesystems are :doing_it_wrong::

    The only "simple" thing about Belgium is its people.

    Klootzak (Dutch, roughly "bastard")


  • @heterodox said in Case (in)?sensitive filesystems are :doing_it_wrong::

    "Color" is preferred in American English but "colour" is not a misspelling.

    Is it pronounced with a U? In Britain or anywhere else?



  • @Rhywden said in Case (in)?sensitive filesystems are :doing_it_wrong::

    That reminds me: one of the most amusing things I learned when visiting Germany is that the cows there have German accents. German cows really truly do say "Müh", while Australian cows say "Moo". They're quite distinct sounds, not mere transcription artifacts.



  • @flabdablet Wut? Our cows say "Muh".

    Okay, maybe you were visiting Bavaria which is different because it's practically a Foreign Country.



  • @masonwheeler said in Case (in)?sensitive filesystems are :doing_it_wrong::

    Is it pronounced with a U? In Britain or anywhere else?

    What the hell does pronunciation have to do with case sensitivity or variant spellings? Also, English is among the worst languages for its spelling-pronunciation relationship (right up there with French).

    @Rhywden After some trips to Bavaria, I still wonder how anyone can understand Bavarians when they speak (yes, including other Bavarians).


  • @masonwheeler said in Case (in)?sensitive filesystems are :doing_it_wrong::

    Is it pronounced with a U? In Britain or anywhere else?

    Not to my knowledge.

    @Khudzlin said in Case (in)?sensitive filesystems are :doing_it_wrong::

    What the hell does pronunciation have to do with case sensitivity or variant spellings?

    Also this.



  • @Khudzlin said in Case (in)?sensitive filesystems are :doing_it_wrong::

    @masonwheeler said in Case (in)?sensitive filesystems are :doing_it_wrong::

    Is it pronounced with a U? In Britain or anywhere else?

    What the hell does pronunciation have to do with case sensitivity or variant spellings? Also, English is among the worst languages for its spelling-pronunciation relationship (right up there with French).

    I like Finnish for that - easy to pronounce, at least when you've managed not to stumble over your tongue due to the long-ass words.

    Also, they made up for the ease of pronunciation in the grammar department (the horror!).



  • @Khudzlin said in Case (in)?sensitive filesystems are :doing_it_wrong::

    @Rhywden After some trips to Bavaria, I still wonder how anyone can understand Bavarians when they speak (yes, including other Bavarians).

    Told you. Foreign country.



  • @Rhywden said in Case (in)?sensitive filesystems are :doing_it_wrong::

    Our cows say "Muh".

    Please pass on my sincerest apology for misspelling its remarks to your nearest cow.



  • @Rhywden

    § 25 (..)
    E2: If the letter ß is not available, write ss. In Switzerland, ss may always be written. Example: Straße – Strasse
    E3: When writing in capital letters, write SS, for example: Straße – STRASSE

