Comparing Upper/Lower case



  • Just ran into this - pretty much says it all:

    Java:

    int compareTo(Object o) {
      SomeObject that = (SomeObject) o;
      return this.stringField.toLowerCase().toUpperCase().compareTo(that.stringField.toLowerCase().toUpperCase());
    }


  • You have to absolutely force the case to change, for example when people write their posts with the CAPS LOCK key on instead of using Shift, which is only temporary.



  • Yeah, but the WTF is, why would you toLowerCase() and then toUpperCase() just to do a compareTo()? You're going to get the same result if you run just one of them; running both is mind-boggling.

     

    Edit:  But the WTF is, why am I posting before fully absorbing sarcasm? Or tags?



  • I get the purpose for a case-insensitive compare. The point was that the coder forced it to lower and then upper case, when one (or the other) would have sufficed.

    Edit: Gaaa - Friday 13th - my sarcasm detector is on the fritz!

     



  • @snoofle said:

    I get the purpose for a case-insensitive compare. The point was that the coder forced it to lower and then upper case, when one (or the other) would have sufficed.

    Edit: Gaaa - Friday 13th - my sarcasm detector is on the fritz!

     

     

    YESSSSS!  Not being the only one FTW :)



  • I think (not certain) this is needed to handle weird Unicode characters for which there isn't a one-to-one mapping between upper-case and lower-case.  Java's compareToIgnoreCase method (which they should have used to begin with) does basically the same thing.  Of course, it wouldn't hurt to add a comment to that effect....



  • Yes, as odd as it may seem to native English speakers, not all characters will round-trip between lowercasing and uppercasing. Really, the correct approach is to call your system's case-insensitive comparison function that deals with all that for you, but that may have been a reasonable substitute for the languages they were using if the built-in function didn't have the functionality they were looking for.

    Or maybe the programmer just had no clue what they were doing. Really just as likely, but I wouldn't jump to that conclusion just based on this.
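
    For what it's worth, here is a minimal sketch of the "just call the built-in" approach. The SomeObject class below is a hypothetical stand-in for the one in the original snippet (only the stringField name comes from it); String.compareToIgnoreCase is the standard library call. Note that it compares char by char, so it will not treat "ß" and "ss" as equal.

```java
public class CaseInsensitiveCompare {
    // Hypothetical stand-in for the SomeObject class from the original post.
    static final class SomeObject implements Comparable<SomeObject> {
        final String stringField;

        SomeObject(String stringField) {
            this.stringField = stringField;
        }

        @Override
        public int compareTo(SomeObject that) {
            // Locale-independent, char-by-char case-insensitive ordering.
            return this.stringField.compareToIgnoreCase(that.stringField);
        }
    }

    // Small helper so the comparison can be exercised without building objects by hand.
    static int compare(String a, String b) {
        return new SomeObject(a).compareTo(new SomeObject(b));
    }

    public static void main(String[] args) {
        System.out.println(compare("Hello", "HELLO"));          // prints 0
        System.out.println(compare("apple", "Banana") < 0);     // prints true
        System.out.println(compare("\u00DF", "ss") == 0);       // prints false ("ß" vs "ss")
    }
}
```

    Using the generic Comparable<SomeObject> also gets rid of the cast in the original compareTo(Object).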



  • That's wild, but actually makes total sense now that I think about it!

    The things you learn from laughing at other people's mistakes that aren't mistakes.  :$ 



  • @pcooper said:

    Yes, as odd as it may seem to native English speakers, not all characters will round-trip between lowercasing and uppercasing. Really, the correct approach is to call your system's case-insensitive comparison function that deals with all that for you, but that may have been a reasonable substitute for the languages they were using if the built-in function didn't have the functionality they were looking for.

    Or maybe the programmer just had no clue what they were doing. Really just as likely, but I wouldn't jump to that conclusion just based on this.

    Could you give some examples of this?  He is using Java and I've never heard of such a thing.  I'm not trying to say I've heard it all, just that I'd be interested to know how this issue would relate to a managed environment like Java. 



  • @iwpg said:

    I think (not certain) this is needed to handle weird Unicode characters for which there isn't a one-to-one mapping between upper-case and lower-case.  Java's compareToIgnoreCase method (which they should have used to begin with) does basically the same thing.  Of course, it wouldn't hurt to add a comment to that effect....

    A completely reasonable guess, but not the case here. This thing uses just the ASCII (albeit in Java) subset of Unicode, so a straight toUpperCase() (or toLowerCase()) on both ends would have handled it, or the compareToIgnoreCase as was also pointed out!



  • @roto said:

    @pcooper said:

    Yes, as odd as it may seem to native English speakers, not all characters will round-trip between lowercasing and uppercasing. Really, the correct approach is to call your system's case-insensitive comparison function that deals with all that for you, but that may have been a reasonable substitute for the languages they were using if the built-in function didn't have the functionality they were looking for.

    Or maybe the programmer just had no clue what they were doing. Really just as likely, but I wouldn't jump to that conclusion just based on this.

    Could you give some examples of this?  He is using Java and I've never heard of such a thing.  I'm not trying to say I've heard it all, just that I'd be interested to know how this issue would relate to a managed environment like Java. 



    One example that I know of is that the letter "ß" has an uppercase of "SS", which can have a lowercase of "ss". I believe there are others as well, but as I'm only a fluent speaker of English, I don't really know all the cases offhand. I just know that casing, comparison, and collation aren't nearly as simple as one might think at first.
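
    A quick way to see the "ß" round trip in Java, as a sketch: the RoundTrip class and upperThenLower helper are names made up for illustration, and Locale.ROOT is used so the result does not depend on the machine's default locale.

```java
import java.util.Locale;

public class RoundTrip {
    // Uppercase first, then lowercase, the opposite order of the original snippet.
    static String upperThenLower(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String eszett = "\u00DF"; // "ß", written as an escape to avoid source-encoding issues

        System.out.println(eszett.toUpperCase(Locale.ROOT));        // prints SS
        System.out.println(upperThenLower(eszett));                 // prints ss
        System.out.println(upperThenLower(eszett).equals(eszett));  // prints false

        // Plain ASCII survives the same round trip unchanged.
        System.out.println(upperThenLower("strasse").equals("strasse")); // prints true
    }
}
```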



  • None of this excuses him for not using compareToIgnoreCase() (equalsIgnoreCase() would only tell you about equality, not ordering).



  • I thought "SS" had a lowercase of "B" - as in ClaBic.



  • @Random832 said:

    I thought "SS" had a lowercase of "B" - as in ClaBic.

    I was thinking "SS" lowercased to "butt", although I can't account for the A




  • @snoofle said:

    A completely reasonable guess, but not the case here. This thing uses just the ASCII (albeit in Java) subset of Unicode


    Doesn't matter: in Turkish, the uppercase of "i" is "İ" (U+0130), and the lowercase of "I" is "ı" (U+0131).
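
    A small sketch of how that shows up in Java when the locale is Turkish. The TurkishCase class and its helper names are invented for this example; Locale.forLanguageTag("tr") and the locale-taking overloads of toUpperCase/toLowerCase are standard. The same thing happens with the no-argument overloads if the JVM's default locale happens to be Turkish.

```java
import java.util.Locale;

public class TurkishCase {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    static String upperTr(String s) {
        return s.toUpperCase(TURKISH);
    }

    static String lowerTr(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Pure ASCII input, non-ASCII output under Turkish casing rules.
        System.out.println(upperTr("i").equals("\u0130")); // prints true (dotted capital I)
        System.out.println(lowerTr("I").equals("\u0131")); // prints true (dotless small i)

        // An ASCII-only comparison that silently fails under a Turkish locale...
        System.out.println(lowerTr("TITLE").equals("title"));                   // prints false
        // ...and works when the locale is pinned to a neutral one.
        System.out.println("TITLE".toLowerCase(Locale.ROOT).equals("title"));   // prints true
    }
}
```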



  • @aihtdikh said:

    @Random832 said:

    I thought "SS" had a lowercase of "B" - as in ClaBic.

    I was thinking "SS" lowercased to "butt", although I can't account for the A


    The 'a' is elided. I think it's a dialectal thing.



  • @pcooper said:



    One example that I know of is that the letter "ß" has an uppercase of "SS", which can have a lowercase of "ss". I believe there are others as well

    The other ones in Unicode aren't used in any modern languages (most of them are obscure Ancient Greek junk that nobody uses). Eszett is the bastard that makes case folding difficult. There are a number of words in German that become different words if you convert them to uppercase and then to lowercase.

    A number of solutions have been proposed, but they're all very involved. The best one so far is to cut off all internet access to Germany and pretend that they don't exist.



  • @asuffield said:

    The other ones in Unicode aren't used in any modern languages (most of them are obscure Ancient Greek junk that nobody uses). Eszett is the bastard that makes case folding difficult. There are a number of words in German that become different words if you convert them to uppercase and then to lowercase.

    A number of solutions have been proposed, but they're all very involved. The best one so far is to cut off all internet access to Germany and pretend that they don't exist.

    You forget that we Austrians use the ß, too. ;-) The most obvious solution is to switch to Swiss German - they simply got rid of the ß and write ss in all occurrences instead. I know of only one word where the meaning changes when the ß is replaced with the ss: Masse (mass) vs. Maße (dimensions)
     



  • @ammoQ said:

    I know of only one word where the meaning changes when the ß is replaced with the ss: Masse (mass) vs. Maße (dimensions)

    What's the difference in pronunciation? 



  • @Thief^ said:

    @ammoQ said:

    I know of only one word where the meaning changes when the ß is replaced with the ss: Masse (mass) vs. Maße (dimensions)

    What's the difference in pronunciation? 

    Maße is spoken with a long a, Masse with a short a.

     

