HashMap "optimization"



  • Here's some code that made me go "WTF?" out loud when I found it in a project I'm working on (heavily condensed, cleaned up and anonymized, so a lot of minor collateral WTFs have already been eliminated):

    public class OptimizedHashMap {
    private HashMap<String[ ], String> theMap;
    private String[ ­][­ ] theKeys;

    public OptimizedHashMap() {
    theMap = new HashMap<String[ ], String>();
    // the code that fills the map is omitted for brevity
    theKeys = theMap.keySet().toArray(new String[theMap.size()][ ]);
    }

    public String getValueFromMap(String[ ] forKey) {
    String[ ] key;
    for(int i = 0; i < theKeys.length; i++) {
    key = theKeys[i];
    if(Arrays.equals(forKey, key)) {
    return theMap.get(key);
    }
    }
    return null;
    }
    }

    The dubious wisdom of using arrays of strings as keys left aside, this display of the utter lack of understanding how a hashmap works makes me wonder how anyone who apparently possesses enough rudimentary intelligence to operate a complex electronic device (i.e. a computer), start editor application and type that junk, can do so without getting the irresistible urge to hit himself over the head repeatedly with a blunt and heavy object. *sigh*



  • I don't get it either.

    Perhaps "optimized" had a different context/meaning to the original programmer coder.



  • It's not my intent to legitimize this in any way, but I do see a possible explanation.

    One has to remember that retrieving a value from a Java HashMap involves using the key's equals() method. The equals() method of a Java array is the same as Object.equals(). It uses reference equality, i.e., it determines whether the two objects are the same instance in memory. The Arrays.equals() method used in the getValueFromMap() method here performs a comparison of the invidual objects in the arrays, returning true if the two arrays are the same length and all of their contents are equal. There is no way to make a HashMap (the built-in Java class, not the general concept of a hash map) use this comparison instead of the default without wrapping it in something like this.

    So, if you are set on using arrays as keys for a map, you need to do something along these lines in order to make it possible to get a value by supplying a key that is an equivalent copy of one of the keys in the map, but is not the exact same instance stored in the map.

    I can't immediately think of any good reason why you would want to use an array as a key. Nor is this actually "optimizing" in any way.



  •  Reminded me of this conversation from Futurama

    Cubert: Hey Leela, help me apply these flame decals I got in my cereal. They'll make the ship go faster.
    Leela: And what's your scientific basis for thinking that?
    Cubert: I'm 12.

     

    How old was the original coder again?



  • @Someone You Know said:

    One has to remember that retrieving a value from a Java HashMap involves using the key's equals() method. The equals() method of a Java array is the same as Object.equals(). It uses reference equality, i.e., it determines whether the two objects are the same instance in memory. The Arrays.equals() method used in the getValueFromMap() method here performs a comparison of the invidual objects in the arrays, returning true if the two arrays are the same length and all of their contents are equal. There is no way to make a HashMap (the built-in Java class, not the general concept of a hash map) use this comparison instead of the default without wrapping it in something like this.

     

    If someone was really intent on using an array of strings for a key to a hash map, wrapping the entire HashMap into an abomination such as this one is still not necessary at all; it would be a lot more sane to create a class only for the keys, which stores the array-of-strings and overloads the hashCode() method to provide a (hopefully reasonable) combined hash for all the strings making up the key.

    That way, the map would actually serve it's purpose - that of making access to the stored values more efficient; you could just grab the desired value from the map via a single call to get(), and that would only take the computational effort for calculating the combined hash once, plus O(1) Arrays.equals() for the items in the bucket.



  • @Anonymouse said:

    @Someone You Know said:

    One has to remember that retrieving a value from a Java HashMap involves using the key's equals() method. The equals() method of a Java array is the same as Object.equals(). It uses reference equality, i.e., it determines whether the two objects are the same instance in memory. The Arrays.equals() method used in the getValueFromMap() method here performs a comparison of the invidual objects in the arrays, returning true if the two arrays are the same length and all of their contents are equal. There is no way to make a HashMap (the built-in Java class, not the general concept of a hash map) use this comparison instead of the default without wrapping it in something like this.

     

    If someone was really intent on using an array of strings for a key to a hash map, wrapping the entire HashMap into an abomination such as this one is still not necessary at all; it would be a lot more sane to create a class only for the keys, which stores the array-of-strings and overloads the hashCode() method to provide a (hopefully reasonable) combined hash for all the strings making up the key.

    That way, the map would actually serve it's purpose - that of making access to the stored values more efficient; you could just grab the desired value from the map via a single call to get(), and that would only take the computational effort for calculating the combined hash once, plus O(1) Arrays.equals() for the items in the bucket.

     

    Oh, I agree that it's awful. I was just saying that's an awful solution to a legitimate issue, since you seemed to imply that you didn't understand why it had been done.

    I don't think you necessarily need to create a new class for the keys, though. Any subclass of java.util.AbstractList will have equals() and hashCode() methods that do pretty much what you suggest. 

    I'm still curious about why you'd want to use an array or a List as a map key, though.



  • @Someone You Know said:

    I'm still curious about why you'd want to use an array or a List as a map key, though.

     

    Oh, that's simple, the code is part of an automated test, and the original coder was using the output generated by the application (each String in the array was a line of the output, each taken from a limited set of possible "output building blocks")  to try to do some kind of "reverse lookup" to deduce some aspects of the internal state of the application.

    Yes, it is as stupid as it sounds. Unfortunately, since that WTF has grown its weblike little tendrils all the way through major parts of the application, getting rid of it would be nearly impossible without doing a large-scale rewrite, and that would go way beyond the scope of the task I was entrusted with when I inherited the app (just a "minor" change to the UI, which turned out not to be so minor after all, but that's a whole other story...)



  • @Someone You Know said:

    It's not my intent to legitimize this in any way, but I do see a possible explanation.

    One has to remember that retrieving a value from a Java HashMap involves using the key's equals() method. The equals() method of a Java array is the same as Object.equals(). It uses reference equality, i.e., it determines whether the two objects are the same instance in memory. The Arrays.equals() method used in the getValueFromMap() method here performs a comparison of the invidual objects in the arrays, returning true if the two arrays are the same length and all of their contents are equal. There is no way to make a HashMap (the built-in Java class, not the general concept of a hash map) use this comparison instead of the default without wrapping it in something like this.

    So, if you are set on using arrays as keys for a map, you need to do something along these lines in order to make it possible to get a value by supplying a key that is an equivalent copy of one of the keys in the map, but is not the exact same instance stored in the map.

    I can't immediately think of any good reason why you would want to use an array as a key. Nor is this actually "optimizing" in any way.

    But, you don't have to destroy everything that makes a HashMap a HashMap in the process.  A HashMap is good because it does lookups in constant time.  This "optimised" HashMap does lookups in O(n) time.  At least call it OptimizedArrayMap and use an ArrayList under the hood instead of a HashMap.



  • @Jaime said:

    @Someone You Know said:

    It's not my intent to legitimize this in any way, but I do see a possible explanation.

    One has to remember that retrieving a value from a Java HashMap involves using the key's equals() method. The equals() method of a Java array is the same as Object.equals(). It uses reference equality, i.e., it determines whether the two objects are the same instance in memory. The Arrays.equals() method used in the getValueFromMap() method here performs a comparison of the invidual objects in the arrays, returning true if the two arrays are the same length and all of their contents are equal. There is no way to make a HashMap (the built-in Java class, not the general concept of a hash map) use this comparison instead of the default without wrapping it in something like this.

    So, if you are set on using arrays as keys for a map, you need to do something along these lines in order to make it possible to get a value by supplying a key that is an equivalent copy of one of the keys in the map, but is not the exact same instance stored in the map.

    I can't immediately think of any good reason why you would want to use an array as a key. Nor is this actually "optimizing" in any way.


    But, you don't have to destroy everything that makes a HashMap a HashMap in the process. 

    Agreed. (I did say it was awful.)

    @Jaime said:


    A HashMap is good because it does lookups in constant time.  This "optimised" HashMap does lookups in O(n) time.

    Where n is the size of the key, not the size of the map, but yes.

     



  • @Someone You Know said:

    Nor is this actually "optimizing" in any way.

    Actually, it could have returned faster for all of his tests with just a few elements in the map.  It also would return faster for all tests where he inserted a bunch of records into the map, and then only checked against one of them (or maybe even a couple).

    Pity1 any such performance testing was only accomplished on a very small scale - if at all.

    Whenever I've needed to use an array as a hash key, I've found it generally works best to concatenate the array into a string, using a separator character not present in any of the array values.  Of course, that can only be done if the array values are restricted to less than ones complete character map (whether that's 256 characters, 65536 characters, 4 billion some-odd characters, or something in between.  Of course, if it's 128 characters, there's an obvious easy answer.)  It is, however, still better to avoid the situation entirely.

    1Unless, of course, the full "hash" map size was guaranteed to be small.  I've seen one case where the map size was "guaranteed" to never exceed 3.  While its size is now sometimes as large as 6, and will likely be 7 at least once before the quarter's done, the break-even point for the hash map was at something on the order of 50 elements, so my coworker's solution is still faster.



  • @Someone You Know said:

    @Jaime said:


    A HashMap is good because it does lookups in constant time.  This "optimised" HashMap does lookups in O(n) time.

    Where n is the size of the key, not the size of the map, but yes.

    Maybe I'm missing something here, but wouldn't it actually be O(n*m), where m is the number of keys in the map and n is the size of a key? (With the simplyfying assumption that all keys were the same size)

    Also, this looks to me like the original programmer not only didn't understand hashmaps, but equals()/hasCode() not either. The way he used keySet.toArray() there looks to me like he actually wanted to copy the keys into another array - which of course would have broken the whole test for referential equality. Luckily (for him), toArray(array) does something completely different than he thought, so in the end the whole thing was working because he broke it two times.



  • @PSWorx said:

    @Someone You Know said:

    @Jaime said:


    A HashMap is good because it does lookups in constant time.  This "optimised" HashMap does lookups in O(n) time.

    Where n is the size of the key, not the size of the map, but yes.

    Maybe I'm missing something here, but wouldn't it actually be O(n*m), where m is the number of keys in the map and n is the size of a key? (With the simplyfying assumption that all keys were the same size)

    Also, this looks to me like the original programmer not only didn't understand hashmaps, but equals()/hasCode() not either. The way he used keySet.toArray() there looks to me like he actually wanted to copy the keys into another array - which of course would have broken the whole test for referential equality. Luckily (for him), toArray(array) does something completely different than he thought, so in the end the whole thing was working because he broke it two times.

    I went into this thinking it was a big list with small keys, because that's the type of problem where you use a HashMap (HashMap typically makes all keys small by hashing them).  I considered n << m and n to be nearly constant size (like any good hash is), therefore I ignored it.  What the orignal WTFer really needed to do was to create an decent hash algorithm for his arrays.  Even better, to encapsulate the array in a custom object that has overridden equals and hashCode methods.  This looks like one of those situations where someone solved every single problem, except those he actually had.

    Also, the whole purpose of the original hack was to do a value comparison instead of a reference comparison of the arrays.  So, the problem with toArray doesn't exist.  Had he done a reference comparison, my orginal O(n) would have been true.  It's the value comparison that makes it O(n*m).



  • public class OptimizedHashMap {

    Really, this has already said enough.



  • @Anonymouse said:

    Oh, that's simple, the code is part of an automated test, and the original coder was using the output generated by the application (each String in the array was a line of the output, each taken from a limited set of possible "output building blocks")  to try to do some kind of "reverse lookup" to deduce some aspects of the internal state of the application.

    Wait...so the keys will be, for instance,

    ["java.lang.IOException at line 453: Stream was null",
    " at com.foo.bar.TestClass: 132",
    " at someotherplace"]

    ?

    I mean, probably not literally just a stack trace, but that's the general gist of the keys?

    Man, when you think you've seen it all...


Log in to reply