I hate my ex-colleagues



  •  The disadvantage of working in the same place for over 12 years now is that some of your colleagues depart for greener pastures and leave behind memories – and old code.

    In one particular application (which is a home-built orchestration layer), certain objects are temporarily stored, because they would be used soon after, and this saves a round-trip to the database. So far, so good.

    However, the WTF is as follows:

    • There are two classes, Customer and Sim (those little cards you put in your phone). Both are stored in the same object cache.
    • Both classes have a unique identifier (a customer ID and the IMSI, respectively), but instead hashCode() is used to index them in a Hashtable (obviously not a Map).

    The method hashCode() that every Java object has returns a 32-bit integer. It's usually unique, but definitely not necessarily so.
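
    To see how easy that is to hit, here's a two-line illustration (my example, not from the code in question) that you can drop into any main():

    System.out.println("Aa".hashCode()); // 2112
    System.out.println("BB".hashCode()); // 2112 – two different strings, one hash code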

    And then we're wondering why we get discrepancies...



  • Well, obviously this is correct; he wasn't storing it in a UniqueIdentifierTable, now, was he?



  • Looks like an implementation error in the hashmap; a correct one works even if the hash function returns the same value for every single element.



  • You've got it the wrong way round.



    You are THEIR ex-colleague, and they left for greener pastures to get away from you.



  • @henke37 said:

    Looks like an implementation error in the hashmap; a correct one works even if the hash function returns the same value for every single element.
     

    The OP actually didn't say that, but he could have been clearer:

    @Severity One said:

    Both classes have a unique identifier (a customer ID and the IMSI, respectively), but instead hashCode() is used to index them in a Hashtable (obviously not a Map).

    The way I read it (after reading that line thrice), they are using hashCode() as a key in the map / hashtable / whatever. So no duplicates, because every hash collision gets an element thrown out...
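
    If that reading is right, the failure mode looks like this sketch (the class and names are mine, not the OP's):

    import java.util.Hashtable;

    // Pathological but perfectly legal: every instance reports the same hash code.
    class Item {
        @Override public int hashCode() { return 42; }
    }

    public class LostEntries {
        public static void main(String[] args) {
            Hashtable<Integer, Item> cache = new Hashtable<>();
            Item a = new Item();
            Item b = new Item();
            cache.put(a.hashCode(), a); // key 42 -> a
            cache.put(b.hashCode(), b); // key 42 -> b, silently replacing a
            System.out.println(cache.size()); // 1 – one "cached" object has vanished
        }
    }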


  • Discourse touched me in a no-no place

    @henke37 said:

    Looks like an implementation error in the hashmap; a correct one works even if the hash function returns the same value for every single element.
    Hashes are never guaranteed to be unique (and that's a key property, as it lets them distill the information) so mappings that use them also need to use an equality test. Or you have to write a perfect hash function, which is very hard without deep knowledge of the data to be stored so people don't bother.
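
    Put differently, a Java map survives even a worthless hash function as long as equals() holds up – a quick sketch (names are mine):

    import java.util.HashMap;
    import java.util.Map;

    class Key {
        final String id;
        Key(String id) { this.id = id; }
        @Override public int hashCode() { return 7; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id.equals(id); // equality does the real work
        }
    }

    public class CollisionDemo {
        public static void main(String[] args) {
            Map<Key, String> map = new HashMap<>();
            map.put(new Key("customer-1"), "a customer");
            map.put(new Key("imsi-1"), "a SIM");
            System.out.println(map.size()); // 2 – both entries survive, sharing one bucket
        }
    }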



  • That's cute. I'm currently fixing an issue that was caused by a current colleague, who's no longer responsible for the poor decisions he made because he's on another project. Poor decisions being, "let's make table A with columns nvarchar(80) and table B with columns varchar(50) and then write a business-critical process that copies data from A to B". The only reason it worked for 3 years without breaking is because the data in A never happened to be longer than 50 characters.

    The best part is that I pointed out this issue at the time of development as I was the QA'er, but I was overridden by the client who "needed it to go live now" and management who doesn't care about preventing bugs, only about keeping the client happy. So guess who's fixing the bug now because it's causing things to break? Not the original developer. And I can't cuss out the original dev because he's a golden boy who manages to stay employed despite being physically present at the office for 4 hours a day, and mentally present for zero hours.



  • @dkf said:

    @henke37 said:
    Looks like an implementation error in the hashmap; a correct one works even if the hash function returns the same value for every single element.
    Hashes are never guaranteed to be unique (and that's a key property, as it lets them distill the information) so mappings that use them also need to use an equality test. Or you have to write a perfect hash function, which is very hard without deep knowledge of the data to be stored so people don't bother.

    function hash(x)
      return x
    end

    ...and if what you're hashing is customer and SIM IDs, this may well be the best method.
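
    Which is essentially what Java already does for boxed integers – Integer.hashCode() is the identity function:

    Integer id = 12345;
    System.out.println(id.hashCode()); // 12345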



  • @henke37 said:

    Looks like an implementation error in the hashmap; a correct one works even if the hash function returns the same value for every single element.
    Although it's not quite clear, I understood the OP to mean hashCode() was used as the key for the hashtable entries. You aren't thinking twisted enough.



  • @JBert said:

    So no duplicates, because every hash collision gets an element thrown out...

     

     ??? Really? That's new to me.

    Hash tables are usually implemented so that every key node has a linked list, to support duplicate hashes. Hashtable is a synonym for map in most of the languages I've looked them up in (Java, C++, C#).

    I think what the OP is complaining about is that the objects already have a unique identifier, so if hash returned that ID instead, they'd have no collisions. But that's assuming a lot. If the number of objects were greater than the hashtable's size, then the object's position would be (id % hashtable.maxsize), and you'd be guaranteed collisions. To be guaranteed to have no collisions, you have to have the following.

    • The IDs are unique.
    • The IDs are sequential.
    • You set your hash to the same maxsize as count of objects.
    • The hashobject returns the unique ID.

    So the real WTF is using a hashtable at all. Instead, if you know the count of objects you have and it will not change, then just use an indexable array. This is assuming the IDs are sequential. If the IDs are not sequential, then there's absolutely nothing from the OP I can defend.

     



  • @xaade said:

    @JBert said:

    So no duplicates, because every hash collision gets an element thrown out...

     

     ??? Really? That's new to me.

    Hash tables are usually implemented so that every key node has a linked list, to support duplicate hashes. Hashtable is a synonym for map in most of the languages I've looked them up in (Java, C++, C#).

    I think what the OP is complaining about is that the objects already have a unique identifier, so if hash returned that ID instead, they'd have no collisions. But that's assuming a lot. If the number of objects were greater than the hashtable's size, then the object's position would be (id % hashtable.maxsize), and you'd be guaranteed collisions. To be guaranteed to have no collisions, you have to have the following.

    • The IDs are unique.
    • The IDs are sequential.
    • You set your hash to the same maxsize as count of objects.
    • The hashobject returns the unique ID.

    So the real WTF is using a hashtable at all. Instead, if you know the count of objects you have and it will not change, then just use an indexable array. This is assuming the IDs are sequential. If the IDs are not sequential, then there's absolutely nothing from the OP I can defend.

    Since you don't seem to be getting it, please tell me the result of this C# code:

    var dict = new Dictionary<int, MyClass>();
    dict.Add(myClass1.GetHashCode(), myClass1);
    dict.Add(myClass2.GetHashCode(), myClass2);

    class MyClass
    {
        public override int GetHashCode() { return 4; } // Chosen by fair roll of die.  Guaranteed to be random
    }



  • Again, the WTF is likely not the hashCode() implementation. What I think happened is that they wrote a piece of code like this:

    cache.put(customer.hashCode(), customer);

     

     instead of

    cache.put(customer.getId(), customer);

     

    The same problem would exist if they used an indexable array, then searched that array based on the output of hashCode().

     Now where's the OP when you need him?

     

    EDIT: Did we get trolled?



  • @JBert said:

    Now where's the OP when you need him?
    Working, for a change.

    It's indeed as some of you have already figured out. Instead of using the unique identifier as a key in a Hashtable, the guy used the hashCode() of the object that contains, among other things, this unique identifier.

    As a little bonus, two kinds of classes are stored in this structure, both indexed by hashCode().

    But this whole thing is a world of hurt. These were "developers" (and I'm using the word loosely) who had found the silver bullet for error conditions, return codes and whatever else you might want to convey: java.lang.Exception.

    Object not found? Throw Exception. Database crashed and burned? Throw Exception. Logic error in your code? Throw Exception. Universe is coming to an end? Throw Exception.

     



  • @eViLegion said:

    You've got it the wrong way round.

    You are THEIR ex-colleague, and they left for greener pastures to get away from you.
    Hey, I never denied that. :)


  • Discourse touched me in a no-no place

    @xaade said:

    This is assuming the IDs are sequential. If the IDs are not sequential, then there's absolutely nothing from the OP I can defend.

    I'm sorry - you're expecting IMSIs to be sequential? Do you even know what the first 5 or 6 digits of an IMSI represent? Thought not. (The fact that IMSIs are being used rather than the ICCIDs is another WTF in itself.)



  •  I think that a lot of fun could arise from constructions like the following (not a Java programmer, so just an idea):

     

    SimType ReturnCachedSim(SimID IMSI):
      SimType sim;
      for sim in MyCache.findAll(IMSI):
        if sim.IMSI == IMSI: // OOPS, CustomerType does not have an IMSI field :(
          return sim
      return NULL;
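
    In actual Java the mixed cache might look like this sketch (all names invented) – the lookup hands back whatever object happens to live under that key:

    import java.util.Hashtable;

    class Sim { String imsi; }
    class Customer { String customerId; }

    class MixedCache {
        private final Hashtable<Integer, Object> cache = new Hashtable<>();

        void store(Object o) {
            cache.put(o.hashCode(), o); // the pattern under discussion
        }

        Sim findSim(int hashKey) {
            Object cached = cache.get(hashKey);
            // The entry under this key might just as well be a Customer,
            // in which case the "cached" SIM silently goes missing.
            return (cached instanceof Sim) ? (Sim) cached : null;
        }
    }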

  • Discourse touched me in a no-no place

    @xaade said:

    Hash tables are usually implemented so that every key node has a linked list
    Spend some time reading the literature, grasshopper.

    @xaade said:

    To be guaranteed to have no collisions, you have to have the following.

    • The IDs are unique.
    • The IDs are sequential.
    • You set your hash to the same maxsize as count of objects.
    • The hashobject returns the unique ID.
    You're talking about perfect hashing there, and for that to work you don't need them to be sequential or for the maximum value to be the same as the number of objects. You just need the hashes to be unique with a “small” maximum value (which doesn't actually need to be all that small in these days of 64-bit machines). The real problem is that generating the hashes is hard for real data in real patterns of usage. Very hard. Dealing with collisions (with linked lists, hash table rebuilding, etc.) is much easier in practice than using perfect hashing.

    I wish they'd told me this on my DS&A course years ago, but they were far more enamored of funky tree algorithms…
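
    For what it's worth, when the key set is dense and known up front, the "perfect hash" degenerates into a plain offset – a sketch of my own, not from any real codebase:

    // Perfect hashing for a dense, known ID range [minId, minId + size):
    class DenseIdTable<V> {
        private final Object[] slots;
        private final int minId;

        DenseIdTable(int minId, int size) {
            this.minId = minId;
            this.slots = new Object[size];
        }

        void put(int id, V value) { slots[id - minId] = value; } // hash(id) = id - minId

        @SuppressWarnings("unchecked")
        V get(int id) { return (V) slots[id - minId]; }
    }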


  • Considered Harmful

    @Sutherlands said:

    Since you don't seem to be getting it, please tell me the result of this C# code:

    var dict = new Dictionary<int, MyClass>();
    dict.Add(myClass1.GetHashCode(), myClass1);
    dict.Add(myClass2.GetHashCode(), myClass2);

    class MyClass
    {
        public override int GetHashCode() { return 4; } // Chosen by fair roll of die.  Guaranteed to be random
    }


    Compiler error? You can't mix a class declaration in with your imperative code like that.



  • For anyone who's still having trouble getting it: the standard Java hashtable (which I am guessing is what was referred to by OP) can only store *one* value for each unique key. So in a HashTable<int, Object> where you use the .hashCode() to get the integer key, only one Object can be stored for each key, whether that be a Customer or a Sim.

    In order to store multiple values that map to the same key, you'd have to use something like HashTable<int, List<Object>> and then access the List for a given key to add or retrieve any of the multiple values that map to that particular key.
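
    A sketch of that workaround (names are mine; written in pre-Java-8 style):

    import java.util.ArrayList;
    import java.util.Hashtable;
    import java.util.List;

    class MultiCache {
        private final Hashtable<Integer, List<Object>> table = new Hashtable<>();

        void add(Object o) {
            List<Object> bucket = table.get(o.hashCode());
            if (bucket == null) {
                bucket = new ArrayList<>();
                table.put(o.hashCode(), bucket);
            }
            bucket.add(o); // collisions pile up in the list instead of overwriting
        }

        List<Object> get(int hash) {
            List<Object> bucket = table.get(hash);
            return (bucket != null) ? bucket : new ArrayList<>();
        }
    }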

     


  • ♿ (Parody)

    @DaveK said:

    In order to store multiple values that map to the same key, you'd have to use something like HashTable<int, List<Object>> and then access the List for a given key to add or retrieve any of the multiple values that map to that particular key.

    The inner hash map pattern!



  • @dkf said:

    Hashes are never guaranteed to be unique (and that's a key property, as it lets them distill the information) so mappings that use them also need to use an equality test. Or you have to write a perfect hash function, which is very hard without deep knowledge of the data to be stored so people don't bother.

    As hashes generally have a larger input than output (e.g. infinite input, 32-bit output), uniqueness is not just "not guaranteed"; it's guaranteed that there is an infinite number of inputs that will produce the same output.



  • why must we always rehash this stuff?


  • Considered Harmful

    @dtech said:

    it's guaranteed that there is an infinite number of inputs that will produce the same output.

    Also known as the pigeonhole principle.



  • @Severity One said:

    Object not found? Throw Exception. Database crashed and burned? Throw Exception. Logic error in your code? Throw Exception. Universe is coming to an end? Throw Exception.

     

    It's a good idea to throw an exception in all those cases.  An even better idea to throw a subclass of Exception to give more contextual information, but that's splitting hairs.  In all cases, just show an error message to the user.

    By contrast, I'm working with some legacy code that smothers all exceptions, always.  Cannot get a connection to the db? Log it and return null?  NPE!!!!!!!!!!!
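
    For the record, the smothering pattern I mean looks roughly like this (the URL and class names are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    class LegacyDao {
        private static final String URL = "jdbc:placeholder"; // stand-in

        // Swallow the exception, return null, and let the caller
        // blow up with a NullPointerException much later.
        Connection getConnection() {
            try {
                return DriverManager.getConnection(URL);
            } catch (SQLException e) {
                e.printStackTrace(); // "log it"
                return null;
            }
        }
    }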

     



  • @_leonardo_ said:

    @Severity One said:

    Object not found? Throw Exception. Database crashed and burned? Throw Exception. Logic error in your code? Throw Exception. Universe is coming to an end? Throw Exception.

     

    It's a good idea to throw an exception in all those cases.  An even better idea to throw a subclass of Exception to give more contextual information, but that's splitting hairs.  In all cases, just show an error message to the user.

    By contrast, I'm working with some legacy code that smothers all exceptions, always.  Cannot get a connection to the db? Log it and return null?  NPE!!!!!!!!!!!

     

     It's Java.  Throwing Exception is a bad idea.


  • @DaveK said:

    the standard Java hashtable (which I am guessing is what was referred to by OP) can only store one value for each unique key.

    That's true of all dictionaries. Also, Java generics can only use Objects, so Hashtable<Integer, Object> would be used. As opposed to something sane, like two Hashtable<ActualID, SomeActualType>.



  • @Ben L. said:

    Also, Java generics can only use Objects
     

    Your time machine get stuck?


  • @Sutherlands said:

    @Ben L. said:

    Also, Java generics can only use Objects
     

    Your time machine get stuck?

    Wait, Java generics can use primitives now? The post I was replying to had the text HashTable<int, Object>



  •  I'm pretty sure generics could always use primitives.  The one you're replying to was an example, so he didn't have to give the real name of the class.



  • @Sutherlands said:

     I'm pretty sure generics could always use primitives.  The one you're replying to was an example, so he didn't have to give the real name of the class.


    Give this a read.



  • @Sutherlands said:

    I'm pretty sure generics could always use primitives.
    No. You can use autoboxing, but the generic class has to be declared with <Integer> and not <int>.
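
    That is, this compiles, with the int literals boxed behind the scenes:

    Hashtable<Integer, String> table = new Hashtable<>(); // <int, String> would not compile
    table.put(42, "autoboxed"); // the int 42 is boxed to Integer.valueOf(42)
    String s = table.get(42);   // and boxed again for the lookup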

     



  • @PJH said:

    The fact that IMSIs are being used rather than the ICCIDs is another WTF in itself.
    Well, no. The Sim object represents a mobile phone, rather than a piece of plastic with a chip on it. When assigning services, or provisioning things on the switch, we don't really care about the ICCID. (Which I had to look up; we call it the serial number.) The Sim object also keeps things like PIN and PUK codes, so in that sense it's an image of that piece of plastic.

    You could argue that it's a misnomer, but trust me, that's the least of the problems with that database and its API. Not having any constraints, not even primary keys, comes to mind as a bigger problem.


  • Discourse touched me in a no-no place

    @Severity One said:

    @PJH said:
    The fact that IMSIs are being used rather than the ICCIDs is another WTF in itself.
    Well, no.
    Well, yes. If you ever enter the Japanese market, for example, you're going to come up against a rather sticky problem - it's illegal (as a 3rd party) to read the IMSI - even the modems out there will refuse to give you the IMSI of the SIM they're using.

    The Sim object represents a mobile phone,
    Another WTF then. (1) The SIM can move phones - if you want to represent the phone, you should be using the IMEI of the phone rather than the IMSI of a SIM in it. (2) What do you do with dual/triple-sim phones?
    rather than a piece of plastic with a chip on it. When assigning services, or provisioning things on the switch, we don't really care about the ICCID. (Which I had to look up; we call it the serial number.) The Sim object also keeps things like PIN and PUK codes, so in that sense it's an image of that piece of plastic.
    The IMSI, when available, is actually protected by the PIN - the ICCID isn't.


  • @PJH said:

    @Severity One said:
    @PJH said:
    The fact that IMSIs are being used rather than the ICCIDs is another WTF in itself.
    Well, no.
    Well, yes. If you ever enter the Japanese market, for example, you're going to come up against a rather sticky problem - it's illegal (as a 3rd party) to read the IMSI - even the modems out there will refuse to give you the IMSI of the SIM they're using.
    But we're not a third party. We're a mobile operator. We distribute these SIM cards.

    The Sim object represents a mobile phone,
    Another WTF then. (1) The SIM can move phones - if you want to represent the phone, you should be using the IMEI of the phone rather than the IMSI of a SIM in it. (2) What do you do with dual/triple-sim phones?
    Fine then, it's a mobile subscription. I didn't want to go into too much area-specific jargon.



  • @Ben L. said:

    @Sutherlands said:

     I'm pretty sure generics could always use primitives.  The one you're replying to was an example, so he didn't have to give the real name of the class.


    Give this a read.
    @Severity One said:

    @Sutherlands said:

    I'm pretty sure generics could always use primitives.
    No. You can use autoboxing, but the generic class has to be declared with <Integer> and not <int>.

    Ok, yes, that's true.  What was said was very ambiguous, then, since he said "As opposed to something sane, like two Hashtable<ActualID, SomeActualType>."  How does not being able to use primitives preclude using an actual ID and type?  Each primitive has a boxed representation.


  • ♿ (Parody)

    @Sutherlands said:

    What was said was very ambiguous, then, since he said "As opposed to something sane, like two Hashtable<ActualID, SomeActualType>."  How does not being able to use primitives preclude using an actual ID and type?  Each primitive has a boxed representation.

    The point wasn't that you couldn't use an ID if it were a primitive. But you can't use the hash code of the ID as the key for the table (assuming that the hash is not a perfect hash for all ID values). That's all. The primitive digression was just a bit of pedantic dickweedery that doesn't really bear on the main point here. The return value for Java's hashCode is an int primitive.


  • Discourse touched me in a no-no place

    @dtech said:

    As hashes generally have a larger input than output (e.g. infinite input, 32-bit output), uniqueness is not just "not guaranteed"; it's guaranteed that there is an infinite number of inputs that will produce the same output.
    Unless you're using perfect hashing, which is what I mentioned in the bit you quoted. I've done perfect hashing in the past, and it is potentially super fast. However, it's also awkward as blazes; writing a bijective function from your key object space to the integers is not for the faint-of-heart. In reality, here on planet Earth, nobody bothers with that sort of thing; they use a simpler function and then hang a mechanism for resolving the collisions (often a linked list, though there are others) off the back end.


  • Considered Harmful

    @dkf said:

    @dtech said:
    As hashes generally have a larger input than output (e.g. infinite input, 32-bit output), uniqueness is not just "not guaranteed"; it's guaranteed that there is an infinite number of inputs that will produce the same output.
    Unless you're using perfect hashing, which is what I mentioned in the bit you quoted. I've done perfect hashing in the past, and it is potentially super fast. However, it's also awkward as blazes; writing a bijective function from your key object space to the integers is not for the faint-of-heart. In reality, here on planet Earth, nobody bothers with that sort of thing; they use a simpler function and then hang a mechanism for resolving the collisions (often a linked list, though there are others) off the back end.

    I believe his point was that perfect hashing is only possible if the range of your output set is as large as or larger than the set of all possible inputs. You can't coerce more values into fewer values without collisions.



  • @joe.edwards said:

    You can't coerce more values into fewer values without collisions.

    Yeah you can... just swear at them louder. (This is a universal rule for making things happen.)

