Too abstract?



  • Today I was looking at some code and was kind of confused at what I found.  Let me give you an example first:

    class Base
    {
       public HashMap Map = new HashMap();
    }
    class Person : Base
    {
       public const String ATTR_FIRST_NAME = "FirstName";
       public const String ATTR_LAST_NAME = "LastName";
       public const String ATTR_AGE = "Age";
    }
    int Main(String[] args)
    {
       Person p = new Person();
       p.Map[Person.ATTR_FIRST_NAME] = "John";
       p.Map[Person.ATTR_LAST_NAME] = "Smith";
       p.Map[Person.ATTR_AGE] = 25;
    }

    Basically, there is a base class with a HashMap.  The HashMap is used to store class variables (as opposed to explicitly declaring them).  Every other class in the framework derives from the base class.  To add new members to the classes you can define a constant String containing the "name" of the variable and then use the HashMap to store and retrieve its value (or you can do it the "normal" way; many of the classes do it both ways).  Obviously the real code wasn't as simple as this and some helper methods made it easier to access the class "members" (there were also mechanisms to ensure type-safety).  I don't think it's quite a WTF, but I do think it is taking abstraction too far.  If you are going to declare some constant String with the name of the variable to be used as an index into the HashMap, why not just declare a fucking variable?  I just don't see why you would do it this way.
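    For what it's worth, the "helper methods" might have looked roughly like this hypothetical Java sketch (the original mixes Java and C# syntax; all names below beyond the quoted constants are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction of the framework's base class.
class Base {
    // Untyped property bag shared by every class in the framework.
    protected final Map<String, Object> map = new HashMap<>();

    // Helpers that make the bag slightly less painful to use.
    protected String getString(String key) { return (String) map.get(key); }
    protected int getInt(String key)       { return (Integer) map.get(key); }
    protected void set(String key, Object value) { map.put(key, value); }
}

class Person extends Base {
    public static final String ATTR_FIRST_NAME = "FirstName";
    public static final String ATTR_AGE = "Age";

    // Each "member" is really just a pair of map accesses.
    public String getFirstName()        { return getString(ATTR_FIRST_NAME); }
    public void setFirstName(String v)  { set(ATTR_FIRST_NAME, v); }
    public int getAge()                 { return getInt(ATTR_AGE); }
    public void setAge(int v)           { set(ATTR_AGE, v); }
}
```

    Which makes the complaint concrete: by the time you've written the constant, the getter, and the setter, a plain field would have been shorter.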

    And now I pray as I hit "Post"...



  • I'm afraid I disagree with you on it being a case of taking abstraction too far, though for three properties it would be a bit silly, and accessing the map directly isn't the right way to do it.  Using a statically named string that maps to the property name is also questionable.

    The moment my object hits 20 properties, if I'm working with a scripting language, I start looking at my options.  If there are a large number of dynamic properties as well, it's definitely going into this sort of data structure, because there's a good chance those properties will change.  I've probably been writing perl for too long; it wasn't that long ago that I would have squirmed to even think about writing code this way.

    There are problems with creating this type of class.  You're going to have to go to extra trouble to prevent anyone from simply adding a new property at any time, which, if you serialize it, goes to your data store.  It violates a good bit of OO design philosophy.  It becomes less elegant to implement a property that calls code in the background [though, generally, not impossible].  You have to worry about how to handle a nested structure [ex: this object has an array of phone numbers], data transformations [it's stored in number of seconds since 1900, but everything expects a date], default data values [if we couldn't deserialize this property from the database, it should be 17], non-existent vs actual null data [the field was null, literally].  Nothing that can't be overcome, but like most powerful tools, you don't just throw it at every problem blindly.
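    The missing-vs-null and default-value problems in particular hinge on one detail of map-backed objects: `get()` alone cannot tell the two cases apart.  A minimal Java sketch (names invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: distinguishing "property absent" from "property present but null"
// in a map-backed object, plus a default for a never-deserialized value.
class PropertyBag {
    private final Map<String, Object> map = new HashMap<>();
    private static final int DEFAULT_AGE = 17;

    void set(String key, Object value) { map.put(key, value); }

    // containsKey() is what separates the two cases; get() alone cannot.
    boolean isMissing(String key) { return !map.containsKey(key); }
    boolean isNull(String key)    { return map.containsKey(key) && map.get(key) == null; }

    // Default handling: if deserialization never set the key, fall back.
    int getAgeOrDefault() {
        Object v = map.get("Age");
        return (v == null) ? DEFAULT_AGE : (Integer) v;
    }
}
```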

    If I'm in a statically-typed language, say, C++, it varies by project, but I'll often generate some code that auto-generates the class members, pre-compile, though I have been known to serialize the data into a hashtable of strings [that static data typing gets ugly to deal with here, much like when you're writing raw ODBC code, not that it can't be done].

    Person actually makes an interesting example, because I could easily see the aspects you are looking at changing at runtime, through, say, a configuration option, depending on what the class is actually supposed to do.  I've got something similar with a piece of code at work: we have a Customer object that has between 5 and 270 properties, which exist based on rows in a database that gets added to every few months. [it's a good place to have closures for a lot of things, too...]



  • IMO it's a WTF. It looks like the programmer was trying to build a feature found in dynamic scripting languages into C#, thus throwing away a lot of C#'s integrated features. Just like the attempts to build "generic datamodels" on top of a relational database, it's unnecessary, slow and leads to subtle bugs the compiler cannot find. There is no good reason to do that.



  • @luke727 said:

    Today I was looking at some code and was kind of confused at what I found.  Let me give you an example first:

    class Base
    {
       public HashMap Map = new HashMap();
    }
    class Person : Base
    {
       public const String ATTR_FIRST_NAME = "FirstName";
       public const String ATTR_LAST_NAME = "LastName";
       public const String ATTR_AGE = "Age";
    }
    int Main(String[] args)
    {
       Person p = new Person();
       p.Map[Person.ATTR_FIRST_NAME] = "John";
       p.Map[Person.ATTR_LAST_NAME] = "Smith";
       p.Map[Person.ATTR_AGE] = 25;
    }

    Basically, there is a base class with a HashMap.  The HashMap is used to store class variables (as opposed to explicitly declaring them).  Every other class in the framework derives from the base class.  To add new members to the classes you can define a constant String containing the "name" of the variable and then use the HashMap to store and retrieve its value (or you can do it the "normal" way; many of the classes do it both ways).  Obviously the real code wasn't as simple as this and some helper methods made it easier to access the class "members" (there were also mechanisms to ensure type-safety).  I don't think it's quite a WTF, but I do think it is taking abstraction too far.  If you are going to declare some constant String with the name of the variable to be used as an index into the HashMap, why not just declare a fucking variable?  I just don't see why you would do it this way.

    And now I pray as I hit "Post"...


    So, what exactly does this buy? 

    p.FirstName = "John";

    turns into:

    p.Map[Person.ATTR_FIRST_NAME] = "John";

    Minus one for the new class hierarchy, more cumbersome to use than the traditional way.  How about performance?  The traditional method makes one virtual call to a setter.  The new way gets a virtual reference to the Map, then calls the indexer, passing an instance member (one call to get the value and another to pass it in; at least it won't be virtual since it's a const), then the indexer does a HashMap lookup (let's be generous and assume that the HashMap is implemented so well that it takes zero time).  Score two for the traditional way.

    So, it looks like there are no procedural advantages of this, maybe it's better in other ways.  I've written a lot of code in a less-than-direct manner to get extensibility and maintainability.  Well, type safety is the first thing I'd worry about.  Since the HashMap will take anything, any mechanism they have for type safety has to be quite interesting.  The best they probably did was throw an exception if the wrong type was used.  That moves the error detection from compile time to run time -- very bad.  I also can't see any extensibility coming out of this that you couldn't get easier with interfaces.
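    The "quite interesting" type-safety mechanism would presumably look something like this hypothetical Java sketch (the real framework's mechanism is unknown; all names here are invented).  The point is that the best it can do is throw at run time:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical runtime type-checking wrapper for the property bag.
class TypedMap {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, Class<?>> types = new HashMap<>();

    // Each "member" must be declared with its expected type up front.
    void declare(String key, Class<?> type) { types.put(key, type); }

    void put(String key, Object value) {
        Class<?> expected = types.get(key);
        if (expected == null || !expected.isInstance(value)) {
            // This is the whole "type safety": an exception at run time,
            // where a real field would have been a compile-time error.
            throw new IllegalArgumentException("bad type for " + key);
        }
        values.put(key, value);
    }

    Object get(String key) { return values.get(key); }
}
```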

    The only reason I could see this being invented was to allow cowboy programming back in the COM days.  This would make the public interface of everything stay the same forever, allowing the developers to always use binary compatibility when recompiling.  A little bit of forethought and the liberal use of interfaces would work just as well as this monstrosity even back then.  With Java and C#, it's all irrelevant.

    BTW, why would ATTR_FIRST_NAME be a string?  It doesn't really matter what type it is, so why use the one that has all sorts of weird comparison issues?  If this is Java, then the only reason this code even works is compiler optimization.  Fortunately, the Java compiler will make all of those ATTR_FIRST_NAME fields point to the same instance and allow the HashMap lookups to actually work.  If it's C#, then the only reason it works is because C# treats string comparisons specially and does value comparisons (which are fairly slow).



  • It buys nothing with the usage they're implying, because hardcoding that string inside the class is dumb.  But the fact that they're trying to make a class with dynamic properties isn't what makes it dumb.  The way this would be closer to useful would be something closer to

    p.Map["First_Name"] = "John";
    p.Map["Last_Name"]  = "Smith";
    Which is a little closer to why you would use this.  A good example would be if, at runtime, we need to be able to add new properties like:

    p.Map["Pet_Name"]  = "Fluffy";
    ... and there are hundreds or thousands of those types of properties.  I have two working examples: one is an object that, among other things, must enumerate Active Directory properties.  If you've worked with AD much, you know that each object has two schemas defined: a mandatory and an optional.  Many of the items in the optional schema are not known about until runtime.  My object makes no distinction [to the user, there is none]; it gets every property that exists on the binding to the LDAP address, and allows the user to query those.  As well, there is special processing done, based on a closure that may be passed in on creation.  That closure can access elements that could not have been part of the base class using a standard interface.  Example in perl:
    my $AD_Object = new ADObject(Path => "LDAP://shadowrise.net/cn=TheDauthi",
                                 Call => sub { my $ad = shift; print $ad->{"PagerNumber"}; });

    Which binds to that CN and simply prints out the PagerNumber field... which generally only exists if you have an exchange server.  (This example is worse than useless: perl objects are "blessed" versions of a hashtable anyway)
    The useful version of this involves a large set of functions loaded at runtime that, among other things, are dynamically added to an array to be called in order on an array of several hundred objects (about 90 functions per, on average), which add data to, parse, trim and modify the objects.  This implies that the closures must have a standard function interface... and if the data that is being modified is the only data that defines that object, but the object is required for the methods defined for it, this seems a rather natural way to code it.  I know that C# can add and delete properties with reflection at runtime, but I wasn't sure about the performance of 1000s of adds and deletes, among other, more implementation-specific reasons.
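    The closure-on-creation idea from the perl example translates to Java roughly as follows (a sketch under invented names, using `Consumer` as the "standard function interface"):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of the "closure passed in on creation" pattern in Java.
class DynamicObject {
    final Map<String, Object> props = new HashMap<>();

    DynamicObject(Map<String, Object> loaded, Consumer<DynamicObject> onCreate) {
        // Properties discovered at runtime (e.g. an optional LDAP schema)
        // can be added without the class knowing about them in advance.
        props.putAll(loaded);
        if (onCreate != null) onCreate.accept(this);
    }

    Object get(String key) { return props.get(key); }
}
```

    The closure can then read fields (like a pager number) that only exist on some objects, without the class itself ever declaring them.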
    You do indeed lose your type checking unless you implement some special code in the [] method (which I do).
    Eh, I've been known to be wrong.  Everyone else is probably right, this is probably completely useless and wrong.  It has worked for my cases, though.



  • Ok, first of all, the code is 100% WTF. Why are they using a HashMap (which is Java), and capitalized String classes (also mostly Java), but using C# syntax for declaring inheritance and C# Main()? Make up your mind, bub! Also, I agree with ammoq that hard-coding the key names in your child class is an insanely stupid WTF.

    *However*, I have no problems with using a Hashmap/Hashtable for local variable storage, in that it gives you the run-time ability to add attributes to a class. Essentially adding scripting capabilities to Java or C# (whichever code this is supposed to be). Python, for example, does exactly this for all of its object variable storage. Every object is essentially a hashtable, and syntax such as foo.bar = "baz" translates automatically into foo["bar"] = "baz". Ok, actually it translates into foo.__setattr__("bar", "baz"), which in turn uses foo.__dict__["bar"] = "baz".

    That said, Python is a dynamically typed language, whereas Java and C# are statically typed.  To use a hashtable the way Python does, a lot of runtime casting has to be done explicitly.
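    A minimal illustration of that explicit casting in Java (the method name is invented): the compiler accepts any cast out of the untyped map, and a wrong guess fails only at run time.

```java
import java.util.HashMap;
import java.util.Map;

// Retrieving from an untyped map in Java forces an explicit cast;
// correctness of the cast is checked only at run time.
class CastDemo {
    static int readAge(Map<String, Object> bag) {
        return (Integer) bag.get("Age"); // throws ClassCastException if wrong
    }
}
```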



  • @Whiskey Tango Foxtrot Over said:

    However, I have no problems with using a Hashmap/Hashtable for local variable storage, in that it gives you the run-time ability to add attributes to a class. Essentially adding scripting capabilities to Java or C# (whichever code this is supposed to be).



    Yes, there are perfectly valid reasons to do that. Keeping session context and such. A container for various data to "throw over the wall" (though in many cases, reflection is a better way to do that).



  • Complete and utter trash. Why even bother making a class? Just use a freaking hashtable/map and one huge ass constants file or something. Hell, write the whole thing using hashmaps instead of classes. Why not go all the way?



  • @jsmith said:

    BTW, why would ATTR_FIRST_NAME be a string?  It doesn't really matter what type it is, so why use the one that has all sorts of weird comparison issues?  If this is Java, then the only reason this code even works is compiler optimization.  Fortunately, the Java compiler will make all of those ATTR_FIRST_NAME fields point to the same instance and allow the HashMap lookups to actually work.  If it's C#, then the only reason it works is because C# treats string comparisons specially and does value comparisons (which are fairly slow).

    WTF? Isn't the whole point of HashMaps that they compare two keys as equal if they return the same hashCode() and .equals() returns true? (Both of which should happen if the strings contain identical text, even if they're different objects).
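    That is indeed how HashMap behaves: keys are matched by hashCode() and equals(), not by object identity, so interning of string constants is irrelevant to whether the lookup works.  A quick demonstration:

```java
import java.util.HashMap;
import java.util.Map;

// Two distinct String objects with the same contents retrieve the
// same HashMap entry, because lookup uses hashCode() and equals().
class KeyEqualityDemo {
    static String lookup() {
        Map<String, Object> map = new HashMap<>();
        String key1 = "FirstName";
        // new String(...) guarantees a distinct object from the literal.
        String key2 = new String("FirstName");
        map.put(key1, "John");
        // key1 != key2 as references, yet they are equal as keys.
        return (key1 == key2) ? null : (String) map.get(key2);
    }
}
```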

