Smoke less associative array, please



  • So, today I was going about my business, working on a fairly complex piece of reporting code that is intended to [blah blah blah irrelevant], when my dear counterpart at another site, working on a different part of the same application, has an epiphany.

    First, a little piece of back-story:

    Many moons ago, this other site saw what we were doing with this software we're writing, and wanted in. They wanted some customizations for their site, and of course, implementation of their own, separate business rules.
    So we did what any sane developer would, and adapted the application to accept more than one set of rules so you simply give a parameter at initialization, and the application magically behaves as Site2 in stead of Site1.
    In hindsight we probably should have forked, but pooling our developer resources gets things done faster and possibly even better, right?
    Their resident PHP guy, that also knows Perl since he has a Bachelor's Degree in Software Engineering, got the job.

    Since this "configuration" isn't likely to change any century soon, and it's trivial to recompile if it does (mod_perl2), it's hard-coded in a class/object.
    Anyone familiar with Perl will know that a class, or an object, or whatever you might call it, is essentially a glorified hashref, or "a pointer to an associative array".

    A business rule might look a little like this (simplified since the real stuff is classic Perl line-noise-style suff):

    {
        set => {
            databaseColumn2 => 'some value',
        },
        where => {
            databaseColumn2 => ' IS NOT NULL',
            databaseColumn3 => ' NOT IN (1,5,6)',
        },
    }
    
    As you no doubt figured out, this will be parsed into SQL statements that are executed in the SQL server. So far, so good.

    The rules they wanted for Site2 is a lot more complex than the rules we have here ate Site1, as they deal with a whole lot more data from a whole lot more sources.
    That's why I didn't really react all that much when it took a very long time for my Site2 counterpart to implement their rules, and it kept clogging up the SQL Test Server. These rules are, after all, fairly hard to implement:
    Non-technical people fill in what they think they want in an Excel sheet based on column names from another Excel sheet that very loosely matches the data in the database. We programmer-types are then left to decipher what they really want, and how to make it happen.
    (This is SOP in the software world, from what I gather, so one does what one can.)

    His problem, for the last 8-9 days, has been that rules like this won't work:

    {
        set => {
            databaseColumn127 => 'some value',
        },
        where => {
            databaseColumn2 => ' IS NOT NULL',
            databaseColumn2 => ' != 31',
            databaseColumn2 => ' != 43',
            databaseColumn3 => ' != "some condition" ',
            databaseColumn3 => ' != "some other condition" ',
            databaseColumn4 => ' > CURDATE() - INTERVAL 1 DAY',
            databaseColumn4 => ' < CURDATE()',
        },
    }
    
    It'll change stuff when databaseColumn2 is NULL, and ignore the check for "some condition" in databaseColumn3. It will even ignore the minimum date for databaseColumn4! BUT WHY!?!

    Now for my counterpart's epiphany!
    Hashes have unique keys! Use the same key twice, and it'll redefine.

    Morale of the story: Peer review a whole lot more than I do, and just because it compiles cleanly it doesn't have to run right.
    <troll>In his defense, he is a PHP-guy, and there are no hashes in PHP. </troll>



  • Wait a minute -- if he KNOWS that some of the conditions won't be , no, nevermind that way lies madness.



  • @Chewbacca said:

    <troll>In his defense, he is a PHP-guy, and there are no hashes in PHP.</troll>

    Eh?

    @Chewbacca said:

    <troll> ... </troll>

    I... but... er... damn you.



  • @Medezark said:

    if he KNOWS that some of the conditions won't be

    He sees the code, and the produced result, and can't figure out why that code produces that result.
    Until the epiphany, that is.



  • @Signature Guy said:

    Signature Guy

    Signature

    Joined on Thu, Jan 1 1970

    Forum Signature

    Posts <FONT color=#88aa88>∞</FONT>

    Eh - what?



  • @b-redeker said:

    @Signature Guy said:

    Signature Guy

    Signature

    Joined on Thu, Jan 1 1970

    Forum Signature

    Posts <FONT color=#88aa88>∞</FONT>

    Eh - what?

    Ah, the joys of community server.  With the right signature, you can create what seems to be an extra post.



  • @DescentJS said:

    Ah, the joys of community server.  With the right signature, you can create what seems to be an extra post.

    On the other hand, it's utter lack of anything resembling "input sanitation" lets me create posts like a website, for example, I could float:right that image in my last post. That's kind of cool I guess, if it was done on-purpose.



  • @DescentJS said:

    @b-redeker said:

    @Signature Guy said:

    Signature Guy

    Signature

    Joined on Thu, Jan 1 1970

    Forum Signature

    Posts <FONT color=#88aa88>∞</FONT>

    Eh - what?

    Ah, the joys of community server.  With the right signature, you can create what seems to be an extra post.

    Re: Whatever This Thread Is Named

    The funniest part^^



  • @Chewbacca said:

    His problem, for the last 8-9 days, has been that rules like this won't work:

    {
    set => {
    databaseColumn127 => 'some value',
    },
    where => {
    databaseColumn2 => ' IS NOT NULL',
    databaseColumn2 => ' != 31',
    databaseColumn2 => ' != 43',
    databaseColumn3 => ' != "some condition" ',
    databaseColumn3 => ' != "some other condition" ',
    databaseColumn4 => ' > CURDATE() - INTERVAL 1 DAY',
    databaseColumn4 => ' < CURDATE()',
    },
    }
    It'll change stuff when databaseColumn2 is NULL, and ignore the check for "some condition" in databaseColumn3. It will even ignore the minimum date for databaseColumn4! BUT WHY!?!

    Now for my counterpart's epiphany!
    Hashes have unique keys! Use the same key twice, and it'll redefine.

    Obviously, you should change the format to replace those curly brackets around the "where" part with square brackets.  That'll fix it.

    (That actually is at least a half-serious suggestion.  It will work, provided that you change the backend to accept an arrayref instead of a hashref.)



  •  Well, the API that is provided is pretty limiting.... SQL::Abstract has been around for 5 years now:

    http://search.cpan.org/~ribasushi/SQL-Abstract/lib/SQL/Abstract.pm

    DBIx::Class and SQLA quality is so impressive that I have to raise an eyebrow when something else is being used.

     



  • Honestly, I can see myself making that mistake: just following the pattern and hitting the go button.  I bet he learned a lot during those 8 days though.



  • @DescentJS said:

    @b-redeker said:

    @Signature Guy said:

    Signature Guy

    Signature

    Joined on Thu, Jan 1 1970

    Forum Signature

    Posts <FONT color=#88aa88>∞</FONT>

    Eh - what?

    Ah, the joys of community server.  With the right signature, you can create what seems to be an extra post.

     

     

    Sedn meh te codez . . . .


  • @Xyro said:

    Honestly, I can see myself making that mistake: just following the pattern and hitting the go button.  I bet he learned a lot during those 8 days though.

    Oh, definitely, been there, done that etc.

    But you hope for someone to either figure out "hey... it only takes one of them..." or else get back to you in say 8 hours and say "I can't figure this one out". Not 8 days.



  • @vyznev said:

    Obviously, you should change the format to replace those curly brackets around the "where" part with square brackets.  That'll fix it.

    (That actually is at least a half-serious suggestion.  It will work, provided that you change the backend to accept an arrayref instead of a hashref.)

    First, this is only a very small part of it. There is A LOT more to the business rules handling than the SQL part, such as priorities listing and weighing, etc etc, which is what I referred to as "classic line-noise-style Perl".
    Secondly, he had absolute and full access to change this around as he saw fit, and rearrange anything in the backend/parser to do whatever, as long as the change was either compatible with my stuff or he changed my stuff to match.
    I would have accepted a change like...

    if (ref($rule) eq 'ARRAY'){
        # Do his stuff
    }
    elsif (ref($rule) eq 'HASH'){
        # Do my stuff
    }
    

    @trwww said:
    Well, the API that is provided is pretty limiting.... SQL::Abstract has been around for 5 years now:

    DBIx::Class and SQLA quality is so impressive that I have to raise an eyebrow when something else is being used.

    I'm quite familiar with SQL::Abstract, Ima::DBI, Class::DBI and most of the other even remotely relevant stuff.
    They are all either massive performance killers or don't support the "removed for simplicity" stuff. :-)
    They are also massively beside the point. He should still have spotted why it didn't work, and if not 9 days ago, then 8.



  • @b-redeker said:



    Oh, definitely, been there, done that etc.

    But you hope for someone to either figure out "hey... it only takes one of them..." or else get back to you in say 8 hours and say "I can't figure this one out". Not 8 days.

    While hunting for bugs in a language you're not that familiar with isn't much fun, it would be nice to work with someone with the initiative to look for a debugging tool within the first week or so.



  • heh. I had forgot about Signature Guy.



  •  @Chewbacca said:

    @trwww said:
    Well, the API that is provided is pretty limiting.... SQL::Abstract has been around for 5 years now:

    DBIx::Class and SQLA quality is so impressive that I have to raise an eyebrow when something else is being used.

    I'm quite familiar with SQL::Abstract, Ima::DBI, Class::DBI and most of the other even remotely relevant stuff.
    They are all either massive performance killers or don't support the "removed for simplicity" stuff. :-)
    They are also massively beside the point. He should still have spotted why it didn't work, and if not 9 days ago, then 8.

    Yeah you got me all the way around there... performace is an acceptable reason to roll your own, and it only takes 5 minutes to spot that bug, not 5 days.


  • @trwww said:

    and it only takes 5 minutes to spot that bug, not 5 days.

    Especially considering this little peice I found just before where the finished SQL statement is returned:
       # $this->debug->out("Full SQL Statement:\n $SQL");
    ...which means that spotting the problem is one # away.

    Oh well, in all fairness git blame blames me for that #, so I have to take some of the blame for the whole thing

    1. Moar peer review plx
    2. I can has non-disabled debug output plx?
    3. Docmuthation?  What's that?
    The business-rules-into-SQL-parser IS documented, and in great detail, but it has some serious flaws:
    • It assumes you know what a "hash" is
    • It assumes you know where the documentation is (The documentation's location is in a comment at the top of every .pm)
    • It assumes you can read, and understand what you read
    But no, in all seriousness, this is not typical of him, so his job is safe for now.  He just got promoted to "canary", however...


  • @Chewbacca said:

    Anyone familiar with Perl will know that a class, or an object, or whatever you might call it, is essentially a glorified hashref, or "a pointer to an associative array".

    False.  A perl object *can be* a glorified hash ref.  It can also be a glorified array ref, a glorified scalar ref, a glorified code ref, or a glorified handle ref..  The language just requires that it be a ref of some kind, as only refs can be glorified.  (At least, if by 'glorified', you mean 'blessed'.)

    In the past few years, someone had the bright idea to tie this flexibility together with lexical variables.  You have your constructor generate a scalar ref (as scalars take up less memory than all of the other things we can reference), and then make the real object in one or more lexical hashes, using the address of the scalar as the hash key.  Bless the scalar ref, and return it as your object.

    With this mechanism, the only way to access the guts of the object directly is to write code in the file that defines the object.  People who are merely using your class can't touch it.  The only thing they can illicitly access is the scalar value - and this scheme doesn't even use that.

    You pay a slight performance penalty, because all of your accesses have to be through accessor functions, rather than simply pulling data out of the hash.  However, it allows you to detect scenarios like the above - at least, so long as you don't allow them to pass in the rule structure as a nested structure like above.  Instead, you have your rule list passed in as an array of pairs to the initializer, and if it sees a key that has already been defined, it can warn of the problem.  You may also get a very slight memory benefit - scalars take up less space than hashes, and under this scheme, you get a constant number of hashes, rather than one hash per object.

    Scalar::Util is helpful for this, but not absolutely required.  Same for Class::Std.  (Note: while the technique works all the way back to perl 5.6.1, Class::Std has a dependency that requires perl 5.8.0 or newer.  Class::Std only gives you easier constructors/destructors - while it's nice to write less code, if you're stuck before 5.8.x, it's probably easier to write a little more code than to get everything ported to a newer perl.  I mean, if you were going to upgrade perl at some point, you'd have done it already, right?)

    There is, of course, some legibility danger here - using the 'inside-out' objects described above, it is possible to make a coordinate object that has x,y, and z values, and then subclass it, and have the subclass have a different set of x, y, and z values, while also having the first set.  One could repeat this idiocy any number of times.  And, yes, this is a story I'm not able to go into here just yet.  At this point, I'm just wanting to advise: don't do that.  Overloading the member names of an inside-out object is just asking for problems.



  • @tgape said:

    False.

    $post =~ s/essentia/usua/g; # Fixed!



  • Re: Big surprise for you now, presents for your Christmas, LV, Gucci, Prada, Chanel, fantastic items.

    @angelchen said:

    Hello friends, <spam/>

    I agree, that is TRWTF!



  • TRWTF is that you're making it so hard to change the business logic as to require your customers to bring in Perl experience to code their own.  Honestly, it sounds like a deployment and maintenance nightmare, which will grow exponentially with the number of customers you have. 


  • Discourse touched me in a no-no place

    @Cat said:

    Honestly, it sounds like a deployment and maintenance nightmare, which will grow
    exponentially with the number of customers you have. 
    If the number of customers never goes above 1 it's a non issue. It's like complaining about the inadequacies of using bubble sort for 10 items when you might, but won't, sort 1000's.



  • @Cat said:

    TRWTF is that you're making it so hard to change the business logic as to require your customers to bring in Perl experience to code their own.  Honestly, it sounds like a deployment and maintenance nightmare, which will grow exponentially with the number of customers you have. 

    Well, we don't sell the software. We sell a service, and use this software internally to help provide that service.
    This service currently has a single customer, but this customer is so incredibly huge that it warrants an entire in-house software suite to keep them happy.

    If the needs change, so will the software.
    For example, the "rules parser" could be hooked up to parse the rules into some elaborate database table layout, and we could write a nice front-end to manage that.
    For now, however, it's far more cost-effective to just use vim.

    Also, if the software was for customer eyes, I certainly would have sacrificed the near-instant performance of "pre-compiled configuration" for actual usability by the customer.
    I mean, I wouldn't dream of making it this way in any of the open source stuff, for example.



  • @PJH said:

    @Cat said:
    Honestly, it sounds like a deployment and maintenance nightmare, which will grow exponentially with the number of customers you have. 
    If the number of customers never goes above 1 it's a non issue. It's like complaining about the inadequacies of using bubble sort for 10 items when you might, but won't, sort 1000's.
     

     True, but it already HAS grown above 1.  And just in general, you should always plan ahead to be a little more flexible than you think you need -- I can't count the number of things my company has had to change -- often with great difficulty -- because a scenario that in the past we didn't anticipate would ever happen now HAS happened and we have to now deal with it.

    Now, of course there are limits; if you're anticipating 1-5 customers, planning for 5 million is overkill, but at least laying the groundwork to be able to support 50 is prudent.


Log in to reply