Deleting old log entries the wrong way, done badly



  • Some code I have the misfortune to occasionally have to read in order to work out why some new and spectacularly stupid behaviour is occuring has an... unconventional... way of keeping the size of its log database under control.

    This particular log records events that are generated as a consequence of external influences (X switched on, Y switched off, Z requested a reset etc). They're stored in a MySQL database (sledgehammer/nut), so it should be easy for the user to ask questions such as "tell me that last time Y restarted" or "how many times has Z restarted in the last 48 hours". It isn't, but that's far from the worst aspect.

    Due to a certain lack of developer confidence, the details of an event that is sent to the database is also written to a flat file, in a more verbose form. When it reaches the database it is written to another flat file. When an event is deleted (we'll come to this in a moment) that fact is also written to a flat file.

    Assuming (hoping!) that horror will go away someday, that still leaves the usage of the database itself. To keep the database size under control, somebody has decided that only the most recent 400 entries should be preserved. This, on disk with ~4GB of space which is recording in the order of low hundreds of events of maybe 64 bytes each, per day. Since the deletion is itself logged, saving ~64 bytes of database consumes ~64 bytes of log file. Genius.

    At some reasonable interval (probably daily), I might have chosen to run a selective delete query to remove all entries over 7 days old (by that age, the information has zero value). Unfortunately, that's not the "solution" that's been implemented. Prior to every insert, this call is made:

            // trim the table size to <1000 entries if needed
            deleteOldEvents();
           
    Despite the comment, the magic threshold is 400, but they way it's done is not what you might expect...

        Query query = db->query();

        query << "select * from events order by thedate";
        StoreQueryResult  result = query.store();

        if (result.num_rows() < MAX_ALLOWED_ROWS) {
            // We still have room to spare
            return;
        }

        // too many rows in the DB -- delete some
        try
        {
            StoreQueryResult::iterator i = result.begin();
            Row row = i;

            int dateTime = atoi(row["thedate"]);

            char
    tPtr = ctime((const time_t *)&dateTime);
            char eventTime[32];

            (void) memset(eventTime, 0, sizeof (eventTime));

            (void) strncpy(eventTime, dPtr, strlen(tPtr) - 1);

            LogThis("Deleting all entries logged before %s", dTime);

            query << "Delete from events where thedate="" <<
                atoi(row["thedate"]) <<""";

            query.execute();
        }
        catch(Exception& e)
        {
            LogThis("%s", e.what());
            // Carry on regardless
        }   
     
    I'm an SQL novice and would never even have contemplated that that might be the right approach. It's the wrong solution, implemented badly. The wrong solution implemented less badly would have been less offensive. Even this would have been an improvement:

    "delete * from events order by thedate limit 1"

       
    There are many more horrors contained within this code, but I can only cope with brief exposures to it. One recurring pattern that occurs on consecutive lines is this:

    #define SOME_VALUE 123

    const unsigned int mySomeValue =  SOME_VALUE;

    someFunction( mySomeValue, someOtherParam );

    Groan...



  • I can't think of a snappy name for it, but I've run into the antipattern of "databases are a crappy filesystem that someone forced me to use for this project" before. The antipattern shows up on the sidebar pretty frequently as well.

    It's kind of the opposite of the "if you have a hammer, everything looks like a nail" situation, and more like "I'm using my toolbox full of shiny new tools as a stool to sit upon while I smash things with a rock". Way to wordy to be a cool antipattern name...



  • @RichP said:

    "I'm using my toolbox full of shiny new tools as a stool to sit upon while I shit all over things"
     

    FTFY



  • @RichP said:

    I can't think of a snappy name for it, but I've run into the antipattern of "databases are a crappy filesystem that someone forced me to use for this project" before. The antipattern shows up on the sidebar pretty frequently as well.

    There's a large percentage of developers who are scared shitless of databases. No, I don't know where they come from. Or why they're afraid. But they're out there.



  • @RichP said:

    I can't think of a snappy name for it, but I've run into the antipattern of "databases are a crappy filesystem that someone forced me to use for this project" before. The antipattern shows up on the sidebar pretty frequently as well.

    It's kind of the opposite of the "if you have a hammer, everything looks like a nail" situation, and more like "I'm using my toolbox full of shiny new tools as a stool to sit upon while I smash things with a rock". Way to wordy to be a cool antipattern name...

    "Stoolbox". Fits both your and dhromed's descriptions.

     



  • @blakeyrat said:

    @RichP said:
    I can't think of a snappy name for it, but I've run into the antipattern of "databases are a crappy filesystem that someone forced me to use for this project" before. The antipattern shows up on the sidebar pretty frequently as well.

    There's a large percentage of developers who are scared shitless of databases. No, I don't know where they come from. Or why they're afraid. But they're out there.

     

    I suspect many of these are people with Computer Science degrees who've been taught everything about relational algebra, the theoretical side of database structure, etc., but nothing about what the hell databases are for. A lot of universities' CS programs, at least here in the U.S., don't teach the practical side of databases very well. Or the practical side of anything very well, in some cases.



  • @blakeyrat said:

    There's a large percentage of developers who are scared shitless of databases. No, I don't know where they come from. Or why they're afraid. But they're out there.
    I think it's because a lot of developers are only able to conceptualize solutions in a single programming language, probably the first or second language they learned.  They know The Best Way to solve the problem using their favorite language, so why should [i]they[/i] have to learn relational theory or SQL?

    Even more disturbing are the scared-shitless developers who work with databases for a while and [i]remain[/i] scared shitless.  Don't you think after doing it for a while, they'd get more comfortable with it?  Shouldn't they start to understand the benefits of using databases over their crappy flat file solutions?  Or maybe they just don't have time for that, what with all the complaining that needs to get done.



  • @boog said:

    @blakeyrat said:

    There's a large percentage of developers who are scared shitless of databases. No, I don't know where they come from. Or why they're afraid. But they're out there.
    I think it's because a lot of developers are only able to conceptualize solutions in a single programming language, probably the first or second language they learned.  They know The Best Way to solve the problem using their favorite language, so why should they have to learn relational theory or SQL?

    Even more disturbing are the scared-shitless developers who work with databases for a while and remain scared shitless.  Don't you think after doing it for a while, they'd get more comfortable with it?  Shouldn't they start to understand the benefits of using databases over their crappy flat file solutions?  Or maybe they just don't have time for that, what with all the complaining that needs to get done.

    I'm still scared of horses



  • @Someone You Know said:

    I suspect many of these are people with Computer Science degrees who've been taught everything about relational algebra, the theoretical side of database structure, etc., but nothing about what the hell databases are for. A lot of universities' CS programs, at least here in the U.S., don't teach the practical side of databases very well. Or the practical side of anything very well, in some cases.
    I hated my databases class.  I had trouble following the class, the instructor was lousy, and I generally disliked the subject matter.  I agree it was probably because we didn't have any practical context in which to discuss anything, or to complete our assignments (which all involved a crappy-tab-delimited-file-with-hundreds-yes-hundreds-of-typos-and-run-on-columns-and-other-stupid-shit-that-had-to-be-fixed-manually-in-a-text-editor-otherwise-it-would-make-your-program-to-load-it-all-into-the-database-choke containing most of the Animalia taxonomic kingdom).  That class was the only time I ever used databases while I was in school, and it was an entirely negative experience.

    Then again, my first job out of school: working with databases.  I picked it up instantly, with no problems and no complaints.

    So you'd think, once given a practical context, that the scared-shitless developers would start to get it.  But they don't.



  • @boog said:

    Even more disturbing are the scared-shitless developers who work with databases...


    Not just databases. Elsewhere in this same abomination was some XML parsing (possibly by the same author) which did something like this:


    (in no particular language)

        foreach node in allNodes

        {

            if (node.isInteresting)

            {

                foreach childNode in allNodes

                {

                    if (childNode.parent == node)

                    {

                        // Compute something

                    }

                    // else carry on looking checking all other nodes anyway

                }

            }

        }


    The XML was such that each node of interest only had one child, so once the child was found there was no need to consider any others. XPath queries and DOM navigation methods (other than asking a node for its parent) weren't familiar to the author. Fair enough, if it was their first foray into XML (but I don't think it was), but who in their right mind would think that checking the relationship between every permutation of node-pairs was the right way to go about it?



  • Wait a minute! Just the moment he became familiar with relational databases, you want to forbid him to select whatever from allNodes p, allNodes c where isInteresting(p) and c.parent = p?



  • @Someone You Know said:

    people with Computer Science degrees who've been taught everything about relational algebra

    You're joking, right? Or maybe just not British.



  • @Someone You Know said:

    I suspect many of these are people with Computer Science degrees who've been taught everything about relational algebra, the theoretical side of database structure, etc., but nothing about what the hell databases are for. A lot of universities' CS programs, at least here in the U.S., don't teach the practical side of databases very well. Or the practical side of anything very well, in some cases.

    Funny you say that, because I sometimes feel that I was gypped out of the theoretical knowledge in my studies! Of course, I wasn't technically in CS; I was in a newfangled IT program (apparently one of the first IT programs to be certified by some accreditation agency or other - I think it was ABET or something?). So now I can think in C# and SQL, and to a lesser extent in VB and Java... but anything else (C++? Python? PHP?) just looks like gibberish - and practically everything I learned was obsolete within a year after I learned it; if I wanted to stay current on anything, I'd pretty much have to use it constantly just to not lose ground. So here I am with 2 or 3 years of professional experience, and I feel like I know even less than I did when I graduated...



  • @blakeyrat said:

    @RichP said:
    I can't think of a snappy name for it, but I've run into the antipattern of "databases are a crappy filesystem that someone forced me to use for this project" before. The antipattern shows up on the sidebar pretty frequently as well.

    There's a large percentage of developers who are scared shitless of databases. No, I don't know where they come from. Or why they're afraid. But they're out there.

    I think I prefer that to devs that build entire application platforms in a database. Which are devs I'm surrounded with. What? Java? (IE. a platform actually designed to build application frameworks, replace with .NET if you like). You can't do anything with that,you need to write a big pile of packages and procedures, plus the manual on how to maintain and deploy all that stuff. I'm so tired of that attitude that I take every opportunity to prove them wrong.

    The last occasion was glorious. People were harping about some file which was UTF-16 and for some reason when loaded through oracle sqlloader the database was filled with empty columns. Conclusion: the file needed to be UTF-8. After 15 minutes of discussion about how to use the Oracle DBMS to fix that with no solid results I had enough. I told them "I'll fix that file". It took me about 10 lines of Java code, written in 3 minutes (I'm too neat, I added proper error handling) to convert the file from UTF-16 to UTF-8 (inputstreamreader -> printstream). "Wow! What tool did you use?" they asked. "Java" I simply answered. Coding that stuff was quicker than doing a Google search for some prebuilt tool that could also do it :/

    Really, I wish I could frame those epic blank stares I got on my wall.

    15 minutes later I had created a user friendly Swing GUI around it which allowed you to choose the source and target charsets and then let you create a new file with the click of a button (nicely threaded, no GUI locking up), which I promptly added to the pile of other useful service tools I have written over the last few months to make my life a lot easier and could also be used by the people who are not tech savvy. Even longer blank stares.

     

    Aaaah, programming can be good.



  • @erikal said:

    I had created a user friendly Swing GUI
    Quick, hide before Blakey sees that!



  • @erikal said:

    Java?
    @erikal said:
    Oracle DBMS

    So you work in a place that uses Java and Oracle, and you're surprised they don't know shit? I mean duh. They PAID MONEY FOR ORACLE. Of course they're idiots.

    You're not some kind of superhero, you're just the average person competing in the Special Olympics.

    @ender said:

    Quick, hide before Blakey sees that!

    Why?



  • @blakeyrat said:

    @erikal said:
    Java?
    @erikal said:
    Oracle DBMS

    So you work in a place that uses Java and Oracle, and you're surprised they don't know shit? I mean duh. They PAID MONEY FOR ORACLE. Of course they're idiots.

    You're not some kind of superhero, you're just the average person competing in the Special Olympics.

    @ender said:

    Quick, hide before Blakey sees that!

    Why?

    I think he's afraid the mere mention of Java will prompt a Blakeyrant™. I'm having difficulty understanding why this is a bad thing.

    And paying money for Oracle doesn't hold a candle to paying for Sybase. A company that released a "production quality" 64-bit ADO.NET driver that failed the first "SELECT" unit test we ran against it. Why? Because the driver only returns a single row regardless of how many rows the actual SQL statement returns.



  • @blakeyrat said:

    You're not some kind of superhero, you're just the average person competing in the Special Olympics.

    Yeah, this is exactly why I tend to be nervous about databases.  Because I know I'm not a superhero.  It's not the database itself that has me nervous.  It's all of the misuse of databases I've had to deal with over the years.  Not all of that has been Java and Oracle, mind you.  But it's not the tools, it's the people who misuse and/or mischoose the tools.



  • @tgape said:

    But it's not the tools,

    No, it is the tools. Oracle doesn't follow the SQL spec (empty string == NULL) in a way that makes using it incredible awkward and error-prone. It's like every copy comes with a free minefield of bugs.

    Unless you're excusing the tools by saying only the idiot who bought it is at fault, which I think is unfair-- usually the high pressure sales people at places like IBM or Oracle go directly to the C*O level with a generous budget for bribes, and the poor engineer is stuck with Lotus Notes or Oracle DB through no fault of their own.



  • @blakeyrat said:

    No, it is the tools. Oracle doesn't follow the SQL spec ...

    Just curious — what would be your database engine of choice, and why?



  • @blakeyrat said:

    You're not some kind of superhero, you're just the average person competing in the Special Olympics.
     

    Thank you for describing exactly how I have been feeling for a long time now without being able to put it in such fitting words :)

    Now if only I could explain it to certain people in a similar way that does not include calling my dear and beloved coworkers less abled ;)



  • @Daniel Beardsmore said:

    @blakeyrat said:
    No, it is the tools. Oracle doesn't follow the SQL spec ...

    Just curious — what would be your database engine of choice, and why?

    Well it depends on budget, but if money is no object, Microsoft SQL Server is the only database engine that has an even slightly decent management tool and a mostly sane design. It's also cheaper than the IBM/Oracle competition (in most cases), easier to install, and has a simple accessible "lite" version you can use for development without paying a cent. (And the nice management software works on the "lite" version too.) And it has the same set of features, same resource usage, same reliability as any of the competition. And the data types make sense. (Although the NVARCHAR(MAX) thing is admittedly kind of weird.)

    By my set of priorities, it's head and shoulders above the rest.

    If you need a DB that can be embedded or otherwise travel with the app, SQLite is an amazing product considering how tiny and portable it is.



  • @erikal said:

    Thank you for describing exactly how I have been feeling for a long time now without being able to put it in such fitting words :)

    It's a little more extreme than how I might put it, but I experience similar feelings.

    @blakeyrat said:

    Well it depends on budget, but if money is no object, Microsoft SQL Server is the only database engine that has an even slightly decent management tool and a mostly sane design...

    Are there any open source ones that you think don't suck? I've heard good things said of PostgreSQL, but I've never experienced it. Poor old me, I mostly use MySQL. Thus far, not to any extent that it has ever posed a problem – maybe in the future I will end up learning the hard way, but in the meantime it seems fine. This is a strong indication that I suck.

    @blakeyrat said:

    If you need a DB that can be embedded or otherwise travel with the app, SQLite is an amazing product considering how tiny and portable it is.

    Maybe this is just me, but I like data created by applications on my PC to always be in a format that I can edit in a text editor and check what's going on, fix things that break etc. XML is probably best (for a lack of a saner format — INI files are more readable but don't support structure or namespacing).

    Then again, I am considering getting myself one of these to put on the wall above my desk… (I haven't read Lovecraft, but I couldn't find any alternatives)



  • @Daniel Beardsmore said:

    Are there any open source ones that you think don't suck?

    I'd be lying if I said I had a lot of experience in them. I've used MS SQL Server a lot, Oracle for a few weeks, and DB2 extremely briefly. Oh and some AsterData but that barely counts.

    I rant about MySQL because, 5-7 years ago, it was an extraordinarily shitty product and I could not for the life of me figure out why the hell tons of web hosts installed it as default and tons of web apps expected it to be present. Thing is, my MySQL experience is probably out-of-date now, too-- it's evolved a decent amount, and frankly it's probably a fine tool now. Possibly. Maybe. (That said, I'll never forgive a DBMS that simply crops your oversized strings instead of returning a warning or an error. Or utterly fails to return data in the same character set it was stored in. It really was a gigantic piece of shit.)

    That said, MySQL's best management tool is a pretty bad web app. Which is still tons better than anything Oracle's come up with, but man, it stinks on toast.

    @Daniel Beardsmore said:

    Maybe this is just me, but I like data created by applications on my PC to always be in a format that I can edit in a text editor and check what's going on, fix things that break etc.

    Yeah, well, you're one of the "tech luddites" on this board, so that doesn't surprise me at all. Anyway, that's basically what SQLite is... except replace "text editor" with SQLite query tool.

    @Daniel Beardsmore said:

    XML is probably best

    XML is a data interchange tool. It's also appropriate for storing configuration information (only if it's read-only, though-- see link below). If you don't need any of the advantages a database offers, then... sure go ahead and use XML. But if your application needs to retrieve a large amount of data quickly, and you're using XML for the task, well. You're being an idiot. Because XML sucks at that. (See the Joel on Software link further up on this thread, which actually talks about why XML is a shitty format if you need random access of data.)

    @Daniel Beardsmore said:

    (for a lack of a saner format — INI files are more readable but don't support structure or namespacing).

    INI files are terrible. If you see an app in 2011 using an INI file, run. That's a major WTF warning sign.

    HEY LOOK! Here's Raymond Chen's article on why INI and XML files suck for storing read/write configuration information for apps! I know I link this all the time, but since there's so many tech luddites who are still crowing about how INI files are a pretty good idea, I guess the concept needs hammered in some more.

    INI. Files. Suck. For. Their. Intended. Purpose. XML files suck mostly in the same ways.

    The fact that Windows uses a Registry (that is, a database) to store configuration information makes it superior to operating systems that do not. Because the problems listed in Raymond Chen's articles? They apply just as much to any other multi-user operating system-- OS X and Linux just put their hands over their ears and shout "LA LA LA!!!" and ignore the issues.



  • @blakeyrat said:

    That said, I'll never forgive a DBMS that simply crops your oversized strings instead of returning a warning or an error.

    I think it still does that? My beef with MySQL is that you're only able to get back errors as strings, instead of error numbers, so proper error handling is … difficult. (Error handling though sucks everywhere: you must be able to retrieve at a minimum, the numeric code for conditionals, and a string that names the offending object and/or action, e.g. "File 'C:\foo' not found" (some API calls make it obvious a file was intended) instead of "The system cannot find the path specified" (which path???), and "ACL permissions prohibit reversing the polarity of the neutron flow" instead of "Access denied" (by what, to what????? kill me now!!)

    @blakeyrat said:

    Or utterly fails to return data in the same character set it was stored in.

    The advantage of any new technology should be UTF-8 from the ground up. I work in UTF-8 anywhere that I can, and I've set my PC up to be able to type Unicode characters in directly (e.g. "→"). Anything where UTF-8 is an afterthought (e.g. Perl core and the DBI and CGI) is going to wind up a nightmare.

    Yeah I wouldn't listen to myself any more either.

    @blakeyrat said:

    Yeah, well, you're one of the "tech luddites" on this board, so that doesn't surprise me at all.

    When I got my StarMax back in 2000, the previous owner used "Lud Inc." as the company name for software registrations. That tickled me, and I've stuck with it ever since. (Even when he couldn't spell "Ludd").



  • @Daniel Beardsmore said:

    I think it still does that?

    My understanding is that MySQL "fixed" the "our database engine sucks ass" problem by adding a completely different "data storage engine" that doesn't suck ass-- so you can actually put MySQL in a mode where it's still using the shitty engine, or you can switch it to use the non-shitty engine. Back when I was working with it daily, it only had the shitty engine, so... I dunno.

    The only way I use it now is indirectly through WordPress, and WordPress seems to be doing ok (since I fixed its character encoding. Which has to match the DB collation. But despite WP creating the database itself, it doesn't just fucking do that automatically on first-run, meaning every WP user on Earth has to suffer through at least one backup/restore with bad characters they have to manually fix before knowing to fix the problem for good.) Hopefully WordPress is using the non-shitty MySQL engine, but I don't know.

    But hey, open source right!

    @Daniel Beardsmore said:

    Anything where UTF-8 is an afterthought (e.g. Perl core and the DBI and CGI) is going to wind up a nightmare.

    Yup. Although I'm pretty sure everything is an afterthought in Perl.



  • @blakeyrat said:

    My understanding is that MySQL "fixed" the "our database engine sucks ass" problem by adding a completely different "data storage engine" that doesn't suck ass-- so you can actually put MySQL in a mode where it's still using the shitty engine, or you can switch it to use the non-shitty engine. Back when I was working with it daily, it only had the shitty engine, so... I dunno.

    Actually, the shitty engine was default until, like, late last year. CREATE TABLE without explicit ENGINE= clause still gave you a MyISAM until... 5.5? 6.0? I don't even know.
    @blakeyrat said:

    The only way I use it now is indirectly through WordPress, and WordPress seems to be doing ok (since I fixed its character encoding. Which has to match the DB collation. But despite WP creating the database itself, it doesn't just fucking do that automatically on first-run, meaning every WP user on Earth has to suffer through at least one backup/restore with bad characters they have to manually fix before knowing to fix the problem for good.) Hopefully WordPress is using the non-shitty MySQL engine, but I don't know.


    Easy to check, I guess. Most "mass popular" web apps do assume and WANT MyISAM, tho. Also, many shared web hosts keep running it because it's "more lightweight" or something.

    But then, Wordpress is a securityhole-ridden piece of crap. The best parts are how it hardcodes domain names in the DB and other places making it a pita to move. But note that its also so slow and heavyweight you need to install a "cache plugin" that renders your articles to plain HTML to make it run at acceptable speed.

    But hey, open source, right?


  • Discourse touched me in a no-no place

    @Daniel Beardsmore said:

    My beef with MySQL is that you're only able to get back errors as strings,
    instead of error numbers
    In which languages? I've had no problems getting (and dealing with) error numbers in C and PHP....



  • @PJH said:

    @Daniel Beardsmore said:
    My beef with MySQL is that you're only able to get back errors as strings,
    instead of error numbers
    In which languages? I've had no problems getting (and dealing with) error numbers in C and PHP....

    According to Google, I are idiot. Nothing new there, move along :°(

    (Working alone sucks, too.)



  • @blakeyrat said:

    My understanding is that MySQL "fixed" the "our database engine sucks ass" problem by adding a completely different "data storage engine" that doesn't suck ass-- so you can actually put MySQL in a mode where it's still using the shitty engine, or you can switch it to use the non-shitty engine. Back when I was working with it daily, it only had the shitty engine, so... I dunno.
    Correct me if I'm wrong, but last I checked (about a year or so ago) MySQL still shamelessly uses derived tables as a counterfeit form of inline views, regardless of the data storage engine.  Of all the other issues it has, there's really nothing like joining large derived tables with no indexes.

    So... I still avoid MySQL when I can.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.