Enough Thread to hang yourself



  • Back at WTFCSES, my first non-trivial, non-academic task was writing a Java batch. Several users were generating this particular Oracle Reports report by hand hundreds of times per person per day, so the logical solution was to pre-config the report's input flags, let the users bulk-select the IDs to run through the report, then execute the Reports command-line executable (through Runtime.exec()) overnight. The generated PDFs would get saved in EMC's Documentum, a platform I choose to say nothing else about.

    The technical lead was afraid the batch would take forever to run, so he and a senior developer designed it to follow the thread pool pattern. Each thread would get a bulk request, query the DB to get all ID-relevant information, then do the Runtime.exec. I made several mistakes, in terms of both shoddy code and not understanding WTFCSES's existing (if extremely limited) Java code base (again, first non-trivial, non-academic task), but I did hammer out something that--save a couple newbie logic errors like int instead of double and == instead of .equals()--met spec. So it was now time for my first code review.

    (Aside: WTFCSES's Java code base was pretty atrocious. For example, I recall a homebrewed XML string parser, written around JDK 1.1 or 1.2 and never refactored, which read an XML file with exactly four fields in it: user name and password for Documentum in prod, and the same for the "rrr"/dev & QA regions.)

    My technical lead insisted that I include "Paolo" as a reviewer, who, I was repeatedly told, was known on our project as "the Java Guy." Okay, I'll include Paolo. I reserve a conference room and we all sit down to go over the code. When we get to the class that implemented Runnable, Paolo stops me.

    P: You create a new Connection on every Thread.
    Me: Well, yeah. That's not a problem, is it?
    P: Yes it is. You'll put too much strain on the database.
    Me: Really? Is seven extra connections so bad?
    [OP: I wrote a shoddy imitation of a thread pool where the threads didn't exit after completing work, but instead did this wait/notify thing with Main to request more work. Super kludgey, probably appropriate for a Confession thread.]
    P: Yes. You should create one Connection in the main Thread, then when creating your Runnable Threads pass it as an input parameter.
    Me: Are you sure? Won't the threads block each other when they query the database?
    P: No, you'll be fine. Just use the one Connection.

    Later on, I discovered that Paolo was hired literally two weeks before me, and got his "Java Guy" reputation off his resume alone. [OP: he was pretty good as A Java Guy, but probably not worthy of The Java Guy] And yes, the threads all blocked because they used one single Connection object. But I only figured that out during a code review months after it was built to production. To sum up, my first contribution to WTFCSES was the world's most sequential multithreaded program.

    The stinger: When it went to production, the batch took forever to run anyway, because the users were inputting dates in a format the system analyst explicitly assumed they wouldn't use. This caused every report to run its complex analytic computations over a 1900 year span.
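
A minimal sketch of why the single shared Connection made the batch sequential. `FakeConnection` is a made-up stand-in, not JDBC: its `synchronized` method models the assumption that only one thread at a time can use a given connection object. The worker count of seven comes from the story; everything else is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a JDBC Connection. The synchronized method models
// a driver that lets only one thread use a given connection at a time.
class FakeConnection {
    private final AtomicInteger active; // queries currently in flight
    private final AtomicInteger peak;   // highest value 'active' has reached

    FakeConnection(AtomicInteger active, AtomicInteger peak) {
        this.active = active;
        this.peak = peak;
    }

    synchronized void query() throws InterruptedException {
        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            Thread.sleep(20); // pretend the database is busy
        } finally {
            active.decrementAndGet();
        }
    }
}

class SharedConnectionDemo {
    // Runs one query per worker thread and returns the peak number of
    // queries that were in flight at the same moment.
    static int peakConcurrency(int workers, boolean shareOneConnection) {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        FakeConnection shared = new FakeConnection(active, peak);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            FakeConnection conn = shareOneConnection
                    ? shared
                    : new FakeConnection(active, peak);
            Thread t = new Thread(() -> {
                try {
                    conn.query();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("shared connection, peak in flight: " + peakConcurrency(7, true));
        System.out.println("one per thread, peak in flight:    " + peakConcurrency(7, false));
    }
}
```

With the shared connection the peak can never exceed one query in flight, which is the "most sequential multithreaded program" effect. With one connection per thread the peak depends on scheduling, so the sketch makes no promise about the exact number.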



  • @ArrivingRaptor said:

    EMC's Documentum

    I worked with one of the morons who worked on that. He was a moron.

    @ArrivingRaptor said:

    The stinger: When it went to production, the batch took forever to run anyway, because the users were inputting dates in a format the system analyst explicitly assumed they wouldn't use. This caused every report to run its complex analytic computations over a 1900 year span.

    Didn't it do any data validation? Wait, what am I asking, of course it didn't...



  • @morbiuswilters said:

    Didn't it do any data validation? Wait, what am I asking, of course it didn't...

    Well TECHNICALLY 010101 is a valid MMYYYY format.



  • @ArrivingRaptor said:

    To sum up, my first contribution to WTFCSES was the world's most sequential multithreaded program.

    Why was a threaded application proposed rather than a queue? The tasks are being run overnight so who cares if they're done sequentially?



  • @ArrivingRaptor said:

    P: Yes it is. You'll put too much strain on the database.
     

    Java Guy had a point. TRWTF here is that a technical lead and a senior designer were so keen to DoS the database. It's almost as if seniority within an organisation doesn't correlate with competence in any statistically significant way. Almost.




  • @GNU Pepper said:

    @ArrivingRaptor said:

    P: Yes it is. You'll put too much strain on the database.
     

    Java Guy had a point. TRWTF here is that a technical lead and a senior designer were so keen to DoS the database. It's almost as if seniority within an organisation doesn't correlate with competence in any statistically significant way. Almost.

    Seven connections are going to DoS a database? I don't recall him saying he was running Postgres..



  • @GNU Pepper said:

    @ArrivingRaptor said:

    P: Yes it is. You'll put too much strain on the database.
     

    Java Guy had a point. TRWTF here is that a technical lead and a senior designer were so keen to DoS the database. It's almost as if seniority within an organisation doesn't correlate with competence in any statistically significant way. Almost.



    Well sure, but the solution then isn't to pass the same connection everywhere, it's to use proper connection pooling (which no one ever suggested in this process).
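
For illustration, a toy version of the connection pooling idea, assuming a fixed number of connections handed out through a `BlockingQueue`. `PooledConnection`, `TinyPoolDemo`, and the sizes are all hypothetical; a real project would use an existing pool implementation rather than this sketch. The point is only that the number of connections hitting the database is capped by the pool, independent of how many worker threads exist.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class TinyPoolDemo {
    // Hypothetical stand-in for a real database connection.
    static final class PooledConnection {
        final int id;
        PooledConnection(int id) { this.id = id; }
    }

    // Runs 'tasks' jobs on 'workers' threads that borrow from a pool of
    // 'poolSize' connections. Returns the peak number of connections in use.
    static int peakInUse(int workers, int poolSize, int tasks) {
        BlockingQueue<PooledConnection> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new PooledConnection(i));
        }
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        for (int t = 0; t < tasks; t++) {
            executor.submit(() -> {
                try {
                    PooledConnection conn = pool.take(); // blocks until one is free
                    peak.accumulateAndGet(inUse.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(5); // pretend to run a query
                    } finally {
                        inUse.decrementAndGet();
                        pool.put(conn);  // hand the connection back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Seven workers, but the database never sees more than three connections.
        System.out.println("peak connections in use: " + peakInUse(7, 3, 21));
    }
}
```

This also speaks to the "why another pool" question: the thread pool limits how much work runs at once, while the connection pool limits how much of that work touches the database at once. The two limits do not have to be the same number.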



  •  There is already a thread pool, so why over-complicate things by using another pool?



  • Is seven threads really enough to hang yourself with? What kind of fucked-up database can't handle seven fucking connections? Why was the date input not a date selector?


  • Considered Harmful

    @Ben L. said:

    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?



  • @joe.edwards said:

    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.

  • Considered Harmful

    @Ben L. said:

    @joe.edwards said:
    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.
    So can 2013-04-11.


  • @joe.edwards said:

    @Ben L. said:
    @joe.edwards said:
    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.
    So can 2013-04-11.
    Mine is two bytes (20% of yours) less data to represent the same thing.


  • @Ben L. said:

    @joe.edwards said:
    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.

    I spend about 1/3 of my day cursing the devs of one of the systems I manage because of this. I have one table which contains fields representing datetime stamps as YYYYMMDD HH:MM:ss, as well as MM/DD/YYYY HH:MM:ss, and occasionally MMDDYY HH:MM:ss. You know, instead of using the database's native DATETIME field type, let's use a varchar(20) field. Makes reporting exciting. This is probably one of the reasons that the vendor manages most other installs, so they can hide the horror of ONE GIANT TABLE and DATETIME FIELDS OF TEXT.


    Yes, unambiguous, and yes, asciibetical sort. But painful to do any sort of datediff computation.
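
A small sketch of both halves of that point, assuming dates stored as eight-character `YYYYMMDD` strings. The class and method names are made up for the example. Plain string sorting gives chronological order, but a difference in days needs a real date type; subtracting the numbers gives nonsense across a month boundary.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CompactDates {
    // Sorting the raw strings is enough: lexicographic order on YYYYMMDD
    // matches chronological order.
    static List<String> sortedCopy(List<String> stamps) {
        List<String> copy = new ArrayList<>(stamps);
        Collections.sort(copy);
        return copy;
    }

    // A date difference requires parsing into a date type first.
    static long daysBetween(String from, String to) {
        LocalDate a = LocalDate.parse(from, DateTimeFormatter.BASIC_ISO_DATE);
        LocalDate b = LocalDate.parse(to, DateTimeFormatter.BASIC_ISO_DATE);
        return ChronoUnit.DAYS.between(a, b);
    }

    // What you get if you treat the text as a number instead.
    static long naiveDifference(String from, String to) {
        return Long.parseLong(to) - Long.parseLong(from);
    }

    public static void main(String[] args) {
        System.out.println(sortedCopy(Arrays.asList("20130411", "20121231", "20130101")));
        System.out.println("real days:  " + daysBetween("20130228", "20130301"));
        System.out.println("naive diff: " + naiveDifference("20130228", "20130301"));
    }
}
```

The same parsing step is what a varchar column forces onto every report query, which is the pain being described.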



  • @Ben L. said:

    @joe.edwards said:
    @Ben L. said:
    @joe.edwards said:
    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.
    So can 2013-04-11.
    Mine is two bytes (20% of yours) less data to represent the same thing.

    Yes, but your format can be ambiguous.

    Certainly in your example no-one is going to mistake the 2013 as anything other than the year, because it cannot be DayMonth or MonthDay.

    But if it was 20120411 it could represent December 20, 411.



    ISO 8601 is your friend.



  • @eViLegion said:

    @Ben L. said:
    @joe.edwards said:
    @Ben L. said:
    @joe.edwards said:
    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.
    So can 2013-04-11.
    Mine is two bytes (20% of yours) less data to represent the same thing.

    Yes, but your format can be ambiguous.

    Certainly in your example no-one is going to mistake the 2013 as anything other than the year, because it cannot be DayMonth or MonthDay.

    But if it was 20120411 it could represent December 20, 411.



    ISO 8601 is your friend.


    ^Z999999999999999999999999999999999999999999999999999999990411000000000000000000001 represents one femtosecond after midnight on 11 April 99999999999999999999999999999999999999999999999999999999 completely unambiguously, and, might I add, in an easily sortable format. Let's see your ISO 8601 do THAT.



  • @eViLegion said:

    ...But if it was 20120411 it could represent December 20, 411.



    ISO 8601 is your friend.

    20120411 is ISO 8601 compliant. You don't expect a standard created by committee to not have options, do you?



  • @spamcourt said:

     There is already a thread pool, so why over-complicate things by using another pool?

    I initially tried using some of the fun thread pooling and BlockingQueue classes in java.util.concurrent. Then I built the jar to the development application server and everything blew up because the application server ran Java 1.4.

    @Ben L. said:

    Is seven threads really enough to hang yourself with? What kind of fucked-up database can't handle seven fucking connections? Why was the date input not a date selector?

    It could, but there wasn't much I could say. I wasn't The Java Guy. Paolo was.

    As for the date, the closest I can get to an explanation is the report in question was a financial-type report and always needed to run from the first of the month. The eventual hotfix was that the Oracle Form would check if the entered value was earlier than 1990, then pop an alert box telling the user, "Hey, I think you may have entered MMDDYY instead of MMYYYY" (or maybe it was YYYYMM, I don't remember for sure). Either way it was a completely insane date representation.
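
A rough sketch of the kind of check the hotfix describes, written in Java rather than Oracle Forms. It assumes the field really was `MMYYYY` (the post is not sure) and reuses the 1990 cutoff from the post; the class and method names are invented for the example.

```java
import java.time.YearMonth;
import java.util.Optional;

class MonthYearInput {
    // Cutoff taken from the hotfix described in the post.
    static final int EARLIEST_PLAUSIBLE_YEAR = 1990;

    // Parses six digits as MMYYYY. Returns empty when the text is malformed
    // or when the year is implausibly early, which is what happens when a
    // user types MMDDYY: "010101" reads as month 01 of the year 0101.
    static Optional<YearMonth> parse(String input) {
        if (input == null || !input.matches("\\d{6}")) {
            return Optional.empty();
        }
        int month = Integer.parseInt(input.substring(0, 2));
        int year = Integer.parseInt(input.substring(2));
        if (month < 1 || month > 12) {
            return Optional.empty();
        }
        if (year < EARLIEST_PLAUSIBLE_YEAR) {
            return Optional.empty();
        }
        return Optional.of(YearMonth.of(year, month));
    }

    public static void main(String[] args) {
        System.out.println(parse("042013")); // a plausible month and year
        System.out.println(parse("010101")); // rejected: year 0101
    }
}
```

Rejecting the value before the report runs is what keeps a mistyped date from turning into a query over roughly nineteen centuries of data.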



  • @ArrivingRaptor said:

    @spamcourt said:

     There is already a thread pool, so why over-complicate things by using another pool?

    I initially tried using some of the fun thread pooling and BlockingQueue classes in java.util.concurrent. Then I built the jar to the development application server and everything blew up because the application server ran Java 1.4.

    @Ben L. said:

    Is seven threads really enough to hang yourself with? What kind of fucked-up database can't handle seven fucking connections? Why was the date input not a date selector?

    It could, but there wasn't much I could say. I wasn't The Java Guy. Paolo was.

    As for the date, the closest I can get to an explanation is the report in question was a financial-type report and always needed to run from the first of the month. The eventual hotfix was that the Oracle Form would check if the entered value was earlier than 1990, then pop an alert box telling the user, "Hey, I think you may have entered MMDDYY instead of MMYYYY" (or maybe it was YYYYMM, I don't remember for sure). Either way it was a completely insane date representation.

    I've been dealing with something like that recently. I was able to obtain a good result by making a few assumptions, e.g. the user probably didn't mean month 12 of year 01, they probably just reversed the month and year.



    Regarding the OP's narrative: I think a lot of people who've crafted multi-threaded programs might be surprised by the actual lack of real parallelism in their runtime result. That's a weakness of threading as a model of parallel computation : so much synchronization code ends up being necessary, and it's hard to find an optimal implementation. Hell, it's hard enough in many cases just to get everything working right.



  • @tweek said:

    ONE GIANT TABLE and DATETIME FIELDS OF TEXT
     

     

    On a clear field of text, you can see {d 'forever'}




  • @bridget99 said:

    assumptions

    NO.



  • @Ben L. said:

    @bridget99 said:
    assumptions

    NO.

    It depends on the context. People don't like pop-ups. If I know the data doesn't include 2001, I'm not going to waste anyone's time confirming that "01" was actually a month. At least, this is the strategy I take for relatively unimportant things like financial reports.



  • @Watermelon said:

    @Ben L. blakeyrat said:
    @bridget99 said:
    assumptions

    NO.



  • @Ben L. said:

    @joe.edwards said:
    @Ben L. said:
    @joe.edwards said:
    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.
    So can 2013-04-11.
    Mine is two bytes (20% of yours) less data to represent the same thing.

    Dear God I hope you are joking.



  • @morbiuswilters said:

    @Ben L. said:
    @joe.edwards said:
    @Ben L. said:
    @joe.edwards said:
    @Ben L. said:
    Why was the date input not a date selector?

    Who would enter a date with no unit delimiters? 010101, seriously?

    20130411 is pretty unambiguous and can be sorted ASCIIbetically.
    So can 2013-04-11.
    Mine is two bytes (20% of yours) less data to represent the same thing.

    Dear God I hope you are joking.

    RFC2550, maaaaaaaaaaaaaaaaaan.



  • @bridget99 said:

    Regarding the OP's narrative: I think a lot of people who've crafted multi-threaded programs might be surprised by the actual lack of real parallelism in their runtime result. That's a weakness of threading as a model of parallel computation : so much synchronization code ends up being necessary, and it's hard to find an optimal implementation. Hell, it's hard enough in many cases just to get everything working right.

    [citation needed]


    Come on, I know it's not uncommon for newbies to screw up threading and make it essentially sequential with too much synchronization, but to claim the entire model is flawed is ridiculous. There is so much multi-threaded software out there that will beat the CPU like a red-headed stepchild that your position is indefensible.



  • @morbiuswilters said:

    @bridget99 said:
    Regarding the OP's narrative: I think a lot of people who've crafted multi-threaded programs might be surprised by the actual lack of real parallelism in their runtime result. That's a weakness of threading as a model of parallel computation : so much synchronization code ends up being necessary, and it's hard to find an optimal implementation. Hell, it's hard enough in many cases just to get everything working right.

    [citation needed]


    Come on, I know it's not uncommon for newbies to screw up threading and make it essentially sequential with too much synchronization, but to claim the entire model is flawed is ridiculous. There is so much multi-threaded software out there that will beat the CPU like a red-headed stepchild that your position is indefensible.

    I don't doubt that people can and do get threading right. I have my suspicions about a lot of the code written by enthusiastic recent grads who embraced threading because it seemed "cool" or "leet." This is not an uncommon phenomenon in my experience. For a formal take-down of the thread model, I'd recommend reading one of several papers on the topic by Edward Lee (professor at Cal). I think one of them is titled "threads are evil."


  • Considered Harmful

    My favorite was a MD5 brute forcer that not only used all of my CPU cores, but all of my GPU cores as well.

    It was _fast_, but it also brought everything else to a grinding halt and made the various fans on my PC whir like jet engines.



  • @ArrivingRaptor said:

    The eventual hotfix was that the Oracle Form would check if the entered value was earlier than 1990, then pop an alert box telling the user, "Hey, I think you may have entered MMDDYY instead of MMYYYY" (or maybe it was YYYYMM, I don't remember for sure).
     

    Don't Oracle Forms feature a date picker?



  • @Cassidy said:

    @ArrivingRaptor said:

    The eventual hotfix was that the Oracle Form would check if the entered value was earlier than 1990, then pop an alert box telling the user, "Hey, I think you may have entered MMDDYY instead of MMYYYY" (or maybe it was YYYYMM, I don't remember for sure).
     

    Don't Oracle Forms feature a date picker?

    Yep. That's what makes the design decision so insane. When I left four years later it was still in that format, to my knowledge.

    Now that I think of it, I believe all the forms from that functional area used that six digit date format, stored in the DB as NUMBER(6). It's conceivable that this was a design antipattern arising from a developer years ago who never learned about TRUNC(date, 'MONTH'). Or a performance DBA thinking date manipulation would be "too slow."



  • @bridget99 said:

    I don't doubt that people can and do get threading right. I have my suspicions about a lot of the code written by enthusiastic recent grads who embraced threading because it seemed "cool" or "leet." This is not an uncommon phenomenon in my experience. For a formal take-down of the thread model, I'd recommend reading one of several papers on the topic by Edward Lee (professor at Cal). I think one of them is titled "threads are evil."

    Threading isn't hard. Writing good software is hard, period. Threading is just far less forgiving than what most shitty programmers are used to.



  • @joe.edwards said:

    My favorite was a MD5 brute forcer that not only used all of my CPU cores, but all of my GPU cores as well.

    It was _fast_, but it also brought everything else to a grinding halt and made the various fans on my PC whir like jet engines.

     

    Shouldn't an MD5 breaker in this day and age that uses every core on a machine finish in seconds?

     


  • Considered Harmful

    @dhromed said:

    @joe.edwards said:

    My favorite was a MD5 brute forcer that not only used all of my CPU cores, but all of my GPU cores as well.

    It was _fast_, but it also brought everything else to a grinding halt and made the various fans on my PC whir like jet engines.

     

    Shouldn't an MD5 breaker in this day and age that uses every core on a machine finish in seconds?

     

    It really depends on the length and character set of the plaintext. It found most user passwords instantly but a few were extra long or used weird characters.



  • @joe.edwards said:

    It found most user passwords instantly
     

    That is awesome.


  • Discourse touched me in a no-no place

    @dhromed said:

    Shouldn't an MD5 breaker in this day and age that uses every core on a machine finish in seconds?
    Not really. If you have a hash for known blob X, and you want the same match for known blob Y+{extraneous stuff to get the MD5 to match}, then you have to vary the extraneous stuff to get the match, and that will take a while, unless you can figure out from a few attempts (or by reverse engineering MD5) what the extraneous stuff should be. Otherwise it's a brute-force search.



    I vaguely recall (details may be wrong) someone who did exactly that by posting (only) the MD5 of a document (pdf?) stating that it contained the result of an election yet to be held. After the election they posted the pdf which did match the MD5. This seems to match my recollection.



  • @PJH said:

    @dhromed said:
    Shouldn't an MD5 breaker in this day and age that uses every core on a machine finish in seconds?
    Not really. If you have a hash for known blob X, and you want the same match for known blob Y+{extraneous stuff to get the MD5 to match}, then you have to vary the extraneous stuff to get the match, and that will take a while, unless you can figure out from a few attempts (or by reverse engineering MD5) what the extraneous stuff should be. Otherwise it's a brute-force search.



    I vaguely recall (details may be wrong) someone who did exactly that by posting (only) the MD5 of a document (pdf?) stating that it contained the result of an election yet to be held. After the election they posted the pdf which did match the MD5. This seems to match my recollection.

    Collision attacks on MD5 still take some time, but I think they were just referring to brute-forcing a password hash. A modern CPU/GPU combo should be able to hit around 1 billion hashes /sec, which is quite sufficient to run through large dictionaries with substitutions and numeric suffixes in a matter of seconds.
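
To put rough numbers on "depends on the length and character set", here is a back-of-the-envelope sketch. It assumes an exhaustive search over every string of one fixed length and takes the 1 billion hashes per second figure from the post at face value; the class name and the example character-set sizes are illustrative only.

```java
class BruteForceEstimate {
    // Worst-case seconds to try every string of the given length over a
    // character set of the given size: charsetSize^length / hashesPerSecond.
    static double secondsToExhaust(int charsetSize, int length, double hashesPerSecond) {
        return Math.pow(charsetSize, length) / hashesPerSecond;
    }

    public static void main(String[] args) {
        double rate = 1e9; // hashes per second, the figure quoted in the thread

        // Six lowercase letters: 26^6 candidates.
        System.out.printf("6 lowercase letters: %.2f seconds%n",
                secondsToExhaust(26, 6, rate));

        // Ten characters drawn from letters and digits: 62^10 candidates.
        double seconds = secondsToExhaust(62, 10, rate);
        System.out.printf("10 alphanumerics:    %.1f years%n",
                seconds / (365.0 * 24 * 3600));
    }
}
```

Short, simple passwords fall in under a second while each extra character multiplies the work by the size of the character set, which fits the observation that most passwords were found instantly and a few long or unusual ones were not.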


  • Considered Harmful

    Yeah I was able to extract basically a snapshot of their entire database through SQL injection, and the hashes weren't even salted.

    The fun part was seeing the most popular passwords, I sorted them by frequency. Oh, and privileged accounts by and large did not have more complicated passwords than non-privileged accounts, sometimes quite the opposite. A memorable one was a full administrator using muppets.



  • @joe.edwards said:

    A memorable one was a full administrator using muppets.

    He should be careful about using Muppets. That's what got Wander McMooch into trouble.



  • @bridget99 said:

    Regarding the OP's narrative: I think a lot of people who've crafted multi-threaded programs might be surprised by the actual lack of real parallelism in their runtime result. That's a weakness of threading as a model of parallel computation : so much synchronization code ends up being necessary, and it's hard to find an optimal implementation. Hell, it's hard enough in many cases just to get everything working right.

    Obviously a JAVA programmer and not a .Net programmer.

    Threads have gotten better in every version of .Net... first, we had connection pooling built in from 1.0.  Then they added more threading capabilities in 2.0, 3.5 SP1, etc... 

    Now writing multi-threaded code is too easy.  You can over-parallelize your code and reduce performance.  I love being able to write parallel for each loops, spin a set of tasks off in parallel, or make a LINQ query against objects run on multiple threads by simply adding .AsParallel() to it.





  • @Ben L. said:

    Example of why threading can be easy with the right tools

    Considering there is access to shared resources, this is hardly proof of anything. The same code in Java would hardly look different.



  • @morbiuswilters said:

    @Ben L. said:
    Example of why threading can be easy with the right tools

    Considering there is access to shared resources, this is hardly proof of anything. The same code in Java would hardly look different.

    Did you try running it?



  • @Ben L. said:

    @morbiuswilters said:
    @Ben L. said:
    Example of why threading can be easy with the right tools

    Considering there is access to shared resources, this is hardly proof of anything. The same code in Java would hardly look different.

    Did you try running it?

    Yes, it prints strings, out-of-order. I'm not sure why you consider this an example of "why threading can be easy with the right tools", as it looks pretty much like the code in Java or C would look. It's so trivial it doesn't even delve into the actually thorny parts of threading, unless you consider out-of-order printing to be a threading issue. But in that case, the example is so arbitrary and silly, I'm not sure what it's even supposed to show.


  • Discourse touched me in a no-no place

    @morbiuswilters said:

    but I think they were just referring to brute-forcing a password hash.
    If all that's protecting a password is a single round of MD5, then there really isn't much more to be said about the matter...


  • Trolleybus Mechanic

    @morbiuswilters said:

    out-of-order
     

    Just an amusing sidenote. I know that we all know that CS is TRWTF in every way, including how it sends out thread reply notification emails-- but I still found this amusing given the topic at hand. Keep in mind that this should run in posting order, top to bottom:

    [IMG]http://i.imgur.com/QMUlOtm.png[/IMG]



  • Why would you expect emails to come in a particular order? There's nothing in the spec to guarantee that.

    Lorne, you are posting some unfunny shit today. Go stand in the corner and watch MST3K for 2 hours until you get your sense of humor back.



  • @Lorne Kates said:

    Keep in mind that this should run in posting order, top to bottom:
     

    The newest email containing the newest post is at the top, as it should.

    What is the problem?



  • @blakeyrat said:

    watch MST3K for 2 hours until you get your sense of humor back.
     

    He's supposed to get it back, you twit.


  • Trolleybus Mechanic

    @dhromed said:

    @Lorne Kates said:

    Keep in mind that this should run in posting order, top to bottom:
     

    The newest email containing the newest post is at the top, as it should.

    What is the problem?

     

    The oldest email should be on top, so you could read the conversation from start to finish (rather than finish to start).

    I'd expect it to arrive in the same order as the forum posts. I assume (possibly incorrectly) that CS's code is like this:


    void OnPostButtonClick()
    {
        WriteShitToDB();                 // save the new post first
        SendEmailToThreadSubscribers();  // then notify subscribers in the same call
    }

    So Morbs posts. It writes his shit to DB. It sends an email to all subscribers.

    Then Ben replies to Morbs' post. It writes his shit to DB. It sends an email to all subscribers.

    Even if there's a delay between WriteShit and SendEmail (say from lots of emails to send), there should be an equal delay between the second WriteShit and SendEmail. It should roughly balance out.

    Again, yes, I know that everything in this entire process is run asynchronously with no guarantee of order of delivery. That's why I said it was amusing (given the context of a thread complaining about stuff being performed out of order).

    Y'know what? Fuck all y'all. Here's a random XKCD comic to punish you. Have fun replacing it:

    ... oh for fuck's sake!

     

     



  • @Lorne Kates said:

    I'd expect it to arrive in the same order as the forum posts. I assume (possibly incorrectly) that CS's code is like this:

    No, emails are queued up and sent periodically. I'm not sure of the exact mechanism, but many years ago, back when we had the IRC room, I wrote a bot that would "follow" certain users. I created a user, JesusBot, that was subscribed to get emails for every forum, and would then parse the message, see if it was by somebody it was following, and then write a message to IRC that this person had posted.

    The problem was, there was a 5-10 minute delay between when somebody posted and an email came in. This ended up not working because people in IRC were refreshing and seeing the posts, reading them and replying before the bot even got the damn email in the first place.

