I can show the auditors some of the data



  • Another team was having a routine db audit by corporate, when some issues arose. Specifically: why are most of the records missing 98% of the data? The developers on the team to which we send the auditing data swore up, down and sideways that their system was recording all the data it was reading. The corporate folks asked me to look at our code since we were the ones sending it to the auditing system.

    I ran across this piece of programming excellence:

    // On our application/sending side
    Socket socket = new Socket(...);
    DataOutputStream dos = new DataOutputStream(socket.getOutputStream());
    dos.writeBytes("string comprising 64K of auditing data");
    dos.flush();

    // on the receiving/auditing system side
    ServerSocket ss = new ServerSocket(...);
    ...
    Socket socket = ss.accept();
    DataInputStream dis = new DataInputStream(socket.getInputStream());
    byte[] bytes = new byte[1024];
    dis.readFully(bytes);
    String auditDataFromApp = new String(bytes);
    // put it in a database

    Apparently, reading all the data that we might send was not considered important. They were correct; they did save all the data they read; just not all the data that was sent to them.

    The auditors instructed the other team to fix their software immediately as we were in violation of Federal regulations. The other team said it would take three months before their next release, and so they couldn't accommodate.

    Corporate auditing instructed me to put in a patch to temporarily read the data we sent to the auditing team back from the database, and log an error if it was not saved properly. It took me ten minutes to create and test the patch, 20 minutes more to get permission to do a special release, and 30 minutes to get sudo permission to actually do it in production.

    Corporate auditing asked why I could do this but the other team couldn't. I pointed out that they operate under the same rules as my team (read between the lines). They were not amused.
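    [Editor's note: if both sides could have changed, the conventional fix for this protocol is to frame each message with its length, so the receiver knows exactly how many bytes to readFully. A sketch of that idea; the class and method names are invented, not the firm's actual code:]

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FramedAudit {
    // Sender: write the payload length first, then the payload itself.
    static void send(DataOutputStream out, String data) throws IOException {
        byte[] payload = data.getBytes("ISO-8859-1");
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Receiver: read the declared length, then block until exactly that
    // many bytes have arrived (readFully throws EOFException otherwise).
    static String receive(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new String(payload, "ISO-8859-1");
    }
}
```

    With a length prefix, the fixed 1024-byte buffer on the receiving side simply cannot happen: the receiver allocates exactly what the sender declared.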

     

     



  • Given that issue, would it not be possible to fix the problem on your end by using smaller block sizes? Or is this one of those things where each block starts with some information indicating what the rest of the data is, such that the 64K blocks cannot be fragmented in a way the other end can still read?

    I understand this would be covering for their stupidity, but I could see it as ethical to release such a fix if you could get management to agree to fix the *real* problem on the other end, once you demonstrated comprehensively, by implementing a working solution, that they were the real problem.



  • @tgape said:

    I understand this would be covering for their stupidity, but I could see it as ethical to release such a fix if you could get management to agree to fix the *real* problem on the other end, once you demonstrated comprehensively, by implementing a working solution, that they were the real problem.

    You are kidding, right? I can't comment on what would be the ethical thing to do (don't really know, depends on many variables), but once he writes such a fix, the problem will never be properly solved.

    Also, if he was able to fix it, certainly the problem was with his code, wasn't it? How do you expect anybody to believe the problem was with the other team?



  • @tgape said:

    Given that issue, would it not be possible to fix the problem on your end by using smaller block sizes? Or is this one of those things where each block starts with some information indicating what the rest of the data is, such that the 64K blocks cannot be fragmented in a way the other end can still read?

    I understand this would be covering for their stupidity, but I could see it as ethical to release such a fix if you could get management to agree to fix the *real* problem on the other end, once you demonstrated comprehensively, by implementing a working solution, that they were the real problem.

    The problem isn't block size, it's the fact that the code only captures the first 1024 bytes of data.  They read the first chunk in, then kill the connection.  Great jorb guys!  :)



  • Am I to understand that snoofle's 'fix' was simply an alarm saying "oh, noes! they didn't save all our dataz!"?



  • @zelmak said:

    Am I to understand that snoofle's 'fix' was simply an alarm saying "oh, noes! they didn't save all our dataz!"?

    I can find a much more efficient way of doing this.  Just dump this piece of code in:

    public bool DidTheyGetTehLogz()
    {
        // TODO: change this to true when those lazy asses patch their shite code
        return false;
    }



  • @zelmak said:

    Am I to understand that snoofle's 'fix' was simply an alarm saying "oh, noes! they didn't save all our dataz!"?

     

    That's what the auditor asked for, and it is an important feature to have.

    Once the other team fixes their code, it won't be triggered every day anymore, and will start to convey some important information.

     



  •  @C-Octothorpe said:

    The problem isn't block size, it's the fact that the code only captures the first 1024 bytes of data.  They read the first chunk in, then kill the connection.

     

    Not that it's efficient (open/close 64 connections vs. 1) or that it addresses either of Mcoder's points, but just as a thought exercise:

     

    # pseudocode
    for 1k_chunk_count = 1 to 64
      open connection
      send 1k out of the total 64k
      close connection
    next




  • @emurphy said:

     @C-Octothorpe said:

    The problem isn't block size, it's the fact that the code only captures the first 1024 bytes of data.  They read the first chunk in, then kill the connection.

     

    Not that it's efficient (open/close 64 connections vs. 1) or that it addresses either of Mcoder's points, but just as a thought exercise:

     

    # pseudocode
    for 1k_chunk_count = 1 to 64
      open connection
      send 1k out of the total 64k
      close connection
    next


    Most RDBMSs support connection pooling, so opening/closing frequently does not really matter.



  • It's a socket, not a RDBMS...



  • @ekolis said:

    It's a socket, not a RDBMS...

    Sorry I was fooled by this sentence from the OP: "Corporate auditing instructed me to put in a patch to temporarily read the data we sent to the auditing team back from the database"



  • @All:

    The data is one long continuous record. Breaking it up into multiple chunks would make it appear as 64 separate transactions instead of 1 - absolute fraud from this end.

    To clarify: we send data, via socket, to the audit team's application. They are charged with storing certain kinds of information and providing records when some government agency requests an audit due to some event. They provide the connection library objects. The auditors requested (read: demanded) that I add code to connect directly to the audit team's database (which we already do for unrelated reasons) and directly read back what I previously sent them (via the socket wrapper) to save, and then compare the strings. If it doesn't match, log an alert. The alerts go to different places depending upon the nature of the alert. In this case, the corporate auditors. In other words, they want to cover the firm's ass by making it very visible and embarrassing for the other manager until he fixes the problem. He is now motivated.

    It's his own fault. Yes, their developers screwed up. BUT when he was asked to implement an emergency fix, he played hardball and hid behind firm rules (one release per quarter) instead of just accommodating them with an expedited release. I knew better and gave them what they wanted.

    Corporate auditors can be your best friends, if you know how to play nice with them!
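    [Editor's note: from that description, the patch boils down to a read-back-and-compare. A minimal sketch of the idea; all names here are invented, and the real patch reads the audit team's database through their own connection objects:]

```java
import java.util.logging.Logger;

public class AuditVerifier {
    private static final Logger ALERTS = Logger.getLogger("corporate.audit.alerts");

    // Compare what we sent over the socket with what the audit team's
    // database actually stored; alert the corporate auditors on mismatch.
    static boolean verify(String recordId, String sent, String stored) {
        if (sent.equals(stored)) {
            return true;
        }
        ALERTS.severe("audit record " + recordId + " does not match what was sent");
        return false;
    }
}
```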



  • @snoofle said:

    Corporate auditors can be your best friends, if you know how to play nice with them!
     

    So you're saying that they're a bunch of tools?



  • @snoofle said:

    It's his own fault. Yes, their developers screwed up. BUT when he was asked to implement an emergency fix, he played hardball and hid behind firm rules (one release per quarter) instead of just accommodating them with an expedited release.

    Playing hardball is one thing, but playing hardball with people who are charged with playing hardball in order to protect the company and its investors is just unbelievably stupid.



  • @snoofle said:

    dos.writeBytes("string comprising 64K of auditing data");

    String auditDataFromApp = new String(bytes);

    Please tell me that the lack of specified encoding in those conversions is just a side-effect of anonymising the code.



  • @pjt33 said:

    @snoofle said:
    dos.writeBytes("string comprising 64K of auditing data");

    String auditDataFromApp = new String(bytes);

    Please tell me that the lack of specified encoding in those conversions is just a side-effect of anonymising the code.

    You really suggest hard-coding the encoding?



  • I am really quite a fan of Java, but never having come across readFully, I had to read the documentation, fully.

     

    public void readFully (byte[] b)

        Reads some bytes from an input stream and stores them into the buffer array

     

    Apparently, readFully reads SOME bytes.

    ouch



  • readFully makes sure that the destination array is filled fully afterwards (or an EOFException is thrown), while a normal read only guarantees to read one byte, but may read more if it is available without blocking. Useful for reading some binary protocols.

    If you want to read everything from a stream until EOF, without knowing the size, you either need a loop (writing the read chunks into a ByteArrayOutputStream if you are lazy), or use functions from 3rd party libraries like Apache Commons or Guava.
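    [Editor's note: the read-until-EOF loop described above can be sketched like this; the helper name is invented:]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamUtil {
    // Read every byte until EOF, regardless of how the bytes happen to be
    // chunked in transit. Works without knowing the total size in advance.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

    This is what the receiving side in the original story should have done instead of a single readFully into a 1024-byte array.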



  • @Speakerphone Dude said:

    @pjt33 said:
    @snoofle said:
    dos.writeBytes("string comprising 64K of auditing data");

    String auditDataFromApp = new String(bytes);

    Please tell me that the lack of specified encoding in those conversions is just a side-effect of anonymising the code.

    You really suggest hard-coding the encoding?

    Fine, I'll bite. I suggest specifying the encoding explicitly. Hard-coding it isn't the only way to do that, although hard-coding to a sane value (which, according to context, would be "ISO-8859-1" or one of the three UTFs supported) is one way.

    I also suggest using a FindBugs rule to complain about every conversion of String to byte[] or vice versa which doesn't specify an encoding, as well as every creation of an InputStreamReader which doesn't, etc.
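    [Editor's note: for reference, the explicit-charset forms look like this, using Java 7's StandardCharsets constants so no checked exception is needed; the class and method names are illustrative:]

```java
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {
    // Both conversions name the charset, so the result is identical on
    // every JVM; the no-argument String.getBytes() and new String(byte[])
    // forms silently use the platform default instead.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```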



  • @pjt33 said:

    @Speakerphone Dude said:
    @pjt33 said:
    @snoofle said:
    dos.writeBytes("string comprising 64K of auditing data");

    String auditDataFromApp = new String(bytes);

    Please tell me that the lack of specified encoding in those conversions is just a side-effect of anonymising the code.

    You really suggest hard-coding the encoding?

    Fine, I'll bite. I suggest specifying the encoding explicitly. Hard-coding it isn't the only way to do that, although hard-coding to a sane value (which, according to context, would be "ISO-8859-1" or one of the three UTFs supported) is one way.

    I also suggest using a FindBugs rule to complain about every conversion of String to byte[] or vice versa which doesn't specify an encoding, as well as every creation of an InputStreamReader which doesn't, etc.

    The problem with today's sane values is that they don't work well in tomorrow's insane environment. If an application is written in Java, part of the sales pitch is that it is cross-platform and can be deployed easily in any environment that has a JVM; however, as things like encoding start getting hard-coded, this benefit goes away quickly.

    If you have worked with terminal emulators you know how encoding can be a bitch. Same goes for some obsolete (yet still alive and kicking) systems designed by people who could not imagine that anything but 7-bit ASCII would make sense one day. Then there is EBCDIC, nasty but still in the wild. Etc.



  • @mihi said:

    readFully makes sure that the destination array is filled fully afterwards (or an EOFException is thrown), while a normal read only guarantees to read one byte, but may read more if it is available without blocking. Useful for reading some binary protocols.

    If you want to read everything from a stream until EOF, without knowing the size, you either need a loop (writing the read chunks into a ByteArrayOutputStream if you are lazy), or use functions from 3rd party libraries like Apache Commons or Guava.

    I understand, but it is still funny that the first sentence of the documentation on readFully starts with the two words, "Reads some". No????

    I might have named it readBlock().

    There is a non-zero possibility that the name of the method helped confuse the author of this WTF.

     



  • @Speakerphone Dude said:

    If you have worked with terminal emulators you know how encoding can be a bitch.


    I've worked with Java on Apples. That's where I learnt that encoding is a bitch, and that's why I insist on explicitly specifying the encoding. The alternative is that each platform picks one for you in ways which you can't necessarily control.



  • @pjt33 said:

    @Speakerphone Dude said:

    If you have worked with terminal emulators you know how encoding can be a bitch.


    I've worked with Java on Apples. That's where I learnt that encoding is a bitch, and that's why I insist on explicitly specifying the encoding. The alternative is that each platform picks one for you in ways which you can't necessarily control.

    Specifying != hardcoding. The value should be fetched from a parameter or config file (or at least could be overridden by a parameter or config file).



  • @Speakerphone Dude said:

    Specifying != hardcoding.


    I knew I'd regret feeding the troll, so I will stop now.

