The Bar-Seperated Values CSV



  • A company I worked with some time ago had just purchased a new, expensive, ERP system which was a much more modern and sophisticated system than their old one.  The vendor of the ERP told me that it accepts the "common CSV" file format for all of its imports, so it's super-simple to get data into the system*.

    Following their instructions, I put together several CSV files for import, and used the import tool to get them into the system.  The results were less than stellar: thousands of record were missing, tens of thousands of records were spread out across other records, and there were tens of thousands more records than what I have imported.

    After making sure my files were formatted correctly, and saw that I didn't make any mistakes, I was forced to go back to the vendor and describe the problem.

    Vendor:   Well, for one, you can't have commas in your data anywhere or it confuses the parser.

    Me:  I realize that un-enclosed commas will interfere, but every field that contains commas is enclosed in quotes.

    Vendor:  You can't have quotation marks in the file at all, it confuses the parser.  And wrapping commas in quotes doesn't matter.  You can't have commas in your data.  That's what the parser uses as the field delimiter.

    Me:  We kind of need commas in the address lines, for example.

    Vendor:  You
    can't have commas in your data.  Period.  It's a CSV file.  That means NO COMMAS.

    This conversation went on like this for a while.

    After days of tweaking and abusing the system, it still refused to accept a properly formatted CSV file, but happily consumed my bar-delimited, tilde-enclosed, file as long as it had the CSV extension and used Windows-style line endings.

    This was the beginning of a very long ERP implementation project.  Last I heard, it was scheduled to be completed sometime after the contract for this new system expired, when the company planned to move to another new system.

     

    *The SQL interface is read-only, and is just a mirror of their proprietary back-end DB format.  Although it is read-only, you still have write-access to it, and any changes you make could corrupt the data and break the entire system.



  • @KrakenLover said:

    Vendor:   Well, for one, you can't have commas in your data anywhere or it confuses the parser.

    Me:  I realize that un-enclosed commas will interfere, but every field that contains commas is enclosed in quotes.

    Vendor:  You can't have quotation marks in the file at all, it confuses the parser.  And wrapping commas in quotes doesn't matter.  You can't have commas in your data.  That's what the parser uses as the field delimiter.

    Me:  We kind of need commas in the address lines, for example.

    Vendor:  You
    can't have commas in your data.  Period.  It's a CSV file.  That means NO COMMAS.


    Seriously? Processing a .CSV properly was like the second project I was assigned in my Intro to Programming course. And it could handle commas. And quotations. As a matter of fact, that's pretty much the fucking definition of a .CSV file. Lawyer up and sue them for grotesquely misrepresenting their product.



  • @mikeTheLiar said:

    Seriously? Processing a .CSV properly was like the second project I was assigned in my Intro to Programming course. And it could handle commas. And quotations. As a matter of fact, that's pretty much the fucking definition of a .CSV file. Lawyer up and sue them for grotesquely misrepresenting their product.
    Ah, but, see: It does import CSV* files.

    This is barely the topmost tip of the iceberg of the WTF this vendor subjected me to.

     



  • @mikeTheLiar said:

    Seriously? Processing a .CSV properly was like the second project I was assigned in my Intro to Programming course. And it could handle commas. And quotations. As a matter of fact, that's pretty much the fucking definition of a .CSV file. Lawyer up and sue them for grotesquely misrepresenting their product.
    I've seen CSV defined as "Character Separated Variables", and have used products that use | as the separator.



  • @OzPeter said:

    I've seen CSV defined as "Character Separated Variables", and have used products that use | as the separator.
     

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

    @KrakenLover said:

    It's a CSV file.  That means NO COMMAS.

    I felt my pedantic meter twitch then.

     



  • @KrakenLover said:

    ERP system

    Whenever I read that phrase, I imagine it sounding like a burp.




  • Tab seperated is much better (if you need tabs and line breaks in your data too, use US and RS characters as field and record separates).



  • @zzo38 said:

    use US and RS characters as field and record separates
     

    Using an old standard (ASCII) the way it was originally intended? Inconceivable!



  • @DaveK said:

    Whenever I read that phrase, I imagine it sounding like a burp.
     

    The loud, moist kind that bings back memories of what you ate a few hours ago.



  • @Cassidy said:

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

     

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

     



  • @mol1111 said:

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

    Likewise with date formats and function names. Excel is a great example of using localization to subtract value from your product.



  • @dhromed said:

    @DaveK said:

    Whenever I read that phrase, I imagine it sounding like a burp.
     

    The loud, moist kind that bings back memories of what you ate a few hours ago.

    Mmm, mushroom and swiss



  • @Ben L. said:

    Mmm, mushroom and swiss
     

    You eat the swiss?



  • FreeBSD /usr/ports/INDEX is pipe-separated and has no escaping/quoting.


  • Discourse touched me in a no-no place

    @OperatorBastardusInfernalis said:

    FreeBSD /usr/ports/INDEX is pipe-separated and has no escaping/quoting.
    And the /etc/passwd file is colon-separated. None of that matters (well, not very much) if the file isn't an interchange format. Applying stupid restrictions in an interchange format where there's a public description that says the exact opposite, that's the WTF.



  • Still a better CSV parser than Adobe InDesign. It doesn't let you use newlines, and there's no equivalent character you can use as a delimiter. Just no newlines at all. The best solution I could come up with was:

    s/([^"])"\n"([^"])/$1"<wbr>ROWDELIMITER"$2/g
    s/\n/NEWLINE/g
    s/ROWDELIMITER/\n/g

    And a search and replace in InDesign. This is 700 dollar per license industry-standard software.



  • @Cassidy said:

    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.



  • @blakeyrat said:

    @Cassidy said:
    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.

    It always strikes me as a little spiteful when they talk about eating swedes....

     



  • @da Doctah said:

    @blakeyrat said:

    @Cassidy said:
    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.

    It always strikes me as a little spiteful when they talk about eating swedes....
     

    Oh, you haven't dined when you've tasted those!

    And as for eating Indian... zowie!

     



  • @briverymouse said:

    This is 700 dollar per license industry-standard software.

    That's probably the reason. >$500 software can't be good. The sweet spot is probably closer to $50.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.