The Bar-Seperated Values CSV



  • A company I worked with some time ago had just purchased a new, expensive, ERP system which was a much more modern and sophisticated system than their old one.  The vendor of the ERP told me that it accepts the "common CSV" file format for all of its imports, so it's super-simple to get data into the system*.

    Following their instructions, I put together several CSV files for import, and used the import tool to get them into the system.  The results were less than stellar: thousands of record were missing, tens of thousands of records were spread out across other records, and there were tens of thousands more records than what I have imported.

    After making sure my files were formatted correctly, and saw that I didn't make any mistakes, I was forced to go back to the vendor and describe the problem.

    Vendor:   Well, for one, you can't have commas in your data anywhere or it confuses the parser.

    Me:  I realize that un-enclosed commas will interfere, but every field that contains commas is enclosed in quotes.

    Vendor:  You can't have quotation marks in the file at all, it confuses the parser.  And wrapping commas in quotes doesn't matter.  You can't have commas in your data.  That's what the parser uses as the field delimiter.

    Me:  We kind of need commas in the address lines, for example.

    Vendor:  You can't have commas in your data.  Period.  It's a CSV file.  That means NO COMMAS.

    This conversation went on like this for a while.

    After days of tweaking and abusing the system, it still refused to accept a properly formatted CSV file, but happily consumed my bar-delimited, tilde-enclosed, file as long as it had the CSV extension and used Windows-style line endings.

    This was the beginning of a very long ERP implementation project.  Last I heard, it was scheduled to be completed sometime after the contract for this new system expired, when the company planned to move to another new system.

     

    *The SQL interface is read-only, and is just a mirror of their proprietary back-end DB format.  Although it is read-only, you still have write-access to it, and any changes you make could corrupt the data and break the entire system.



  • @KrakenLover said:

    Vendor:   Well, for one, you can't have commas in your data anywhere or it confuses the parser.

    Me:  I realize that un-enclosed commas will interfere, but every field that contains commas is enclosed in quotes.

    Vendor:  You can't have quotation marks in the file at all, it confuses the parser.  And wrapping commas in quotes doesn't matter.  You can't have commas in your data.  That's what the parser uses as the field delimiter.

    Me:  We kind of need commas in the address lines, for example.

    Vendor:  You
    can't have commas in your data.  Period.  It's a CSV file.  That means NO COMMAS.


    Seriously? Processing a .CSV properly was like the second project I was assigned in my Intro to Programming course. And it could handle commas. And quotations. As a matter of fact, that's pretty much the fucking definition of a .CSV file. Lawyer up and sue them for grotesquely misrepresenting their product.



  • @mikeTheLiar said:

    Seriously? Processing a .CSV properly was like the second project I was assigned in my Intro to Programming course. And it could handle commas. And quotations. As a matter of fact, that's pretty much the fucking definition of a .CSV file. Lawyer up and sue them for grotesquely misrepresenting their product.
    Ah, but, see: It does import CSV* files.

    This is barely the topmost tip of the iceberg of the WTF this vendor subjected me to.

     



  • @mikeTheLiar said:

    Seriously? Processing a .CSV properly was like the second project I was assigned in my Intro to Programming course. And it could handle commas. And quotations. As a matter of fact, that's pretty much the fucking definition of a .CSV file. Lawyer up and sue them for grotesquely misrepresenting their product.
    I've seen CSV defined as "Character Separated Variables", and have used products that use | as the separator.



  • @OzPeter said:

    I've seen CSV defined as "Character Separated Variables", and have used products that use | as the separator.
     

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

    @KrakenLover said:

    It's a CSV file.  That means NO COMMAS.

    I felt my pedantic meter twitch then.

     



  • @KrakenLover said:

    ERP system

    Whenever I read that phrase, I imagine it sounding like a burp.




  • Tab seperated is much better (if you need tabs and line breaks in your data too, use US and RS characters as field and record separates).



  • @zzo38 said:

    use US and RS characters as field and record separates
     

    Using an old standard (ASCII) the way it was originally intended? Inconceivable!



  • @DaveK said:

    Whenever I read that phrase, I imagine it sounding like a burp.
     

    The loud, moist kind that bings back memories of what you ate a few hours ago.



  • @Cassidy said:

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

     

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

     



  • @mol1111 said:

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

    Likewise with date formats and function names. Excel is a great example of using localization to subtract value from your product.



  • @dhromed said:

    @DaveK said:

    Whenever I read that phrase, I imagine it sounding like a burp.
     

    The loud, moist kind that bings back memories of what you ate a few hours ago.

    Mmm, mushroom and swiss



  • @Ben L. said:

    Mmm, mushroom and swiss
     

    You eat the swiss?



  • FreeBSD /usr/ports/INDEX is pipe-separated and has no escaping/quoting.


  • Discourse touched me in a no-no place

    @OperatorBastardusInfernalis said:

    FreeBSD /usr/ports/INDEX is pipe-separated and has no escaping/quoting.
    And the /etc/passwd file is colon-separated. None of that matters (well, not very much) if the file isn't an interchange format. Applying stupid restrictions in an interchange format where there's a public description that says the exact opposite, that's the WTF.



  • Still a better CSV parser than Adobe InDesign. It doesn't let you use newlines, and there's no equivalent character you can use as a delimiter. Just no newlines at all. The best solution I could come up with was:

    s/([^"])"\n"([^"])/$1"<wbr>ROWDELIMITER"$2/g
    s/\n/NEWLINE/g
    s/ROWDELIMITER/\n/g

    And a search and replace in InDesign. This is 700 dollar per license industry-standard software.



  • @Cassidy said:

    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.



  • @blakeyrat said:

    @Cassidy said:
    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.

    It always strikes me as a little spiteful when they talk about eating swedes....

     



  • @da Doctah said:

    @blakeyrat said:

    @Cassidy said:
    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.

    It always strikes me as a little spiteful when they talk about eating swedes....
     

    Oh, you haven't dined when you've tasted those!

    And as for eating Indian... zowie!

     



  • @briverymouse said:

    This is 700 dollar per license industry-standard software.

    That's probably the reason. >$500 software can't be good. The sweet spot is probably closer to $50.



  • @Cassidy said:

    @da Doctah said:

    It always strikes me as a little spiteful when they talk about eating swedes....
     

    Oh, you haven't dined when you've tasted those!

    And as for eating Indian... zowie!

     

    I remember the first time some Aussie on a BBS made a remark about drowning "chooks" in the toilet bowl.  Where I went to high school, a "chook" was a Mexican (short for "pachuco", and not a word a Gringo would risk using to their face).

     



  • @Cassidy said:

    @da Doctah said:

    @blakeyrat said:

    @Cassidy said:
    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.

    It always strikes me as a little spiteful when they talk about eating swedes....
     

    Oh, you haven't dined when you've tasted those!

    And as for eating Indian... zowie!

     

    What if I had said "mushroom and swiss cheese"? Would that make you happy? No, because it implies "mushroom cheese"? How about "edible mushroom fungus with melted American-made cheese resembling a variety of cheese originally made in Switzerland"? That sounds less like a food and more like Microsoft's next phone OS.


  •  @mol1111 said:

    @Cassidy said:

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

     

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

    "CSV" handling in Excel is weird. Saving as CSV file will use the value from regional setting (in case of Finland, it's ";"). But if you paste text and select text import wizard, it will default to space. But if you just open file, it will default to regional setting one.

     



  •  Lot of Thai ladies eat the Swiss. Make good money at it, too.

     



  • PAGING ADRIA

    CLEANUP IN THREAD 27440



  • @KrakenLover said:

    Vendor:   Well, for one, you can't have commas in your data anywhere or it confuses the parser.

    Vendor:  You can't have quotation marks in the file at all, it confuses the parser.  And wrapping commas in quotes doesn't matter.  You can't have commas in your data.  That's what the parser uses as the field delimiter.

    Vendor:  You can't have commas in your data.  Period. 

    wow. that's pretty retarded. about 5 years ago I needed to write a csv parser for some data import tool and in my infinite laziness I took the first decent looking C# class a single google search threw at me. it handled all of those cases and normalised the line endings too. it was hard work changing the namespace and naming to match our convention, mind.

     

     



  • @Ben L. said:

    That sounds less like a food and more like Microsoft's next phone OS.
     

    But it's still made me hungry, damn you.



  • @blakeyrat said:

    @Cassidy said:
    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.

    I've never heard anyone say "let's order chinese takeout food", either. It'st he putting "a" in front of it that makes in confusing (to an american).



  • @taustin said:

    I've never heard anyone say "let's order chinese takeout food", either. It'st he putting "a" in front of it that makes in confusing (to an american).

    Yeah and where the fuck does it come from? Why would anybody ever insert a "a" in that context? It's crazy.


  • Considered Harmful

    @taustin said:

    @blakeyrat said:
    @Cassidy said:
    @Ben L. said:
    Mmm, mushroom and swiss
    You eat the swiss?

    I think it's funny when you watch British TV and they say things like: "let's take out a Chinese."

    What they mean is let's order take-out Chinese *food*, but for some reason they leave out the word food.

    I've never heard anyone say "let's order chinese takeout food", either. It'st he putting "a" in front of it that makes in confusing (to an american).

    To me, the striking difference is the use of "take out" as a verb phrase instead of a noun phrase. To take something out is to kill or destroy it; combined with "a Chinese" it sounds more like you're ordering a contract hit than dinner.



  • @blakeyrat said:

    @taustin said:
    I've never heard anyone say "let's order chinese takeout food", either. It'st he putting "a" in front of it that makes in confusing (to an american).

    Yeah and where the fuck does it come from? Why would anybody ever insert an "a" in that context? It's crazy.

    FTFY



  • @blakeyrat said:

    Yeah and where the fuck does it come from?
     

    Originally, the recipe would be from China... but I think they cheat and use locally-sourced ingredients.



  • This reminds me of the time we had a customer request a data dump of a Microsoft SQL database in tab-delimitted BCP format. I had offered to give them CSV files generated through SSIS, but they insisted on BCP. Of course, this led to fun problems such as having to replace delimiting characters in certain records (tabs, newlines, carriage returns) with a single space, and not having any column headers in the first row. I went along with it because I figured they were going to import the data into a SQL database on their end. I even provided them with DDL scripts to create the tables and an import script for the bcp files.

    It turns out the request for bcp was coming from a non-technical BA/Project Manager who thought "bcp" was some generic buzzword for data dumps rather than a specific SQL Server utility. Once the DBA got their hands on the data, they were confused about why the bcp files didn't have any column headers. I think I wound up sending them the CSVs that I had originally suggested. They really just wanted something that they could import into Microsoft Access (that's a whole other WTF, but at least it was out of our hands at that point).



  • @bighusker said:

    They really just wanted something that they could import into Microsoft Access (that's a whole other WTF, but at least it was out of our hands at that point).
     

    Firstly.. ain't there some DTS wizard that squirts data between MS-SQL and Access?

    Secondly... can't you just save Access DBs as some MDB file and copy that between Access installations?



  • @joe.edwards said:

    To me, the striking difference is the use of "take out" as a verb phrase instead of a noun phrase. To take something out is to kill or destroy it; combined with "a Chinese" it sounds more like you're ordering a contract hit than dinner.

    In the Queen's English the term is "take away" which makes more sense!



  • @AndyCanfield said:

     Lot of Thai ladies eat the Swiss. Make good money at it, too.

     

    I'm Swiss, but I don't like to pay for them Thai ladies.



  • @TheRider said:

    @AndyCanfield said:

     Lot of Thai ladies eat the Swiss. Make good money at it, too.

     

    I'm Swiss, but I don't like to pay for them Thai ladies.

    A man after my own heart. Why pay when a straight razor to the Adam's apple does the same thing, right?



  • @mol1111 said:

    @Cassidy said:

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

     

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

     

    Yeah... that goes to show why most of the crap that's regionalized shouldn't be.



  • @bridget99 said:

    @mol1111 said:

    @Cassidy said:

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

     

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

     

    Yeah... that goes to show why most of the crap that's regionalized shouldn't be.

    Regionalization was clearly the backwards approach to take. We should have focused on standardization, using nukes, if necessary:

    • One, single, global timezone. No DST. One calendar, one set of months, one single representation of the year. Year 0 was the Big Bang, everything works up from there (although we can abbreviate the year to the last 4 digits to make things easier.)

    • English is the standard language for all software. You can keep your local dialect of tongue-clicks and howls, but they will not be supported in software or hardware.

    • Dots for decimals, commas for thousands, just as God intended.

    • Everyone gets one first name, one last name and a middle initial. No spaces in names--none of this "Van Der Whatever" crap. No apostrophes, either, Ireland.

    • Internet porn shall not contain any of that weird, German feces stuff. In fact, no anal at all. All porn will star the same male actor so we can accustomed to looking at a single dong and won't spend half the film trying to figure out what the hell is going on with his foreskin.



  • @morbiuswilters said:

    @bridget99 said:
    @mol1111 said:

    @Cassidy said:

    Someone mentioned that to me also, showing me that Excel allows you to specify the field delimiter when importing CSV but defaults to a comma.

     

    Sadly Excel defaults to whatever is set in Windows' regional settings (also true for decimal mark), which make it lot of more fun in multilanguage environments.

     

    Yeah... that goes to show why most of the crap that's regionalized shouldn't be.

    Regionalization was clearly the backwards approach to take. We should have focused on standardization, using nukes, if necessary:

    • One, single, global timezone. No DST. One calendar, one set of months, one single representation of the year. Year 0 was the Big Bang, everything works up from there (although we can abbreviate the year to the last 4 digits to make things easier.)

    • English is the standard language for all software. You can keep your local dialect of tongue-clicks and howls, but they will not be supported in software or hardware.

    • Dots for decimals, commas for thousands, just as God intended.

    • Everyone gets one first name, one last name and a middle initial. No spaces in names--none of this "Van Der Whatever" crap. No apostrophes, either, Ireland.

    • Internet porn shall not contain any of that weird, German feces stuff. In fact, no anal at all. All porn will star the same male actor so we can accustomed to looking at a single dong and won't spend half the film trying to figure out what the hell is going on with his foreskin.

    • Computers shall have no storage of any kind. Storage allows different versions of programs and operating systems to be used, which is counter to the Objective.

    - There will be one computer company, one food company, and one construction company.



  • @Ben L. said:

    - Computers shall have no storage of any kind. Storage allows different versions of programs and operating systems to be used, which is counter to the Objective.

    The real problem with storage is that it allows people to save their mind-numbing documents and photos of their bland, tedious existences and fat, ugly children.

    If I had my way, all digital cameras would be read-only. They will come pre-populated with photos of families more attractive than your own doing things far, far more interesting than you ever will do.

    This will also obviate the need for vacations, since the only reason most people go on vacation is so they can fly 3000 miles to take photos of something they think is significant because they saw it on TV and then gorge themselves on french fries and hamburgs in a Chili's with a slightly-different floor plan than the Chili's near their house.



  • @morbiuswilters said:

    @Ben L. said:
    - Computers shall have no storage of any kind. Storage allows different versions of programs and operating systems to be used, which is counter to the Objective.

    The real problem with storage is that it allows people to save their mind-numbing documents and photos of their bland, tedious existences and fat, ugly children.

    If I had my way, all digital cameras would be read-only. They will come pre-populated with photos of families more attractive than your own doing things far, far more interesting than you ever will do.

    This will also obviate the need for vacations, since the only reason most people go on vacation is so they can fly 3000 miles to take photos of something they think is significant because they saw it on TV and then gorge themselves on french fries and hamburgs in a Chili's with a slightly-different floor plan than the Chili's near their house.

    How do the attractive people take pictures?



  • @Ben L. said:

    @morbiuswilters said:
    @Ben L. said:
    - Computers shall have no storage of any kind. Storage allows different versions of programs and operating systems to be used, which is counter to the Objective.

    The real problem with storage is that it allows people to save their mind-numbing documents and photos of their bland, tedious existences and fat, ugly children.

    If I had my way, all digital cameras would be read-only. They will come pre-populated with photos of families more attractive than your own doing things far, far more interesting than you ever will do.

    This will also obviate the need for vacations, since the only reason most people go on vacation is so they can fly 3000 miles to take photos of something they think is significant because they saw it on TV and then gorge themselves on french fries and hamburgs in a Chili's with a slightly-different floor plan than the Chili's near their house.

    How do the attractive people take pictures?

    Well, obviously I will have the only read/write digital camera, which I will use to document my fabulous existence so that the rest of you might achieve some small, vicarious happiness which is--albeit it tiny--still far more happiness than you'd ever be able to achieve if left to your own devices.



  • @morbiuswilters said:

    All porn will star the same male actor
     

    I have someone in mind already.



  • @spamcourt said:

    @zzo38 said:

    use US and RS characters as field and record separates
     

    Using an old standard (ASCII) the way it was originally intended? Inconceivable!

    Yes, that is what I use. I use ASCII, not Unicode.



  • @zzo38 said:

    @spamcourt said:

    @zzo38 said:

    use US and RS characters as field and record separates
     

    Using an old standard (ASCII) the way it was originally intended? Inconceivable!

    Yes, that is what I use. I use ASCII, not Unicode.

    Thank you for rezzing this thread to let the world know you've just awoken from your 1980s TimePod®.


Log in to reply