CSVs ought to be in ASCII



  • @NTW said:

    What's really sad is that there is an RFC that makes it really easy to do CSV right. Of course it's just an RFC (and I'm pretty sure 95% of 'enterprise' developers have never even heard the term).

    Well, who would read those? I mean, don't they [create hackers][2] or something?

    [2]: https://tools.ietf.org/html/rfc1392

    @Jaime said:

    XML may be much derided, but this is exactly the scenario it was designed for. At least it is specified well enough that you are guaranteed to get out whatever data you put in it.

    Oh, yeah, except for all the people we've seen right here in good ol' TD :wtf: that can't produce valid XML either.

    Despite my gibe at RFCs above, the fact is that almost no one bothers to read any standard. That's okay as long as there's a library of some kind that actually works, and people actually use the library.
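
    Case in point: Python's standard csv module already implements the RFC 4180 rules, so "using the library" is genuinely less work than hand-rolling the quoting. A minimal sketch (file name and rows are made up):

        import csv

        # Rows containing the classic CSV landmines: commas, quotes, newlines.
        rows = [
            ["name", "quote"],
            ["O'Brien, Pat", 'She said "hi"'],
            ["multi\nline", "plain"],
        ]

        # newline="" lets the csv module emit RFC 4180 CRLF line endings itself.
        with open("out.csv", "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)  # quotes and escapes only where needed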



  • Personally, I kind of miss the old fixed-width files. As long as the spec is available and there are no embedded control characters, newlines, or carriage returns, they work great. CSV parsers vary so wildly that they don't really solve any of that.
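
    For what it's worth, once the spec is in hand, a fixed-width parser is just slicing. A sketch in Python with invented column offsets:

        # Hypothetical layout spec: (column name, start offset, end offset).
        SPEC = [("name", 0, 20), ("city", 20, 35), ("zip", 35, 40)]

        def parse_line(line):
            # Cut each field out by position and strip the padding spaces.
            return {name: line[start:end].strip() for name, start, end in SPEC}

        with open("export.dat", encoding="ascii") as f:
            records = [parse_line(line.rstrip("\n")) for line in f]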



  • The point is that there already is a standard for distributing tabular data, so that any conforming implementation of a CSV writer can write as many rows and columns as it likes to a file, and any conforming CSV reader can turn that into a string[][] at the other end. No such standard exists for XML. This is the issue that @bulb is trying to make you aware of.
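
    In Python terms, that string[][] round-trip is about one line on the reading side (file name is a placeholder):

        import csv

        # A conforming reader hands back exactly the string[][] described above.
        with open("data.csv", newline="", encoding="utf-8") as f:
            table = [row for row in csv.reader(f)]  # list of lists of str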



  • -ahem- XHTML?

    I'll go back under my :trollface: 🌉



  • colspan


  • Garbage Person

    @Bulb said:

    Yes, there is. As far as I can tell, it is not a standard.

    We here at WtfCorp have defined a standard for representing tabular data in XML.

    The root element contains many row elements, which contain many cell elements. The element names of the cell elements define the column names. The element names at the root and row levels are ignored.

    You can XSLT any other representation into that, so that's what we do our ETL against.
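
    Producing that shape is a short ElementTree loop; a sketch, with my own guesses for the (ignored) root and row element names:

        import xml.etree.ElementTree as ET

        # Hypothetical input: one dict per row; keys become the cell element names.
        rows = [{"name": "Alice Beach", "city": "Huntly"},
                {"name": "Cyril Dunleavy", "city": "Springtown"}]

        root = ET.Element("file")                 # root element name is ignored
        for row in rows:
            row_el = ET.SubElement(root, "row")   # row element name is ignored too
            for column, value in row.items():
                ET.SubElement(row_el, column).text = value  # element name = column

        ET.ElementTree(root).write("table.xml", encoding="utf-8", xml_declaration=True)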


  • Discourse touched me in a no-no place

    @Bulb said:

    You didn't click the link, did you?

    In that case, you're wandering off-topic. What's worse, you're not doing it in an amusing way. CSV files are a reasonable way of moving tabular data about such that people receiving it can take a look with a tool that they probably already have, instead of needing some application to be developed.

    This is unlike XML, where an application would need to be created. At best, that “application” might be a stylesheet, but XML's really a tree structure, so you'd lose the inherent tabular nature unless you teach the computer that this type of tree is a table. Rambling all over the neighbourhood just to get where I can go nearly instantly by using the right representation in the first place isn't a good use of resources.

    [damn; now 💾🏇 updates!]



  • @mjmilan1 said:

    Or use something like LibreOffice perhaps?

    I know, it ain't Excel, but it's capable of handling the import, and to be honest it's replaced Office on my personal machine...

    Yeah, I use Calc for graphing stuff on my personal time (you know, timing hash functions and analysing collisions, that kind of thing). Also, my résumé is a LibreOffice document, which I export as a PDF if I want to send it to a real person.


  • BINNED

    I use LibreOffice for personal use. Perfect for a .doc resume for those who do not like LaTeX-generated PDF files!
    For graphing or any sort of data manipulation/visualization I just use Jupyter and pandas; it's a couple of lines to read and plot anything:

        >>> import pandas as pd
        >>> df = pd.read_csv('path/to/file')  # read_csv/excel/json/...
        >>> df.plot()
    


  • @Weng said:

    The root element contains many row elements, which contain many cell elements. The element names of the cell elements define the column names. The element names at the root and row levels are ignored.

    So, basically a CSV file...done in XML, right? Something like this?

    <?xml version="1.0" encoding="UTF-8"?>
    <file name="test.csv">
       <row>
          <cell>Alice Beach</cell>
          <cell>1124 North St</cell>
          <cell>Huntly</cell>
          <cell>FL</cell>
          <cell>32444</cell>
       </row>
       <row>
          <cell>Cyril Dunleavy</cell>
          <cell>397 Calvert St</cell>
          <cell>Springtown</cell>
          <cell>NE</cell>
          <cell>62579</cell>
       </row>
    </file>
    

    That's cool.



  • @CoyneTheDup said:

    You should never have used Excel to create a graph for something important enough that the distinction between smoothed and not was critical.

    In an ideal world . . . in the real world, Excel is used for things so important that every single addition should be manually double-checked by paired mathematicians, and every single formula should be triple-validated by subject experts and back-checked by a committee, every single member having to attest to the correctness and provability by signing their name in blood. It's a miracle that there isn't a severe nuclear incident every ten years, that there isn't an economic crisis every five, and that the economy works at all.

    What's that you say about following other news than TDWTF?



  • @Lawrence said:

    In an ideal world . . . in the real world, Excel is used for things so important that every single addition should be manually double-checked by paired mathematicians, and every single formula should be triple-validated by subject experts and back-checked by a committee, every single member having to attest to the correctness and provability by signing their name in blood. It's a miracle that there isn't a severe nuclear incident every ten years, that there isn't an economic crisis every five, and that the economy works at all.

    What's that you say about following other news than TDWTF?

    Well, you'd think people would take more care, but mostly what you get is, "Meh."

    As for economic crises: FAQ: Reinhart, Rogoff, and the Excel Error That Changed History



  • Seems like such a waste of space. Especially as the files get larger.


  • Garbage Person

    Actually,

    <?xml version="1.0" encoding="UTF-8"?>
    <file>
       <row>
          <name>Alice Beach</name>
          <address1>1124 North St</address1>
          <city>Huntly</city>
          <state>FL</state>
          <zip>32444</zip>
       </row>
       <row>
          <name>Cyril Dunleavy</name>
          <address1>397 Calvert St</address1>
          <city>Springtown</city>
          <state>NE</state>
          <zip>62579</zip>
       </row>
    </file>
    

    As this also encodes the column names.

    And yes, it is grotesquely wasteful. We only use this as a bridge for importing/exporting other XML formats (translation via XSLT) to avoid writing custom import/export code for everybody's special snowflake XML.
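
    Going the other way (teaching the computer that this particular tree is a table, as mentioned above) is a short walk with ElementTree; a sketch assuming the format shown:

        import xml.etree.ElementTree as ET

        root = ET.parse("table.xml").getroot()
        rows = list(root)                     # every child of the root is a row

        # Column names come from the cell element names of the first row.
        header = [cell.tag for cell in rows[0]]
        table = [[cell.text or "" for cell in row] for row in rows]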



  • @boomzilla said:

    Except for when you think you're getting xml but you really get something

    They think they're giving you XML. But they wrote their own xml parser/serializer.


  • Discourse touched me in a no-no place

    @dcon said:

    But they wrote their own xml parser/serializer.

    Ah yes, the ASV (angle-bracket separated value) file!



  • @NTW said:

    What's really sad is that there is an RFC that makes it really easy to do CSV right.

    😆 I didn't know there was such a thing! It seems ridiculous but ... it is actually a good thing to have.

    Each field may or may not be enclosed in double quotes (however
    some programs, such as Microsoft Excel, do not use double quotes
    at all).

    Actually, in an exporter I wrote, I enclosed every field, even the empty ones, in double quotes. Excel needs the first character to be a double quote or it fails for some dumb unknown reason. I figured I'd just simplify instead of treating the first field specially.
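
    That quote-everything approach is a one-flag affair in most CSV libraries, for what it's worth; e.g. with Python's csv module (a sketch, not the exporter in question):

        import csv

        rows = [["name", "zip"], ["Alice Beach", "32444"], ["", ""]]

        with open("export.csv", "w", newline="", encoding="utf-8") as f:
            # QUOTE_ALL wraps every field, empty ones included, in double quotes.
            csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)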

