Do not compare Strings
-
What about \0 separated values? Isn't the point of the null character as a string terminator?
That is evil.
I love it.
-
Given how many security issues have been caused by bad \0 handling, do you really want to use it for data exchange?
There's one point when using it for data exchange: if you do, you need to get your NUL handling right.
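Here is a minimal Python sketch of what "getting NUL handling right" could mean. It assumes NUL-terminated UTF-8 records, the same convention as `find -print0` / `xargs -0`. `write_nul_records` and `read_nul_records` are hypothetical names. The two checks that matter are: refuse to write a value that itself contains a NUL, and refuse to read input whose last record was never terminated, which usually means it got truncated.

```python
def write_nul_records(values):
    """Encode strings as NUL-terminated UTF-8 records (hypothetical helper)."""
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        # An embedded NUL would silently split this value into two records.
        if b"\0" in data:
            raise ValueError(f"value contains NUL: {v!r}")
        out += data + b"\0"
    return bytes(out)


def read_nul_records(data):
    """Decode NUL-terminated UTF-8 records (hypothetical helper)."""
    # Every record ends in NUL, so a missing final NUL means truncated input.
    if data and not data.endswith(b"\0"):
        raise ValueError("truncated input: last record has no NUL terminator")
    # split() leaves one empty chunk after the final terminator; drop it.
    return [chunk.decode("utf-8") for chunk in data.split(b"\0")[:-1]]
```

Because NUL is a terminator rather than a separator, empty strings survive the round trip: `["a", "", "b"]` becomes `b"a\0\0b\0"` and comes back unchanged.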
-
Yes, I think I see how this could be an improvement
-
But how should I represent nullish values in my CSV files now?
-
Yes, I think I see how this could be an improvement
Not my fault if your fonts are out of date
-
http://what.thedailywtf.com/t/character-separated-values/2124/16?u=aliceif
Filed under: [From the CSV thread](#poop)
-
What about \0 separated values? Isn't the point of the null character as a string terminator?
You can use \xff to separate UTF-8 encoded values.
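This works because the byte 0xFF never occurs in well-formed UTF-8, so the separator can't collide with the data. Here is a rough Python sketch; the helper names are made up for illustration.

```python
SEP = b"\xff"  # never appears in well-formed UTF-8


def join_utf8(values):
    """Encode each value as UTF-8 and glue them together with 0xFF."""
    return SEP.join(v.encode("utf-8") for v in values)


def split_utf8(data):
    """Split on the raw 0xFF byte, then decode each piece separately."""
    # The joined blob as a whole is NOT valid UTF-8, so decode per piece.
    return [part.decode("utf-8") for part in data.split(SEP)]
```

There are two caveats. First, the joined blob is itself not valid UTF-8, so any text tool will choke on it. Second, an empty list and a list holding one empty string both encode to nothing.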
-
But how should I represent nullish values in my CSV files now?
Do an Oracle and use the empty string!
-
But how should I represent nullish values in my CSV files now?
Same way you represent commas at the moment: quote them.
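Here is a minimal sketch of that "quote them" idea. It assumes a hypothetical dialect in which every non-null value is always quoted and a bare empty field means null. `encode_row` and `decode_row` are made-up names, and the parser works on one already-split record only.

```python
def encode_row(fields):
    """Encode one record: None -> bare empty field, any string -> quoted field."""
    out = []
    for f in fields:
        if f is None:
            out.append("")
        else:
            out.append('"' + f.replace('"', '""') + '"')  # double embedded quotes
    return ",".join(out)


def decode_row(line):
    """Inverse of encode_row for a single, already-split record."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':  # "" is an escaped quote
                        buf.append('"')
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
        else:
            # In this dialect the only legal unquoted field is the empty one: null.
            if i < n and line[i] != ",":
                raise ValueError(f"unquoted data at position {i}")
            fields.append(None)
        if i == n:
            return fields
        if line[i] != ",":
            raise ValueError(f"expected ',' at position {i}")
        i += 1
```

With this, `["a", None, "", 'x"y']` encodes as `"a",,"","x""y"`, and null and empty string stay distinct on the way back.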
-
What about \0 separated values? Isn't the point of the null character as a string terminator?
You're joking, but it's a thing: http://blogs.msdn.com/b/oldnewthing/archive/2009/10/08/9904646.aspx
-
You might think that Windows is TRWTF for implementing such a scheme, but for me, TRWTF are the people who call it "double-null-terminated string" instead of "empty-string-terminated array of null-terminated-strings". It's like calling a float "a 32-bit integer that's neither big-endian nor little-endian, and requires special rules to read correctly".
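For the curious, here is a rough Python sketch of reading and writing that "empty-string-terminated array of null-terminated strings" over raw bytes. It follows the layout described in the linked Old New Thing post as I understand it; REG_MULTI_SZ is the registry type that uses it. `build_multi_sz` and `parse_multi_sz` are hypothetical names.

```python
from typing import List


def build_multi_sz(items: List[bytes]) -> bytes:
    for s in items:
        # An empty string would look like the terminator, and an embedded NUL
        # would split the item in two, so neither can be encoded.
        if not s or b"\0" in s:
            raise ValueError(f"cannot encode {s!r}")
    # Each string gets its own NUL; one more NUL (the empty string) ends the list.
    return b"".join(s + b"\0" for s in items) + b"\0"


def parse_multi_sz(buf: bytes) -> List[bytes]:
    items = []
    start = 0
    while True:
        end = buf.index(b"\0", start)  # raises ValueError if unterminated
        if end == start:  # hit the empty string: end of the list
            return items
        items.append(buf[start:end])
        start = end + 1
```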
-
I really don't understand what kind of culture could ever induce someone to consider null, and the empty string, to be the same.
This stuff almost always seems overblown to me. How often is it really important to differentiate between null and an empty string?
-
Not my fault if your fonts are out of date
It isn't... but then again, neither are my localisation settings, and they screw over CSV-based implementations big time.
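Here is a small Python illustration of the clash. It assumes comma-decimal number formatting, as used in e.g. German locales. The standard `csv` module quotes such a field so the file stays parseable, but anything that naively splits on `,` breaks. This is one reason some tools in those locales use `;` as the separator instead.

```python
import csv
import io

rows = [["pi", "3,14"], ["e", "2,72"]]  # decimal comma, as in e.g. de_DE formatting


def dump(delimiter):
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


comma = dump(",")      # the writer quotes "3,14" because it contains the delimiter
semicolon = dump(";")  # no quoting needed

print(comma.splitlines()[0].split(","))      # ['pi', '"3', '14"']
print(list(csv.reader(io.StringIO(comma))))  # [['pi', '3,14'], ['e', '2,72']]
```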
-
What about 0x1C to 0x1F (file separator, group separator, record separator, unit separator)?
From my ASCII manpage, but I'm pretty sure Unicode is the same.
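For what it's worth, ASCII names them FS (0x1C), GS (0x1D), RS (0x1E) and US (0x1F, "unit separator"). Here is a minimal Python sketch of a CSV replacement that uses US between fields and RS between records. `dumps` and `loads` are hypothetical names, and the format itself is an assumption, not any standard.

```python
US = "\x1f"  # ASCII unit separator: between fields
RS = "\x1e"  # ASCII record separator: between records


def dumps(records):
    for rec in records:
        for field in rec:
            # The only thing that can still go wrong: data containing a separator.
            if US in field or RS in field:
                raise ValueError(f"field contains a separator: {field!r}")
    return RS.join(US.join(rec) for rec in records)


def loads(text):
    # Note: a single record with one empty field also dumps to "",
    # the same ambiguity every separator-based format has.
    if not text:
        return []
    return [rec.split(US) for rec in text.split(RS)]
```

Commas, quotes and newlines in the data need no escaping at all.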
-
I'm afraid they'd find a way to fuck it up even worse. It already doesn't actually adhere to CSV format, even if you didn't open it in Excel. Just that opening it in Excel destroys some of the data irretrievably.
So wait.
You have a non-CSV file. With a .csv file extension. And even though it's not a CSV file, you won't rename it to another file extension because "they'd fuck it up even worse." Somehow.
Oh and also Excel gets the blame for trying to parse your non-CSV file with a .csv file extension as if it were a CSV file.
I see no WTFs here at all.
-
Gee, or maybe using the ASCII characters SPECIFICALLY CALLED "record separator" and "field separator". It's almost as if the guys who invented ASCII solved this problem years ago, but idiot dumbshits refuse to use the solution because it's not as "human-readable".
Somewhere we should have a database of IT problems that were solved decades ago, but nobody uses the solution because people in IT are all shitheads.
-
But what do you call "record separator separated values"?
RSSV?
-
According to this chart (PDF), they're the same in Unicode (U+001C to U+001F).
-
Why don't we call it the Everybody's An Asshole File Format. EAAFF. As an added bonus, it's pronounceable. And a valid onomatopoeia for the reaction people have when exposed to CSV.
I'm going to work.
-
Can we just agree that, if we don't care about it being human readable, we all just use SQLite like some other companies?
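Here is a small Python sketch of that, using the standard `sqlite3` module with an in-memory database. Swap `":memory:"` for a file path to get something you can actually hand over. The table and data are made up. The point is that commas, quotes and newlines need no escaping, and NULL stays distinct from the empty string.

```python
import sqlite3

rows = [("Alice", "likes, commas"), ("Bob", ""), ("Carol", None)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, note TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?)", rows)

# Values come back exactly as stored: '' is still '', None is still NULL.
back = con.execute("SELECT name, note FROM people ORDER BY rowid").fetchall()
nulls = con.execute("SELECT name FROM people WHERE note IS NULL").fetchall()
con.close()

print(back)   # [('Alice', 'likes, commas'), ('Bob', ''), ('Carol', None)]
print(nulls)  # [('Carol',)]
```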
-
Oh, hey @immibis_. Welcome to TDWTF. Get an avatar and get it hatted.
-
TRWTF are the people who call it "double-null-terminated string" instead of "empty-string-terminated array of null-terminated-strings".
You must not be a developer. You're not lazy enough.
-
Oh shit, it's Iblis from Sonic '06!
-
You must not be a developer. You're not lazy enough.
Oh, I am lazy. I'm so lazy I just call it PCZZTSTR. It's a zero-syllable word, which makes saying it infinitely quick.
-
But damn good fun!
Mine's a tiara
-
Get an avatar and get it hatted.
I'm missing something there.
Anyone up to the challenge of hatting my avatar?
-
paging @abarker
-
I'm missing something there.
Anyone up to the challenge of hatting my avatar?
You want me to hat a sword?
Umm … Get me a large copy and I'll see what I can do.
-
@Gaska said:
PCZZTSTR
Sounds <del>legit</del><ins>Polish</ins>.
Sounds like it came out of the Win32 API.
-
3) CSV as an import/export format. 9 out of 10 times it isn't even actually comma-separated but |, ;, tab or whatever.
Doesn't the C stand for 'character', not 'comma'?
-
You just gave me an idea for my avatar:
-
Pretty sure that's a retrobacronym; otherwise why would I have had to deal with PSV (pipe-separated), TSV (tab-separated) and ZSF ('\0'-separated) files before?
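Since you can't trust the name, one option is to guess the delimiter. Here is a Python sketch using the standard library's `csv.Sniffer`, restricted to a hypothetical list of candidate delimiters. Sniffing is a frequency heuristic and can guess wrong on small or irregular samples.

```python
import csv
import io


def read_delimited(text, candidates=",|\t;"):
    """Guess the delimiter from the candidates, then parse with it."""
    # Raises csv.Error if no candidate looks consistent across lines.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return dialect.delimiter, list(csv.reader(io.StringIO(text), dialect))


delim, rows = read_delimited("a|b|c\n1|2|3\n")
print(repr(delim), rows)  # '|' [['a', 'b', 'c'], ['1', '2', '3']]
```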
-
On topic of CSV, Wikipedia is sometimes very funny:
The plain-text character of CSV files largely avoids incompatibilities such as byte-order and word size.
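The funny part is that "plain text" still has a byte order as soon as it is stored as UTF-16. Here is a quick Python illustration; the sample string is arbitrary.

```python
text = "a,b"

le = text.encode("utf-16-le")  # little-endian, no BOM
be = text.encode("utf-16-be")  # big-endian, no BOM

print(le)  # b'a\x00,\x00b\x00'
print(be)  # b'\x00a\x00,\x00b'

# Same bytes read with the wrong byte order: decodes without error, but not "a,b".
print(le.decode("utf-16-be"))
```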
-
using the ASCII characters SPECIFICALLY CALLED "record separator" and "field separator".
Does anyone/anything actually support that?
We actually have code on import that explicitly strips all control characters other than tab and newline(s). This is mostly because we often get completely broken files. Once someone used \x09 for apostrophes, which broke the XML parsing completely.
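Here is a sketch of that kind of import filter in Python. It assumes "newline(s)" means LF and CR, and it also treats DEL (0x7F) as a control character; that last part is a guess. A filter like this also deletes FS/GS/RS/US, so any data using those separators wouldn't survive this import.

```python
import re

# C0 controls except TAB (0x09), LF (0x0A) and CR (0x0D), plus DEL (0x7F).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text):
    """Remove control characters, keeping tabs and line breaks."""
    return _CONTROL_CHARS.sub("", text)


print(repr(strip_control_chars("a\x1fb\tc\r\n\x00")))  # 'ab\tc\r\n'
```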
-
Does anyone/anything actually support that?
I've never seen anything that used them. I've always wanted to. They're right there, begging to be used-- they can't possibly conflict with the data in the file, they're fucking perfect for making something like a CSV file.
But nobody uses them.
-
People can't even agree what character(s) to use to end lines. Fancy things like field separators seem hopeless.
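Even within one language the answer varies. In Python, `str.splitlines()` accepts CRLF, CR and LF, and also treats FS, GS and RS (0x1C to 0x1E) as line breaks. `bytes.splitlines()` only knows CR and LF.

```python
# CRLF, bare CR and bare LF all count as line endings here.
print("a\r\nb\rc\nd".splitlines())  # ['a', 'b', 'c', 'd']

# str.splitlines() also splits on FS, GS and RS (0x1C-0x1E), but not US (0x1F)...
print("x\x1ey".splitlines())        # ['x', 'y']
print("x\x1fy".splitlines())        # ['x\x1fy']

# ...whereas bytes.splitlines() only recognises CR and LF.
print(b"x\x1ey".splitlines())       # [b'x\x1ey']
```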
-
I've never seen anything that used them. I've always wanted to. They're right there, begging to be used-- they can't possibly conflict with the data in the file, they're fucking perfect for making something like a CSV file.
But nobody uses them.
I'm pretty sure that some of those 50s era teletype data formats use 'em... (AFTN 10 doesn't, but it does use several other control chars you never see)
-
I've never seen anything that used them. I've always wanted to.
Have you used an ATM at a major bank in the United States recently? The transmission format they use to communicate with the central servers uses field-sep and record-sep. I was floored.
-
Hey people/idiots: when I say "I've never seen anything" that means "I've never seen anything". It doesn't mean "nothing uses them." There is a difference. I am grumpy.
-
when I say "I've never seen anything" that means "I've never seen anything".
But nobody uses them.
;)
I am grumpy.
I wouldn't have it any other way :)
-
Hey people/idiots: when I say "I've never seen anything" that means "I've never seen anything".
So… you've never seen an ATM?
-
Hey people/idiots: when I say "I've never seen anything" that means "I've never seen anything". It doesn't mean "nothing uses them."
Guys, it also means, "Don't tell me places where it's actually happened."
-
Guys, it also means, "Don't tell me places where it's actually happened."
If they want to poke the bear, let them poke the bear. Just stand back. Way back.