Obfuscation, yo
-
str_getcsv takes a CSV string and turns it into an array.
Sooo...
$csvArray = explode(",", $csvString);
?
Well, that was time well spent!
-
Does the function handle stuff like commas in quotes (which I presume explode doesn't)?
-
Oh, my bad.
@boomzilla: yes. explode is for straight 'here's a string: a,b,c now make an array out of it with 3 elements' but CSV can be so much more complicated.
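For anyone keeping score, the difference shows up the moment a field contains the delimiter. A quick sketch (Python for illustration, with `split` standing in for explode and `csv.reader` for str_getcsv):

```python
import csv
import io

line = 'name,"Smith, John",42'

# explode()-style naive split: the quoted comma spawns a bogus extra field
naive = line.split(",")
print(naive)   # ['name', '"Smith', ' John"', '42']

# a real CSV parser honors the quoting
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['name', 'Smith, John', '42']
```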
-
Does the function handle stuff like commas in quotes (which I presume explode doesn't)?
I was just taking the obligatory piss out of PHP. I assume the actual function does much more like you pointed out.
Filed under: I need to use more sarcasm tags
-
PHP. I assume the actual function does much more like you pointed out.
Aaaaaaaaaaaugh!
Maybe str_getcsv_escaped will.
-
Slow down there, cowboy.
-
You know what's really funny?
The PHP devs pre-empted the fuck out of all of you for once. str_getcsv does actually handle escaped shit too! (And then you have the choice of doing further escaping, or not, as your application needs later)
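To be fair to them: str_getcsv takes separator, enclosure, and escape arguments, so doubled quotes inside a quoted field come back clean. Same behavior sketched with Python's csv module as a stand-in:

```python
import csv
import io

# a quoted field containing both a comma and escaped (doubled) quotes
line = '1,"she said ""hi"", then left"'

row = next(csv.reader(io.StringIO(line)))
print(row)  # ['1', 'she said "hi", then left']
```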
-
CSV can be so much more complicated.
Which is mostly caused by the fact that comma is the single most retarded character to use as a universal separator. It can show up both in textual data and numbers (and either as a decimal or thousands separator). It's only slightly less retarded than lowercase e.
And to think ASCII has had a unit separator for ages, but nobody gave enough of a fuck.
Filed under: me, I use ☺SV, or SV if Unicode is supported
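The ambiguity in one sketch (Python for illustration): once a decimal comma shows up, an unquoted comma-separated row can't be split back, while the ASCII unit separator round-trips untouched.

```python
US = "\x1f"  # ASCII unit separator

fields = ["3,14", "au revoir, monde"]  # decimal comma + textual comma

# with comma as the delimiter, the field boundaries are unrecoverable
print(",".join(fields).split(","))  # ['3', '14', 'au revoir', ' monde']

# with the unit separator, the original fields come back intact
print(US.join(fields).split(US))    # ['3,14', 'au revoir, monde']
```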
-
but nobody gave enough of a fuck.
In my experience, you have to take fucks, 'cause people won't just give them out.
Filed Under: Rape Culture, Trigger Warning
-
Which is mostly caused by the fact that comma is the single most retarded character to use as a universal separator. It can show up both in textual data and numbers (and either as a decimal or thousands separator). It's only slightly less retarded than lowercase e.
In one place, I use R.
Yes, really; blame stupid phones that won't let me enter any of the special characters allowed in the SIP standard in their settings. And even if they do, it's inconsistent between manufacturers.
-
comma is the single most retarded character to use as a universal separator
Somebody here decided "SPLITCHAR" would make a good field separator.
-
And to think ASCII has had a unit separator for ages, but nobody gave enough of a fuck.
Try using a NUL (U+0000). Guaranteed to not be used for anything else in a CSV file!
-
Try using a NUL (U+0000). Guaranteed to ~~not be used for anything else in a CSV file!~~ fuck up your CSV reader library!
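Why it backfires, in a sketch: languages that carry string length out-of-band shrug at NUL, but anything built on C-style NUL-terminated strings sees the data end at the first separator, which is presumably the failure mode being mocked here.

```python
NUL = "\x00"

record = NUL.join(["alpha", "beta"])

# Python stores length separately, so NUL is just another character...
print(record.split(NUL))  # ['alpha', 'beta']
print(len(record))        # 10

# ...but a reader built on C-style NUL-terminated strings would treat
# the first NUL as end-of-string and silently drop everything after it
```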
-
Pick another character, rinse, repeat.
-
Not my fault if your lame language can't cope with an ordinary ASCII character!
-
What if it can't cope with any of them? snigger
-
And to think ASCII has had a unit separator for ages, but nobody gave enough of a fuck.
I'm sitting here at work at 10 PM on a Saturday, working on our ETL system (a sprawling custom SSIS-based monstrosity). Our clients are notoriously bad at doing CSV correctly (or really, any delimiters).
You literally blew my mind there. Literally. I didn't believe you until I looked at my ASCII table and saw 1F sitting there.
I have now (in about 15 seconds of defining character constants) built in what I'm calling "ASCII Standard Delimited" support using 1E and 1F, and "ASCII Standard Delimited Derp Edition" which uses 1E and CRLF (or just LF). I will pass this along to our tech and sales goons and watch as they completely ignore this attempt at inducing sanity.
Seriously, I once had a customer deliver a "pipe delimited" file that, upon further investigation, had been built in memory. Memory that had been initialized with the repeating 0xDEADBEEF marker. And the "delimited" fields all had fixed widths. And the way they got data into them was, apparently, by initializing the RAM with the marker, writing out the pipes and line feeds in the appropriate spots, and then copying tokenized, variable-length strings, one token at a time, leaving out the whitespace, into that memory space... and never writing to the whitespaced sections, resulting in a file that looked like this:
someBvalueADBEEF|anotherFvalueEhereADBEEF
I've also seen a similar trick pulled off with fixed-width files in memory that had been zero'd. Had to run a preprocessor over the file to strip the NULs before anything could read that one.
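The "ASCII Standard Delimited" idea above, sketched in Python (the function names and shape are my guess at what those character constants amount to); note it only stays this simple as long as 0x1E/0x1F never appear in the data:

```python
RS, US = "\x1e", "\x1f"  # ASCII record separator, unit (field) separator

rows = [["id", "name", "note"],
        ["1", "Smith, John", 'said "hi"']]  # commas and quotes need no escaping

def dump(rows, record_sep=RS, field_sep=US):
    return record_sep.join(field_sep.join(r) for r in rows)

def load(blob, record_sep=RS, field_sep=US):
    return [rec.split(field_sep) for rec in blob.split(record_sep)]

print(load(dump(rows)) == rows)                      # True
# "Derp Edition": 0x1F between fields, plain LF between records
print(load(dump(rows, "\n", US), "\n", US) == rows)  # True
```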
-
I used to work on ATMs; our messaging protocol used 1C, 1D, and 1E for various types of separators. So it's used in some places.
-
Oh, that reminds me, actually.
The project that kicked this topic off does all kinds of weird shit.
For example, it's forked an earlier web-based doohickey, and there's still a ton of inline JavaScript, onclicks etc. Now, instead of rewriting this and burying it in .js files, the guy in question does some seriously odd things.
He buffers all the output from PHP, then scrapes through it for onclick and other events, and grabs it all out of the buffer, and builds an array out of it so that duplicate events can be avoided.
So what then happens is you get data-onclick="1 3" where 1 and 3 are the indices of this array of JavaScript things to be run on that element being clicked.
And in overhauling all of this, he decided that instead of trying to fight with mixing quoting styles, he actually converts quotes to 0x0F and 0x10 characters when strings need to be output 'with escaping for JavaScript'.
And he's convinced this is a good thing. I mean, sure, it works, but it's really not the best way to actually do any of this stuff...
-
someBE₁₆valueBE₁₆EF₁₆DE₁₆AD₁₆BE₁₆EF₁₆|anotherDE₁₆valueBE₁₆hereEF₁₆DE₁₆AD₁₆BE₁₆EF₁₆
FTFY.
And many commiserations; that's the worst I've seen in a while. Reminds me of dealing with sunspot data (except that was in EBCDIC fixed width records with no field or record separators…)
-
I have now (in about 15 seconds of defining character constants) built in what I'm calling "ASCII Standard Delimited" support using 1E and 1F, and "ASCII Standard Delimited Derp Edition" which uses 1E and CRLF (or just LF).
Technically, it should be 1F and CRLF (1F is a field separator).
0xDEADBEEF
If your data was only low-ASCII, it's not so bad (replace with spaces and trim accordingly). If those bytes could appear in the data... well, you're pretty much fucked.
Filed under: pedantic dickweedery
-
If you store Unicode strings in UTF-8, you can use 0xFF to separate them, since that byte cannot be present in a correctly encoded string.
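That property is checkable in a couple of lines (Python for illustration): no byte of well-formed UTF-8 is ever 0xFF, so it can frame the strings with no escaping at all.

```python
strings = ["héllo", "wörld", "日本語"]
encoded = [s.encode("utf-8") for s in strings]

# 0xFF never occurs in well-formed UTF-8
print(all(0xFF not in b for b in encoded))  # True

blob = b"\xff".join(encoded)
decoded = [b.decode("utf-8") for b in blob.split(b"\xff")]
print(decoded == strings)  # True
```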
-
I've seen that used as a record separator before
-
Vertical pipes. That's what you should use as a separator.
-
Vertical pipes. That's what you should use as a separator.
Unless you have to put regular expressions in your CSV!
... don't ask.
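"Don't ask" notwithstanding, the failure mode is easy to reproduce (Python illustration): a regex's alternation pipe is indistinguishable from the delimiter.

```python
row = ["rule-17", r"^(foo|bar)$", "enabled"]

line = "|".join(row)
print(line)             # rule-17|^(foo|bar)$|enabled

# the regex's alternation pipe becomes a field boundary: 4 fields from 3
print(line.split("|"))  # ['rule-17', '^(foo', 'bar)$', 'enabled']
```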
-
Unless you have to put regular expressions in your CSV!
... don't ask.
What if you need to put a CSV-parsing regular expression in your CSV?
-
What if you need to put a CSV-parsing regular expression in your CSV?
What if you need to write a regular expression that matches a CSV-parsing regular expression in a CSV file?
-
What if you need to write a regular expression that matches a CSV-parsing regular expression in a CSV file?
It was close. I managed to convince the guy who suggested it not to wake up the Codecthulhu, though.
-
Unless you have to put regular expressions in your CSV!
... don't ask.
I should probably test whether vertical pipes in the data do break the format, as I suspect they do.
-
How about we encode data as varint + bytes, where the varint is the number of bytes and the bytes are the contents of the field.
Then we can have each record be a field in a meta-record, and if needed, we can have meta-meta records as well.
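A sketch of that length-prefixed scheme (assuming protobuf-style LEB128 varints; the nesting falls out for free, since a whole record is just bytes you can wrap in another length prefix):

```python
def encode_varint(n: int) -> bytes:
    """7 bits per byte, high bit set while more bytes follow (LEB128 style)."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | 0x80 if n else b)
        if not n:
            return bytes(out)

def encode_field(data: bytes) -> bytes:
    return encode_varint(len(data)) + data

def decode_field(buf: bytes, pos: int = 0):
    """Return (field_bytes, position_after_field)."""
    length = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        length |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return buf[pos:pos + length], pos + length

print(encode_field(b"testing").hex())  # 0774657374696e67
```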
-
0774657374696e67