Obfuscation, yo
-
str_getcsv takes a CSV string and turns it into an array.
Sooo...
$csvArray = explode(",", $csvString);
?
Well, that was time well spent!
-
Does the function handle stuff like commas in quotes (which I presume explode doesn't)?
-
Oh, my bad.
@boomzilla: yes. explode is for straight 'here's a string: a,b,c now make an array out of it with 3 elements' but CSV can be so much more complicated.
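For anyone keeping score, the difference shows up the moment a field contains the delimiter. A quick sketch (Python for illustration, with `split` standing in for explode and `csv.reader` for str_getcsv):

```python
import csv
import io

line = 'name,"Smith, John",42'

# explode()-style naive split: the quoted comma spawns a bogus extra field
naive = line.split(",")
print(naive)   # ['name', '"Smith', ' John"', '42']

# a real CSV parser honors the quoting
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['name', 'Smith, John', '42']
```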
-
Does the function handle stuff like commas in quotes (which I presume explode doesn't)?
I was just taking the obligatory piss out of PHP. I assume the actual function does much more like you pointed out.
Filed under: I need to use more sarcasm tags
-
PHP. I assume the actual function does much more like you pointed out.
Aaaaaaaaaaaugh!
Maybe str_getcsv_escaped will.
-
Slow down there, cowboy.
-
You know what's really funny?
The PHP devs pre-empted the fuck out of all of you for once. str_getcsv does actually handle escaped shit too! (And then you have the choice of doing further escaping, or not, as your application needs later)
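To be fair to them: str_getcsv takes separator, enclosure, and escape arguments, so doubled quotes inside a quoted field come back clean. Same behavior sketched with Python's csv module as a stand-in:

```python
import csv
import io

# a quoted field containing both a comma and escaped (doubled) quotes
line = '1,"she said ""hi"", then left"'

row = next(csv.reader(io.StringIO(line)))
print(row)  # ['1', 'she said "hi", then left']
```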
-
CSV can be so much more complicated.
Which is mostly caused by the fact that comma is the single most retarded character to use as a universal separator. It can show up both in textual data and numbers (and either as a decimal or thousands separator). It's only slightly less retarded than lowercase e.
And to think ASCII has had a unit separator for ages, but nobody gave enough of a fuck.
Filed under: me, I use ☺SV, or SV if Unicode is supported
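The ambiguity in one sketch (Python for illustration): once a decimal comma shows up, an unquoted comma-separated row can't be split back, while the ASCII unit separator round-trips untouched.

```python
US = "\x1f"  # ASCII unit separator

fields = ["3,14", "au revoir, monde"]  # decimal comma + textual comma

# with comma as the delimiter, the field boundaries are unrecoverable
print(",".join(fields).split(","))  # ['3', '14', 'au revoir', ' monde']

# with the unit separator, the original fields come back intact
print(US.join(fields).split(US))    # ['3,14', 'au revoir, monde']
```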
-
but nobody gave enough of a fuck.
In my experience, you have to take fucks, 'cause people won't just give them out.
Filed Under: Rape Culture, Trigger Warning
-
Which is mostly caused by the fact that comma is the single most retarded character to use as a universal separator. It can show up both in textual data and numbers (and either as a decimal or thousands separator). It's only slightly less retarded than lowercase e.
In one place, I use R.
Yes, really; blame stupid phones that won't let me enter any of the special characters allowed in the SIP standard in their settings. And even if they do, it's inconsistent between manufacturers.
-
comma is the single most retarded character to use as a universal separator
Somebody here decided "SPLITCHAR" would make a good field separator.
-
And to think ASCII has had a unit separator for ages, but nobody gave enough of a fuck.
Try using a NUL (U+0000). Guaranteed to not be used for anything else in a CSV file!
-
Try using a NUL (U+0000). Guaranteed to ~~not be used for anything else in a CSV file!~~ fuck up your CSV reader library!
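Why it backfires, in a sketch: languages that carry string length out-of-band shrug at NUL, but anything built on C-style NUL-terminated strings sees the data end at the first separator, which is presumably the failure mode being mocked here.

```python
NUL = "\x00"

record = NUL.join(["alpha", "beta"])

# Python stores length separately, so NUL is just another character...
print(record.split(NUL))  # ['alpha', 'beta']
print(len(record))        # 10

# ...but a reader built on C-style NUL-terminated strings would treat
# the first NUL as end-of-string and silently drop everything after it
```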
-
Pick another character, rinse, repeat.
-
Not my fault if your lame language can't cope with an ordinary ASCII character!
-
What if it can't cope with any of them? snigger
-
And to think ASCII has had a unit separator for ages, but nobody gave enough of a fuck.
I'm sitting here at work at 10 PM on a Saturday, working on our ETL system (a sprawling custom SSIS-based monstrosity). Our clients are notoriously bad at doing CSV correctly (or really, any delimiters).
You literally blew my mind there. Literally. I didn't believe you until I looked at my ASCII table and saw 1F sitting there.
I have now (in about 15 seconds of defining character constants) built in what I'm calling "ASCII Standard Delimited" support using 1E and 1F, and "ASCII Standard Delimited Derp Edition" which uses 1E and CRLF (or just LF). I will pass this along to our tech and sales goons and watch as they completely ignore this attempt at inducing sanity.
Seriously, I once had a customer deliver a "pipe delimited" file that, upon further investigation, had been built in memory. Memory that had been initialized with the repeating 0xDEADBEEF marker. And the "delimited" fields all had fixed widths. And the way they got data into them was, apparently, by initializing the RAM with the marker, writing out the pipes and line feeds in the appropriate spots, and then copying tokenized, variable-length strings, one token at a time, leaving out the whitespace, into that memory space... and never writing to the whitespaced sections, resulting in a file that looked like this:
someBvalueADBEEF|anotherFvalueEhereADBEEF
I've also seen a similar trick pulled off with fixed-width files in memory that had been zero'd. Had to run a preprocessor over the file to strip the NULs before anything could read that one.
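The "ASCII Standard Delimited" idea above, sketched in Python (the function names and shape are my guess at what those character constants amount to); note it only stays this simple as long as 0x1E/0x1F never appear in the data:

```python
RS, US = "\x1e", "\x1f"  # ASCII record separator, unit (field) separator

rows = [["id", "name", "note"],
        ["1", "Smith, John", 'said "hi"']]  # commas and quotes need no escaping

def dump(rows, record_sep=RS, field_sep=US):
    return record_sep.join(field_sep.join(r) for r in rows)

def load(blob, record_sep=RS, field_sep=US):
    return [rec.split(field_sep) for rec in blob.split(record_sep)]

print(load(dump(rows)) == rows)                      # True
# "Derp Edition": 0x1F between fields, plain LF between records
print(load(dump(rows, "\n", US), "\n", US) == rows)  # True
```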
-
I used to work on ATMs; our messaging protocol used 1C, 1D, and 1E for various types of separators. So it's used in some places.
-
Oh, that reminds me, actually.
The project that kicked this topic off does all kinds of weird shit.
For example, it's forked an earlier web-based doohickey, and there's still a ton of inline JavaScript, onclicks etc. Now, instead of rewriting this and burying it in .js files, the guy in question does some seriously odd things.
He buffers all the output from PHP, then scrapes through it for onclick and other events, and grabs it all out of the buffer, and builds an array out of it so that duplicate events can be avoided.
So what then happens is you get data-onclick="1 3" where 1 and 3 are the indices of this array of JavaScript things to be run on that element being clicked.
And in overhauling all of this, he decided that instead of trying to fight with mixing quoting styles, he actually converts quotes to 0x0F and 0x10 characters when strings need to be output 'with escaping for JavaScript'.
And he's convinced this is a good thing. I mean, sure, it works, but it's really not the best way to actually do any of this stuff...
-
someBE₁₆valueBE₁₆EF₁₆DE₁₆AD₁₆BE₁₆EF₁₆|anotherDE₁₆valueBE₁₆hereEF₁₆DE₁₆AD₁₆BE₁₆EF₁₆
FTFY.
And many commiserations; that's the worst I've seen in a while. Reminds me of dealing with sunspot data (except that was in EBCDIC fixed width records with no field or record separators…)
-
I have now (in about 15 seconds of defining character constants) built in what I'm calling "ASCII Standard Delimited" support using 1E and 1F, and "ASCII Standard Delimited Derp Edition" which uses 1E and CRLF (or just LF).
Technically, it should be 1F and CRLF (1F is a field separator).
0xDEADBEEF
If your data was only low-ASCII, it's not so bad (replace with spaces and trim accordingly). If those bytes could appear in the data... well, you're pretty much fucked.
Filed under: pedantic dickweedery
-
If you store Unicode strings in UTF-8, you can use 0xFF to separate them, since that byte cannot be present in a correctly encoded string.
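That property is checkable in a couple of lines (Python for illustration): no byte of well-formed UTF-8 is ever 0xFF, so it can frame the strings with no escaping at all.

```python
strings = ["héllo", "wörld", "日本語"]
encoded = [s.encode("utf-8") for s in strings]

# 0xFF never occurs in well-formed UTF-8
print(all(0xFF not in b for b in encoded))  # True

blob = b"\xff".join(encoded)
decoded = [b.decode("utf-8") for b in blob.split(b"\xff")]
print(decoded == strings)  # True
```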
-
I've seen that used as a record separator before
-
Vertical pipes. That's what you should use as a separator.
-
Vertical pipes. That's what you should use as a separator.
Unless you have to put regular expressions in your CSV!
... don't ask.
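"Don't ask" notwithstanding, the failure mode is easy to reproduce (Python illustration): a regex's alternation pipe is indistinguishable from the delimiter.

```python
row = ["rule-17", r"^(foo|bar)$", "enabled"]

line = "|".join(row)
print(line)             # rule-17|^(foo|bar)$|enabled

# the regex's alternation pipe becomes a field boundary: 4 fields from 3
print(line.split("|"))  # ['rule-17', '^(foo', 'bar)$', 'enabled']
```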
-
Unless you have to put regular expressions in your CSV!
... don't ask.
What if you need to put a CSV-parsing regular expression in your CSV?
-
What if you need to put a CSV-parsing regular expression in your CSV?
What if you need to write a regular expression that matches a CSV-parsing regular expression in a CSV file?
-
What if you need to write a regular expression that matches a CSV-parsing regular expression in a CSV file?
It was close. I managed to convince the guy who suggested it not to wake up the Codecthulhu, though.
-
Unless you have to put regular expressions in your CSV!
... don't ask.
I should probably test whether vertical pipes in the data do break the format, as I suspect they do.
-
How about we encode data as varint + bytes, where the varint is the number of bytes and the bytes are the contents of the field.
Then we can have each record be a field in a meta-record, and if needed, we can have meta-meta records as well.
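A sketch of that length-prefixed scheme (assuming protobuf-style LEB128 varints; the nesting falls out for free, since a whole record is just bytes you can wrap in another length prefix):

```python
def encode_varint(n: int) -> bytes:
    """7 bits per byte, high bit set while more bytes follow (LEB128 style)."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | 0x80 if n else b)
        if not n:
            return bytes(out)

def encode_field(data: bytes) -> bytes:
    return encode_varint(len(data)) + data

def decode_field(buf: bytes, pos: int = 0):
    """Return (field_bytes, position_after_field)."""
    length = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        length |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return buf[pos:pos + length], pos + length

print(encode_field(b"testing").hex())  # 0774657374696e67
```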
-
0774657374696e67