WTF Bites
-
@Bulb They don't even have a proper backwards-compatibility excuse, since they broke hard with the switch to xlsx, and to varying degrees before that.
-
My software writes out numbers in C locale, like any sane computer-parsable file should.
I first really ran into locales as a student many years ago. The lab had us "write" some software in LabVIEW to orchestrate measurements across a few devices. So I sketched out the core of the program on one of the workstations, tested it, and moved everything to the under-powered lab computer. Connected all the different devices. Hit run.
Result: cacophony.
The poor devices we were talking to had exactly one way of reporting errors: their built-in buzzer. And they reported each and every error with the buzzer, i.e., each time they got to a number they couldn't parse. LabVIEW had decided to format all numbers using the system's current locale when generating the text-based commands it sent to the devices. Yay.
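Python makes a handy illustration of how this can be opt-in rather than the default: `locale.setlocale` is process-global just like C's, but only locale-aware helpers such as the `n` format spec consult it, while `repr()` of a float always uses a `.`. A minimal sketch, assuming a hypothetical instrument that accepts text commands like `VOLT 3.25`:

```python
import locale

def format_for_machine(value: float) -> str:
    # repr() gives the shortest string that round-trips to the same float
    # and never consults the locale, so the decimal separator is always ".".
    return repr(float(value))

def format_for_humans(value: float) -> str:
    # The "n" format spec reads the process-wide LC_NUMERIC setting, so the
    # output changes with whatever locale.setlocale() was last given.
    return format(value, "n")

def device_command(name: str, value: float) -> str:
    # Hypothetical instrument syntax: "<NAME> <number>\n".
    return f"{name} {format_for_machine(value)}\n"

if __name__ == "__main__":
    try:
        # Adopt the user's environment locale, as GUI tools typically do.
        locale.setlocale(locale.LC_NUMERIC, "")
    except locale.Error:
        pass  # the environment names a locale that isn't installed
    print(repr(device_command("VOLT", 3.25)))  # 'VOLT 3.25\n' regardless of locale
    print(format_for_humans(3.25))             # 3,25 under a German locale, 3.25 under C
```

The point of the split is that anything a machine will parse goes through the locale-free path, and only text meant for people goes through the locale-aware one.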
-
-
My first suspicion would always be on transaction overhead. Some databases work best with as many inserts in one transaction as possible while others have some optimal size of batch, but inserting under individual transactions (or autocommit) is pretty bad everywhere.
Don't y'all use clusters? I've had to advise developers how to avoid large transactions recently because Galera hates them, to the point of not just locking and aborting completely unrelated transactions on different databases but even crashing the entire cluster.
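A minimal sketch of the middle ground between autocommit and one giant transaction, using Python's built-in sqlite3 as a stand-in (MySQL/Galera drivers differ in detail, but the shape is the same): commit once per batch of `batch_size` rows. The table name, schema and the default of 1000 rows per batch are illustrative assumptions; the right size is something to measure for each database.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, Sequence

def chunks(rows: Iterable[Sequence], size: int) -> Iterator[list]:
    # Yield successive lists of at most `size` rows.
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def insert_in_batches(conn: sqlite3.Connection, rows: Iterable[Sequence],
                      batch_size: int = 1000) -> int:
    """Insert rows, committing once per batch instead of once per row."""
    total = 0
    for batch in chunks(rows, batch_size):
        # `with conn` commits the open transaction on success and rolls it
        # back if an exception escapes the block.
        with conn:
            conn.executemany(
                "INSERT INTO measurements (t, value) VALUES (?, ?)", batch)
        total += len(batch)
    return total

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (t INTEGER, value REAL)")
    n = insert_in_batches(conn, ((i, i * 0.5) for i in range(10_000)), batch_size=500)
    print(n, conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0])
```

Bounding the batch size keeps each transaction small enough for a cluster like Galera to certify, while still amortizing the per-commit cost.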
-
@sebastian-galczynski said in WTF Bites:
problematic cards also pull too much current through the circuit at the other end
That's cause these are special medical grade cards. They probably forgot to solder the batteries.
Funny, I lived within almost walking distance of that company a few years ago. They don't seem to exist any more, but their homepage is a perfect match for this contraption: a JPG screenshot of a Word document, hosted by a Filipino dude¹ with a German phone number who individually screenshots the same thing for each of the dozens of "under construction" sites on his server and isn't quite sure how to spell his own "business" domain.
¹ Not exactly walking distance from my place 20 years ago, but same island.
-
Google's glorious preservation of the knowledge of the world
Yeah, it's a difficult text to OCR. 180 years old, with decorative type, gothic, dirty pages, library stamps and all. But you know it's not going well when the text starts wlth dg cribimg høw lt w4s carcfully scannod:
This is a digital copy of a book that was prcscrvod for gcncrations on library shclvcs bcforc it was carcfully scannod by Google as pari of a projcct
That's the easy part, the part that was obviously written by Google, then printed out, put on a wooden table and photographed back in to be 0CRd.
So behold the text:
9cro«tt«ii tm fiatmßvAt tin^ <^dl)ett. * VAA;1C.'-..n ! t i i V TM i t IT "^ 1 ö V^
To be fair, it doesn't go on quite as badly. Most of the book is recognizable as something vaguely resembling German, if not comprehensible text:
3n biefcm ^tinji)) ip bie 5(u§fü^ning jebct gto§ättigen 3bee moglid^. 3)le 9lu8tottung i^ert^eetenbet «Jltanf^eiten , fd&ablic^et 3nfeften, bie 5SetebIung, ittaftigung unb Setfc^onetung beö menf(f)fid^en Äor^jerS. ®ie 23eri5;fitung öon 3Äange!, UeBetfd^ttjem* mung unb einet SRenge anbetet Uebel ijl nut allein in bet @e* meiufd^aft mpglic^. 6d^on batum, tt?eit alle befannte S^Jtad^en gtofe lln\)oUfommen^eiten an ftd^ ^aUn, ifl eS' not^injenbig, tint ganj mm, fd^one, too^lfUngenbe, öoUfommcne ©^tad^e gu etjtn* ben. Unb U>enn bie ßtfinbung betfelben mSglid^ ift, h?arum follte bie 5lnu>enbung betfelben nic^t mogfid^ fei^n; JD^ne baö $tinjip bet ©emeinfd^aft ifl biefe freilid^ nid&t moglid^. HfUinl bie ©egtiffe. ©^tad^en, (^tenjen unb QSatetlanb finb bet ' ?Kenfd[;^eit fo ttjejiig not^h?enbig, al3 alle befte^enben teligtefen JDogmen. 9ltle biefe Segtiffe finb ^etjäi;tte Uebetll^fmingen, beten SWad^ti^eil immet fü^Ibatet n>itb je länget fle Befielen.
There are 288 pages of this shit. Obviously Google has completely cut out any humans from the library-to-internet pipeline.
-
@sebastian-galczynski said in WTF Bites:
Edit: We now have Inner JSON inside CSV. Turns out it can't be edited with Excel, because wrong quotes.
Oh yeah, you're supposed to double-double-quote JSON strings, with escaping.
Guess who found out PHP doesn't do that for you with the default CSV functions?
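For comparison, a sketch with Python's csv module, whose default dialect already does the RFC 4180 doubling (`doublequote=True`), so a JSON string dropped into a cell comes out as `"{""name"": ""Bob""}"` and reads back unchanged. The column names are made up for illustration.

```python
import csv
import io
import json

def rows_to_csv(rows: list[dict]) -> str:
    buf = io.StringIO(newline="")
    # Default "excel" dialect: embedded quotes are doubled, not backslash-escaped.
    writer = csv.writer(buf)
    writer.writerow(["id", "payload"])
    for row in rows:
        # Serialize the nested object to a JSON string; the writer quotes the
        # field and doubles every embedded " as RFC 4180 requires.
        writer.writerow([row["id"], json.dumps(row["payload"])])
    return buf.getvalue()

def csv_to_rows(text: str) -> list[dict]:
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # skip the header row
    return [{"id": int(r[0]), "payload": json.loads(r[1])} for r in reader]
```

The JSON layer and the CSV layer each do their own escaping, and as long as a real CSV writer handles the outer layer, nothing has to be escaped by hand.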
-
Obviously Google has completely cut out any humans from the library-to-internet pipeline
Doesn't seem to have an AI cleanup step either; that's just OCR vomit.
-
screenshot of a Word document
I've seen a case of a metal nameplate on a certain device (produced by a certain company I'm intermittently affiliated with) engraved from such a screenshot. The name of the company was underlined with a thin wavy line, because it wasn't in the dictionary.
-
@Tsaukpaetra fputcsv predates RFC4180 but if memory serves, you should just be able to change the escape character to " and that would do the job.
It is so rare for me to have to write correct CSV though, virtually everyone else I’ve worked with fucks up if you give them proper CSV files.
-
It is so rare for me to have to write correct CSV though, virtually everyone else I’ve worked with fucks up if you give them proper CSV files.
There are two reasons for writing CSV:
- Moving slabs of numbers about, often within a locale, or even a single computer.
- Getting data into Excel. Preferably without having to learn to write XLSX directly.
-
@dkf oh, I have done a great many CSV pieces in my time. I just rarely write proper CSV, because most of what I do is the eldritch abomination known as Excel CSV.
Kick the file off with a BOM for good measure (tells Excel you want UTF-8 rather than whatever encoding it would otherwise assume), followed by judicious use of ="value" items as cells, because Excel. Often with careful replacements of CHR(34), and appending things if the data itself has quotes in it.
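A sketch of those tricks in Python, assuming the goal is to stop Excel from reinterpreting values such as leading-zero IDs: the `utf-8-sig` codec writes the BOM, and each text cell becomes a formula `="..."` with embedded quotes doubled per Excel's string-literal rules (an alternative to the CHR(34) replacements above). The csv module then applies its own CSV-level quoting on top. How a particular Excel version treats the result is not guaranteed; treat this as a starting point.

```python
import csv
import io

def excel_text_cell(value: str) -> str:
    # Wrap as a formula returning a string literal, so Excel keeps e.g.
    # "00123" as text. Inside an Excel string literal, " is written as "".
    return '="' + value.replace('"', '""') + '"'

def excel_csv_bytes(header: list[str], rows: list[list[str]]) -> bytes:
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)  # adds the outer CSV quoting around each formula
    writer.writerow(header)
    for row in rows:
        writer.writerow([excel_text_cell(v) for v in row])
    # "utf-8-sig" prepends the UTF-8 BOM (EF BB BF), Excel's cue that the file is UTF-8.
    return buf.getvalue().encode("utf-8-sig")
```

Usage would be along the lines of `open("report.csv", "wb").write(excel_csv_bytes(header, rows))`.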
-
@sebastian-galczynski said in WTF Bites:
Ok, I loaded the floats. They're geographical coordinates. Something is wrong, looks like they moved from Poland to Saudi Arabia.
Mine always end up off the shore near Murmansk. I am not sure which is better...
Edit: Or am I special for assuming that " Coord. X " is longitude.
Oh my sweet summer child. You are not special; every one of us made this mistake when confronted with geographic coordinates for the first time.
Short version: there is no such single thing as "longitude". There are many, many, many possible coordinate systems and you need to know exactly which one is used.
Long version: you're probably thinking of WGS84, also known as EPSG:4326; this is the coordinate system used by GPS and is almost just latitude/longitude.
But when you get Coord.X, it might actually be EPSG:3857, the Web Mercator projection coordinates used by Google Maps (the stack of is getting deeper).
It is, however, quite common to use "traditional" projections specific to a given country, usually designed at the beginning of the 20th century. These projections are usually planes intersecting the earth at a specific angle and height, chosen so the country in question looks "the best". For bonus points, some countries created separate projections for civilian (cadastral) and military use. And of course, many of these countries don't exist anymore and new ones do, so in Poland you might get the Prussian one or the Austrian one (and probably more, but I can't find any good sources at the moment).
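If the numbers are in the millions, Web Mercator (EPSG:3857) is one common suspect, and its inverse is short enough to write by hand. A sketch, assuming spherical Web Mercator with the usual radius of 6378137 m; for national grids like the Polish ones, a real projection library (PROJ, e.g. via pyproj) with the correct EPSG code is the safer route.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def web_mercator_to_lonlat(x: float, y: float) -> tuple[float, float]:
    """Convert EPSG:3857 metres to (longitude, latitude) in WGS84 degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat

def lonlat_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Forward projection, handy for round-trip checks."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y
```

A point near Warsaw comes out at roughly x = 2.3 million, y = 6.8 million metres, which is a quick way to recognize this projection in a CSV.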
-
@Kamil-Podlesak said in WTF Bites:
Mine always end up off the shore near Murmansk. I am not sure which is better...
That would mean that you're somewhere between Kabul and Islamabad. Not better I think.
@Kamil-Podlesak said in WTF Bites:
Short version: there is no such single thing as "longitude". There are many, many, many possible coordinate systems and you need to know exactly which one is used.
The problem was much simpler. They switched the coordinates. X is latitude, Y is longitude ;)
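When the column labels don't say which axis is which, a cheap sanity check is to try both orders against a bounding box the data must fall in. A sketch; the Poland box below is a rough approximation assumed for illustration, not an official extent.

```python
from typing import Optional

# Rough bounding box for Poland in WGS84 degrees (approximate, illustrative only).
POLAND_BBOX = (14.0, 49.0, 24.2, 55.0)  # (min_lon, min_lat, max_lon, max_lat)

def in_bbox(lon: float, lat: float, bbox=POLAND_BBOX) -> bool:
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def as_lonlat(x: float, y: float, bbox=POLAND_BBOX) -> Optional[tuple[float, float]]:
    """Return (lon, lat) for a point given as (X, Y), trying both axis orders.

    Returns None if neither order lands inside the expected box.
    """
    if in_bbox(x, y, bbox):
        return x, y  # X is longitude, as one might assume
    if in_bbox(y, x, bbox):
        return y, x  # X was latitude after all
    return None
```

This only works when the box is not roughly symmetric in latitude and longitude; for Poland the two ranges don't overlap, so there is no ambiguity.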
-
@sebastian-galczynski You could say they took some latitude with the coordinates.
-
@sebastian-galczynski said in WTF Bites:
@Kamil-Podlesak said in WTF Bites:
Short version: there is no such single thing as "longitude". There are many, many, many possible coordinate systems and you need to know exactly which one is used.
The problem was much simpler. They switched the coordinates. X is latitude, Y is longitude ;)
So it's WGS84 (aka "GPS coordinates")? You got off lightly. This time.
-
@Kamil-Podlesak said in WTF Bites:
These projection are usually planes intersecting the earth at a specific angle and height so the country in question looks "the best".
Then you get the ones that have a time component in their specification.
-
@Kamil-Podlesak said in WTF Bites:
Mine always end up off the shore near Murmansk. I am not sure which is better...
Back when I dealt with map coordinates, the Atlantic somewhere off of Gabon was favourite as bad data was inevitably putting things at 0,0. At least in that case, there were no worries about swapping coordinates.
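A sketch of rejecting such points before they reach the map. Treating exactly (0, 0) as missing data is a heuristic assumption (a real point there is possible, just very unlikely), and out-of-range or NaN values are rejected too. The function names are illustrative.

```python
import math
from typing import Iterable, Iterator

def plausible(lon: float, lat: float) -> bool:
    if math.isnan(lon) or math.isnan(lat):
        return False
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        return False
    # (0, 0) is "Null Island": almost always a default or missing value.
    return not (lon == 0.0 and lat == 0.0)

def drop_implausible(points: Iterable[tuple[float, float]]) -> Iterator[tuple[float, float]]:
    return (p for p in points if plausible(*p))
```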
-
@dkf Ah. Null Island.
-
Back when I dealt with map coordinates, the Atlantic somewhere off of Gabon was favourite as bad data was inevitably putting things at 0,0. At least in that case, there were no worries about swapping coordinates.
Sadly Google Maps started removing photos from that location. It was a very interesting mix.
-
Every day I learn new things. For example, today I learned that the built-in HTTP client in Node is actually two clients: one for HTTP, one for HTTPS. The latter can't do unencrypted HTTP, so if I want to download something from a URL that may be either, I need to check the protocol with String.substring() and then instantiate one of the clients (which have different type signatures, so TypeScript immediately shows red underlines). Truly divine intellect.
-
@sebastian-galczynski said in WTF Bites:
Every day I learn new things. For example, today I learned that the built-in HTTP client in Node is actually two clients: one for HTTP, one for HTTPS. The latter can't do unencrypted HTTP, so if I want to download something from a URL that may be either, I need to check the protocol with String.substring() and then instantiate one of the clients (which have different type signatures, so TypeScript immediately shows red underlines). Truly divine intellect.
Just use node-fetch, saves a lot of headaches.
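For comparison, Python's standard library has the same split (`http.client.HTTPConnection` vs `HTTPSConnection`), but there the HTTPS class is a subclass, so both share one interface. A sketch of choosing the class from the parsed URL scheme rather than by string slicing; this is a Python illustration, not a Node recipe, and `open_connection` is a hypothetical helper name.

```python
import http.client
from urllib.parse import urlsplit

def open_connection(url: str, timeout: float = 10.0) -> tuple[http.client.HTTPConnection, str]:
    """Return an (unconnected) connection object and the request path for `url`."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "https":
        conn_cls = http.client.HTTPSConnection  # subclass of HTTPConnection
    elif scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    # Constructing the object does not open a socket yet.
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return conn, path
```

Typical use is `conn, path = open_connection(url)`, then `conn.request("GET", path)` and `conn.getresponse()`. In practice `urllib.request.urlopen` hides the split entirely, much as node-fetch does.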
-
@Kamil-Podlesak said in WTF Bites:
@sebastian-galczynski said in WTF Bites:
Every day I learn new things. For example, today I learned that the built-in HTTP client in Node is actually two clients: one for HTTP, one for HTTPS. The latter can't do unencrypted HTTP, so if I want to download something from a URL that may be either, I need to check the protocol with String.substring() and then instantiate one of the clients (which have different type signatures, so TypeScript immediately shows red underlines). Truly divine intellect.
Just use node-fetch, saves a lot of headaches.
But does it come with leftpad?
-
@Benjamin-Hall said in WTF Bites:
Insert "I found your problem right there" meme
Yeah, wasn't a fan of it back then either.
-
@Benjamin-Hall said in WTF Bites:
Insert "I found your problem right there" meme
Yeah, wasn't a fan of it back then either.
It was my first demonstration of literal (if digital) spaghetti code. Little wires everywhere.
-
@Kamil-Podlesak said in WTF Bites:
Just use node-fetch, saves a lot of headaches.
I somehow incorrectly thought that node-fetch can't do streams and will use up memory.
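Streaming mostly means never holding the whole body in memory at once. A sketch in Python: copy any readable binary stream (the response from `urllib.request.urlopen` qualifies) to a file in fixed-size chunks. The 64 KiB chunk size is an arbitrary assumption; the standard library's `shutil.copyfileobj` does essentially the same thing.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # arbitrary; big enough to keep per-call overhead low

def copy_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst chunk by chunk; memory use stays around chunk_size bytes."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total

if __name__ == "__main__":
    src = io.BytesIO(b"x" * 200_000)
    dst = io.BytesIO()
    print(copy_stream(src, dst))  # 200000
```

With a real download this becomes `with urllib.request.urlopen(url) as resp, open(path, "wb") as out: copy_stream(resp, out)`.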
-
@Benjamin-Hall said in WTF Bites:
It was my first demonstration of literal (if digital) spaghetti code. Little wires everywhere.
Same. Even after desperately trying to clean things up by packing the little wires into bigger wires ("structs") and putting things into squares and stuff, it was still spaghetting all over the place.
-
@Tsaukpaetra fputcsv predates RFC4180 but if memory serves, you should just be able to change the escape character to " and that would do the job.
It is so rare for me to have to write correct CSV though, virtually everyone else I’ve worked with fucks up if you give them proper CSV files.
Yeah. I've unilaterally (because I'm the only one developing this) decided that JSON will be the exchange format. For fun.
-
@Tsaukpaetra said in WTF Bites:
JSON will be the exchange format. For fun.
Fair, but what about the other stuff?
-
@Tsaukpaetra said in WTF Bites:
JSON will be the exchange format. For fun.
Fair, but what about the other stuff?
What other stuff? All hail the JSON!
-
Yeah, it's a difficult text to OCR.
The problems already begin with the scan. It was reduced to black and white with an apparently simple threshold, which discarded a lot of information that a full-color scan would have kept.
It may even have made things worse for the OCR: there are a lot of black blots that were actually just very slightly darker than their surroundings, which the OCR should be ignoring, and some missing bits that were just a little too light, which the OCR could still have made out if it had at least full grayscale.
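A toy illustration of that loss, with made-up 8-bit gray values (0 = black, 255 = white): a single global threshold turns a faint stain on dirty paper into the same black as real ink, and turns a faded stroke into paper, even though the grayscale values still tell them apart. Local (adaptive) thresholding is one common mitigation.

```python
def binarize(gray: list[int], threshold: int = 128) -> list[int]:
    # Global threshold: anything darker than the cutoff becomes pure black.
    return [0 if v < threshold else 255 for v in gray]

# Hypothetical scanline: clean paper 180, ink 40, a faded stroke 135 on
# clean paper, then a stain 120 on dirty paper 140.
scanline = [180, 40, 180, 135, 180, 140, 120, 140]
print(binarize(scanline))  # [255, 0, 255, 255, 255, 255, 0, 255]
```

The faded stroke (135) vanishes into the paper, while the stain (120), only 20 levels darker than its surroundings, is now indistinguishable from ink.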
This is a digital copy of a book that was prcscrvod for gcncrations on library shclvcs bcforc it was carcfully scannod by Google as pari of a projcct
That's the easy part, the part that was obviously written by Google, then printed out, put on a wooden table and photographed back in to be 0CRd.
Why do they do this to themselves? The preamble, at least on my screen, is so badly kerned that the letters almost overlap each other. It's no wonder the OCR has trouble with it.
-
My software writes out numbers in C locale, like any sane computer-parsable file should.
I first really ran into locales as a student many years ago. The lab had us "write" some software in LabVIEW to orchestrate measurements across a few devices. So I sketched out the core of the program on one of the workstations, tested it, and moved everything to the under-powered lab computer. Connected all the different devices. Hit run.
Result: cacophony.
Modern frameworks tend to do locale-specific processing only upon explicit request. But the way it was done back then in the standard C library is rather unfortunate: you have a global (threads? what? no, sir, we didn't have those back then) switch that makes all the standard formatting and parsing functions use a specific locale (with the option to select the neutral locale or the environment's default locale).
Even the C++ library does a bit better, having a global default you can set but also letting you set the locale per stream. And eventually the C library grew versions of the functions that take a locale as an argument. But not all systems got those updates, so e.g. the GNU standard C++ library jumps through some hoops and has some limitations there.
Most recently, I ran into locales with Xerces. We were doing some major XML-handling refactoring, which involved switching parsers to Xerces-C. Since XML can have its encoding declared, Xerces uses transcoders internally to handle the options. But it has neither a runtime nor a compile-time option to tell it that you always want UTF-8 internally, even though that's what many modern programs do. Instead, it looks at the locale settings. Fortunately for us, because the device (having no UI) has no locale data, it doesn't process them the intended way, but rather looks at whether the LANG environment variable has a .UTF-8 suffix. So I ended up concluding that a global init of setenv("LANG", "C.UTF-8"); is the right workaround, even though initializing the locale subsystem in the C library with that wouldn't actually work.
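A sketch of that workaround translated to Python, with the check modelled on the description above (a literal `.UTF-8` suffix on LANG) rather than on Xerces's actual source. In Python, assigning to `os.environ` also updates the process environment via `putenv`, which is what C libraries loaded into the process would see. It has to run before the library that inspects LANG is initialized; `ensure_utf8_lang` is a hypothetical helper name.

```python
import os
from collections.abc import MutableMapping

def lang_says_utf8(lang: str) -> bool:
    # Naive suffix check as described: only a literal ".UTF-8" counts,
    # so a glibc-style "en_US.utf8" would NOT pass.
    return lang.endswith(".UTF-8")

def ensure_utf8_lang(env: MutableMapping[str, str] = os.environ) -> None:
    """Set LANG=C.UTF-8 unless it already advertises UTF-8 the expected way."""
    if not lang_says_utf8(env.get("LANG", "")):
        env["LANG"] = "C.UTF-8"
```

Unlike the unconditional `setenv` above, this variant leaves an already-suitable LANG alone; whether that matters depends on the device.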
-
@Tsaukpaetra said in WTF Bites:
I've unilaterally (because I'm the only one developing this) decided that JSON will be the exchange format. For fun.
JSON is no fun. If you make a typo, it can't be parsed at all and someone has to fix it. CSV can achieve miracles.
For example, a few years ago I tried to fix a weird bug: some customers had their accounts randomly deactivated, deleted, or something. It turned out that the customer database was synchronized with another subsidiary (a very, very big thing with millions of customers) by dropping a CSV file into some directory, from which it was fetched.
Apparently, at some point long ago, there was a problem with the synchronization crashing because some of the addresses were too long to fit in the database column. What did the other subsidiary do? They loaded the CSV again (in a wrong way) and truncated the column, losing the closing quote. Now, the loader (written in PHP, by whom I don't know) didn't check whether the data was 'rectangular'; it just iterated over fields and skipped to the next row when the current row was "finished". So when such a truncated row came along, it treated everything up to the next quote as one field. Then it tried to push the wrong CSV columns into the wrong database columns, and sometimes succeeded, resulting in customers appearing thousands of times with nonsense names like 'Mr Warsaw Mazovia' etc.
Fixing this at the source proved impossible (that was a big company, with many managers), and skipping data wouldn't satisfy the bosses, so I eventually applied some regex-based heuristics to fix the quotes, which worked on 99.9% of the given data. Then I got the f- out of there fast.
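What that loader was missing, sketched with Python's csv module: parse with `strict=True` so an unterminated or stray quote raises an error instead of silently swallowing the following rows, check that every row has as many fields as the header, and do all of that for the whole file before writing anything to the database. The column layout and the `MalformedCSV` name are hypothetical.

```python
import csv
import io

class MalformedCSV(ValueError):
    pass

def read_rectangular(text: str) -> list[dict[str, str]]:
    """Parse and validate the whole file before returning anything.

    Raises MalformedCSV on quoting errors or rows with the wrong field count,
    so the caller can reject the file before any database write happens.
    """
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        header = next(reader)
    except StopIteration:
        raise MalformedCSV("empty file") from None
    rows = []
    try:
        for row in reader:
            if len(row) != len(header):
                raise MalformedCSV(
                    f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}")
            rows.append(dict(zip(header, row)))
    except csv.Error as e:  # e.g. a quote that never closes properly
        raise MalformedCSV(f"line {reader.line_num}: {e}") from e
    return rows
```

Failing on the whole file is deliberately blunt: it turns "customers appear thousands of times" into one loud error at import time.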
-
@sebastian-galczynski said in WTF Bites:
CSV can achieve miracles.
I didn't know that Codethulhu dealt in miracles.
-
@sebastian-galczynski sometimes succeeding is bad. It should either predictably and consistently succeed, or it should predictably and consistently fail.
-
¹ Not exactly walking distance from my place 20 years ago but same island.
The Filipino dude is probably just a hosting company. The producer of the tablet was based in Prutting, Bavaria.
-
it should predictably and consistently fail.
And, most importantly, early and loudly.
-
@sebastian-galczynski hell yes. Ideally before any data changing has occurred.
-
@sebastian-galczynski said in WTF Bites:
early and loudly.
@Override public String getMessage() { throw this; }
-
@HardwareGeek said in WTF Bites:
No, different job. This is for a "Computer System Validation" consultant
"Computer System Validation" can be abbreviated as CSV
:there's-your-problem.jpg:
-
@sebastian-galczynski said in WTF Bites:
you make a typo
Who the shit is manually typing out your CSV files?!?!
-
@Tsaukpaetra said in WTF Bites:
@sebastian-galczynski said in WTF Bites:
you make a typo
Who the shit is manually typing out your CSV files?!?!
Probably an “AI” from Southeast Asia.
-
Linux firewall comes with a bug:
-
Wanna keep your files secure on a thumb drive?
What about the "Verbatim Keypad Secure"?
OK, then read https://www.kiratas.com/2022/06/08/verbatim-encrypting-usb-stick-insecure-expert-reveals-vulnerabilities-2/
Of course, Verbatim just ignores these facts.
-
@BernieTheBernie Another good reason to RIIR…
-
@Bulb the Rust compiler proves cryptosystems now?
-
@Tsaukpaetra said in WTF Bites:
Who the shit is manually typing out your CSV files?!?!
I have no idea what produced those broken files. But now I'm sometimes typing JSON myself to test some functionality, because the divine intellect, er, bootcamp grads didn't finish some of the more complex forms, but after some persuasion they at least added a textarea connected to JSON.parse in those places.
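If people have to hand-type JSON into a textarea, the least the form can do is point at the mistake. In Python, `json.JSONDecodeError` carries `lineno`, `colno` and `msg`; a sketch of turning those into a caret-style message (the exact wording of `msg` varies between Python versions, and `parse_or_explain` is a hypothetical helper name):

```python
import json
from typing import Any, Optional

def parse_or_explain(text: str) -> tuple[Any, Optional[str]]:
    """Return (value, None) on success, or (None, message) pointing at the error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as e:
        # JSONDecodeError counts lines by "\n", so split the same way;
        # str.split always has an element at index lineno - 1.
        line = text.split("\n")[e.lineno - 1]
        caret = " " * (e.colno - 1) + "^"
        return None, f"line {e.lineno}, column {e.colno}: {e.msg}\n{line}\n{caret}"
```

For a missing comma between two object members, the message points at the start of the second member's key on its own line.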
-
@sebastian-galczynski I have to manually type JSON for our database changelogs (which get translated into SQL via Liquibase). And it's a particularly obnoxious, super-verbose format. And the parser is insanely picky, including rejecting the file if there is a tab character anywhere. It even has an error message telling you to use spaces instead.
-
@sebastian-galczynski said in WTF Bites:
JSON is no fun. If you make a typo, it can't be parsed at all and someone has to fix it. CSV can achieve miracles.
JSON streaming parsers do exist.
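Python's standard library has no event-based streaming parser (third-party ones such as ijson exist), but for a stream of concatenated or newline-delimited JSON documents, `json.JSONDecoder.raw_decode` gives you something similar: each document is handled as soon as it is complete, and a typo only costs you from the broken document onward instead of the whole input. A sketch:

```python
import json
from typing import Any, Iterator

_decoder = json.JSONDecoder()

def iter_json_documents(text: str) -> Iterator[Any]:
    """Yield each value from whitespace-separated, concatenated JSON.

    Raises json.JSONDecodeError at the first malformed document, after
    everything before it has already been yielded.
    """
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos == end:
            return
        value, pos = _decoder.raw_decode(text, pos)  # pos = index after this value
        yield value
```

A caller can collect documents in a loop and catch `json.JSONDecodeError` to keep whatever parsed before the error, which is roughly the "partial success" that CSV offers, except that here the failure point is explicit.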