WTF Bites


  • Java Dev

    @Bulb They don't even have a proper backwards-compatibility excuse, since they broke hard with the switch to xlsx, and to varying degrees before that.



  • @topspin said in WTF Bites:

    My software writes out numbers in C locale, like any sane computer-parsable file should.

    I really first ran into locales as a student many years ago. Lab had us "write" some software in labview to orchestrate measurements across a few devices. So, I sketched out the core of the program on one of the workstations, tested it, and moved stuff to the under-powered lab computer. Connected all different devices. Hit run.

    Result: cacophony.

    The poor devices that we tried to talk to had exactly one way of reporting errors: via their built-in buzzer. And they reported each and every error with the buzzer -- i.e., each time they got to a number they couldn't parse. Labview had decided to format all numbers with the current locale of the system when generating the text-based commands it would send to the devices. Yay.
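
    For what it's worth, the failure mode is easy to reproduce in a few lines. A minimal Python sketch, assuming a made-up `VOLT` command and a device that only accepts `.` as the decimal separator (neither comes from the actual lab setup): the `f` format type never consults the locale, so the command text comes out the same on every machine, and `float()` is just as strict on the way back in.

```python
def build_command(volts: float) -> str:
    """Format a setpoint the same way regardless of the system locale.

    The "f" presentation type is locale-independent in Python; only the
    "n" type (and the locale module) would pick up a decimal comma.
    VOLT is a hypothetical command name used for illustration.
    """
    return f"VOLT {volts:.3f}"


def device_parse(command: str) -> float:
    """Stand-in for the device firmware: accepts only '.' as separator."""
    _, _, number = command.partition(" ")
    return float(number)  # raises ValueError on "3,142" (cue the buzzer)


print(build_command(3.14159))
```

    Feeding `device_parse` a command with a decimal comma raises `ValueError`, which is roughly what the buzzers were saying.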



  • @cvi said in WTF Bites:

    labview

    :trwtf:

    Insert "I found your problem right there" meme


  • Considered Harmful

    @Bulb said in WTF Bites:

    My first suspicion would always be on transaction overhead. Some databases work best with as many inserts in one transaction as possible while others have some optimal size of batch, but inserting under individual transactions (or autocommit) is pretty bad everywhere.

    Don't y'all use clusters? I've had to advise developers how to avoid large transactions recently because Galera hates them, to the point of not just locking and aborting completely unrelated transactions on different databases but even crashing the entire cluster.
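
    One possible middle ground between the two positions, sketched in Python with `sqlite3` purely as a stand-in (it says nothing about Galera's actual limits): commit in bounded batches, so there is neither one transaction per row nor one giant transaction. The table name and the batch size of 500 are assumptions for illustration.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows using one transaction per batch of at most batch_size."""
    it = iter(rows)
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return batches
        with conn:  # commits on success, rolls back on an exception
            conn.executemany(
                "INSERT INTO measurements (id, value) VALUES (?, ?)", batch
            )
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
n_batches = insert_in_batches(conn, ((i, i * 0.5) for i in range(1200)))
print(n_batches)
```

    The right batch size depends on the database and has to be measured; the point is only that it is a tunable number rather than "1" or "everything".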


  • Considered Harmful

    @sebastian-galczynski said in WTF Bites:

    @Rhywden said in WTF Bites:

    problematic cards also pull too much current through the circuit at the other end

    That's cause these are special medical grade cards. They probably forgot to solder the batteries.

    :vomit:
    Funny, I lived almost walking distance from that company a few years ago. They don't seem to exist any more but their homepage sporting a JPG screenshot of a Word document hosted by a Filipino dude¹ with a German phone number who individually screenshots the same thing for each of the dozens of "under construction" sites on his server and isn't quite sure how to spell his own "business" domain is a perfect match for this contraption.

    ¹ 🤨 Not exactly walking distance from my place 20 years ago but same island.


  • Considered Harmful

    Google's glorious preservation of the knowledge of the world

    Yeah, it's a difficult text to OCR. 180 years old, with decorative type, gothic, dirty pages, library stamps and all. But you know it's not going well when the text starts wlth dg cribimg høw lt w4s carcfully scannod:

    This is a digital copy of a book that was prcscrvod for gcncrations on library shclvcs bcforc it was carcfully scannod by Google as pari of a projcct

    That's the easy part, the part that was obviously written by Google, then printed out, put on a wooden table and photographed back in to be 0CRd.

    So behold the text:

    9cro«tt«ii 
    
    
    
    tm 
    
    
    
    fiatmßvAt tin^ <^dl)ett. 
    
    
    
    * 
    
    
    
    VAA;1C.'-..n 
    
    
    
    ! 
    
    
    
    t 
    
    i 
    
    
    
    i V 
    
    
    
    TM i t IT "^ 1 ö V^ 
    

    To be fair, it doesn't go on quite as bad. Most of the book is recognizable as something vaguely resembling German if not comprehensible text:

    3n biefcm ^tinji)) ip bie 5(u§fü^ning jebct gto§ättigen 3bee 
    moglid^. 3)le 9lu8tottung i^ert^eetenbet «Jltanf^eiten , fd&ablic^et 
    3nfeften, bie 5SetebIung, ittaftigung unb Setfc^onetung beö 
    menf(f)fid^en Äor^jerS. ®ie 23eri5;fitung öon 3Äange!, UeBetfd^ttjem* 
    mung unb einet SRenge anbetet Uebel ijl nut allein in bet @e* 
    meiufd^aft mpglic^. 6d^on batum, tt?eit alle befannte S^Jtad^en 
    gtofe lln\)oUfommen^eiten an ftd^ ^aUn, ifl eS' not^injenbig, tint 
    ganj mm, fd^one, too^lfUngenbe, öoUfommcne ©^tad^e gu etjtn* 
    ben. Unb U>enn bie ßtfinbung betfelben mSglid^ ift, h?arum follte 
    bie 5lnu>enbung betfelben nic^t mogfid^ fei^n; JD^ne baö $tinjip 
    bet ©emeinfd^aft ifl biefe freilid^ nid&t moglid^. 
    HfUinl bie ©egtiffe. ©^tad^en, (^tenjen unb QSatetlanb finb bet 
    ' ?Kenfd[;^eit fo ttjejiig not^h?enbig, al3 alle befte^enben teligtefen 
    JDogmen. 9ltle biefe Segtiffe finb ^etjäi;tte Uebetll^fmingen, beten 
    SWad^ti^eil immet fü^Ibatet n>itb je länget fle Befielen. 
    

    There are 288 pages of this shit. Obviously Google has completely cut out any humans from the library-to-internet pipeline :facepalm:


  • Notification Spam Recipient

    @sebastian-galczynski said in WTF Bites:

    Edit: We now have Inner JSON inside CSV. Turns out it can't be edited with Excel, because wrong quotes.

    Oh yeah, you're supposed to double-double-quote JSON strings, with escaping.

    Guess who found out PHP doesn't do that for you with the default CSV functions?
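
    For comparison, a small Python sketch of the double-double-quote rule. Python's `csv` writer defaults to `doublequote=True`, so a JSON string dropped into a field gets every `"` doubled and the whole field wrapped in quotes. The payload is made up for illustration.

```python
import csv
import io
import json

payload = json.dumps({"name": "Ada", "tags": ["x", "y"]})

buf = io.StringIO(newline="")
csv.writer(buf).writerow([42, payload])
line = buf.getvalue()
print(line, end="")

# Reading the line back recovers the JSON text unchanged.
row = next(csv.reader(io.StringIO(line, newline="")))
restored = json.loads(row[1])
```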


  • Discourse touched me in a no-no place

    @LaoC said in WTF Bites:

    Obviously Google has completely cut out any humans from the library-to-internet pipeline :facepalm:

    Doesn't seem to have an AI cleanup step either; that's just OCR vomit.


  • Trolleybus Mechanic

    @LaoC said in WTF Bites:

    screenshot of a Word document

    I've seen a case of a metal nameplate on a certain device (produced by a certain company I'm intermittently affiliated with) engraved from such a screenshot. The name of the company was underlined with a thin wavy line, because it wasn't in the dictionary.



  • @Tsaukpaetra fputcsv predates RFC4180 but if memory serves, you should just be able to change the escape character to " and that would do the job.

    It is so rare for me to have to write correct CSV though, virtually everyone else I’ve worked with fucks up if you give them proper CSV files.


  • Discourse touched me in a no-no place

    @Arantor said in WTF Bites:

    It is so rare for me to have to write correct CSV though, virtually everyone else I’ve worked with fucks up if you give them proper CSV files.

    There are two reasons for writing CSV:

    1. Moving slabs of numbers about, often within a locale, or even a single computer.
    2. Getting data into Excel. Preferably without having to learn to write XLSX directly.


  • @dkf oh, I have done a great many CSV pieces in my time. I just rarely write proper CSV because most of what I do is the eldritch abomination known as Excel CSV.

    Kick the file off with a BOM for good measure (tells Excel you want to use UTF-8 rather than whatever encoding), followed by judicious use of ="value" items as cells because Excel. Often with careful replacements of CHR(34) and appending things if the data itself has quotes in.
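
    Roughly what that recipe looks like as a Python sketch. The `excel_text_cell` helper is a hypothetical name, and how Excel treats the result still varies by version and regional settings, so take it as an illustration rather than a guarantee.

```python
import csv
import io


def excel_text_cell(value: str) -> str:
    """Wrap a value as ="..." so Excel keeps it as text (e.g. leading zeros).

    Quotes inside the formula string are doubled here; the csv writer
    then doubles every quote again when it quotes the whole field.
    """
    return '="' + value.replace('"', '""') + '"'


raw = io.BytesIO()
# "utf-8-sig" makes Python emit the UTF-8 BOM (EF BB BF) at the start.
text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
writer = csv.writer(text)
writer.writerow(["id", "name"])
writer.writerow([excel_text_cell("007"), "Zażółć"])
text.flush()
data = raw.getvalue()
print(data)
```

    The second line of the file ends up as `"=""007""",Zażółć`, which is exactly as ugly as it sounds.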



  • @sebastian-galczynski said in WTF Bites:

    Ok, I loaded the floats. They're geographical coordinates. Something is wrong, looks like they moved from Poland to Saudi Arabia.

    Mine always end up off the shore near Murmansk. I am not sure which is better...

    Edit: Or am I special for assuming that " Coord. X " is longitude.

    Oh my sweet summer child. You are not special; every one of us made this mistake when confronted with the topic of geographic coordinates for the first time.

    Short version: there is no such single thing as "longitude". There are many, many, many possible coordinate systems and you need to know exactly which one is used.

    Long version: you're probably thinking of WGS84, also known as EPSG:4326; this is the coordinate system used by GPS and is almost just latitude/longitude.
    But when you get Coord.X, it might actually be EPSG:3857, the Mercator projection coordinates used by Google Maps (the stack of :wtf: is getting deeper).

    It is, however, quite common to use "traditional" projections used by a specific country, usually designed at the beginning of the 20th century. These projections are usually planes intersecting the earth at a specific angle and height so the country in question looks "the best". For bonus points, some countries created separate projections for civilian (cadaster) and military use. And of course, many of these countries don't exist anymore and new ones do exist, so in Poland you might get the Prussian one or the Austrian one (and probably more, but I cannot find any good sources at this moment).
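
    To make the difference between the two EPSG codes concrete, a minimal Python sketch of the standard spherical Web Mercator forward formula (EPSG:4326 degrees in, EPSG:3857 metres out). The function name and the Warsaw sample point are illustrative assumptions; a real project would normally leave this to a projection library.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Convert longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Roughly Warsaw; the result is in millions of metres, not degrees.
print(wgs84_to_web_mercator(21.0122, 52.2297))
```

    So if "Coord. X" holds a seven-digit number, it is certainly not a longitude in degrees, whatever else it may be.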


  • Trolleybus Mechanic

    @Kamil-Podlesak said in WTF Bites:

    Mine always end up off the shore near Murmansk. I am not sure which is better...

    That would mean that you're somewhere between Kabul and Islamabad. Not better I think.

    @Kamil-Podlesak said in WTF Bites:

    Short version: there is no such single thing as "longitude". There are many, many, many possible coordinate systems and you need to know exactly which one is used.

    The problem was much simpler. They switched the coordinates. X is latitude, Y is longitude ;)
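
    A cheap sanity check for this kind of swap, as a Python sketch. It assumes the data is supposed to be in Poland and uses a rough bounding box in degrees; the numbers are approximate and only there for illustration.

```python
# Approximate bounding box for Poland in degrees (assumed, not authoritative).
LAT_RANGE = (49.0, 55.0)
LON_RANGE = (14.0, 24.5)


def in_box(lat: float, lon: float) -> bool:
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def guess_axis_order(x: float, y: float) -> str:
    """Guess which of the X/Y columns holds latitude."""
    if in_box(lat=x, lon=y):
        return "X is latitude, Y is longitude"
    if in_box(lat=y, lon=x):
        return "X is longitude, Y is latitude"
    return "outside the box either way"


print(guess_axis_order(52.23, 21.01))
print(guess_axis_order(21.01, 52.23))
print(guess_axis_order(0.0, 0.0))
```

    It only works because the latitude and longitude ranges of the box don't overlap; for a country where they do, the guess is ambiguous.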



  • @sebastian-galczynski You could say they took some latitude with the coordinates.



  • @sebastian-galczynski said in WTF Bites:

    @Kamil-Podlesak said in WTF Bites:

    Short version: there is no such single thing as "longitude". There are many, many, many possible coordinate systems and you need to know exactly which one is used.

    The problem was much simpler. They switched the coordinates. X is latitude, Y is longitude ;)

    So it's WGS84 (aka "GPS coordinates")? You've got away lightly. This time.



  • @Kamil-Podlesak said in WTF Bites:

    These projections are usually planes intersecting the earth at a specific angle and height so the country in question looks "the best".

    Then you get the ones that have a time component in their specification.


  • Discourse touched me in a no-no place

    @Kamil-Podlesak said in WTF Bites:

    Mine always end up off the shore near Murmansk. I am not sure which is better...

    Back when I dealt with map coordinates, the Atlantic somewhere off of Gabon was favourite as bad data was inevitably putting things at 0,0. At least in that case, there were no worries about swapping coordinates.


  • Java Dev

    @dkf Ah. Null Island.


  • Trolleybus Mechanic

    @dkf said in WTF Bites:

    Back when I dealt with map coordinates, the Atlantic somewhere off of Gabon was favourite as bad data was inevitably putting things at 0,0. At least in that case, there were no worries about swapping coordinates.

    Sadly Google Maps started removing photos from that location. It was a very interesting mix.


  • Trolleybus Mechanic

    Every day I learn new things. For example, today I learned that the built-in HTTP client in Node is actually two clients: one for HTTP, one for HTTPS. The latter can't do unencrypted HTTP, so if I want to download something from a URL which may be either, I need to check the proto with String.substring() and then instantiate one of the clients (which have different type signatures, so TypeScript immediately shows red underlines). Truly divine intellect.



  • @sebastian-galczynski said in WTF Bites:

    Every day I learn new things. For example, today I learned that the built-in HTTP client in Node is actually two clients: one for HTTP, one for HTTPS. The latter can't do unencrypted HTTP, so if I want to download something from a URL which may be either, I need to check the proto with String.substring() and then instantiate one of the clients (which have different type signatures, so TypeScript immediately shows red underlines). Truly divine intellect.

    Just use node-fetch, saves a lot of headaches.


  • I survived the hour long Uno hand

    @Kamil-Podlesak said in WTF Bites:

    @sebastian-galczynski said in WTF Bites:

    Every day I learn new things. For example, today I learned that the built-in HTTP client in Node is actually two clients: one for HTTP, one for HTTPS. The latter can't do unencrypted HTTP, so if I want to download something from a URL which may be either, I need to check the proto with String.substring() and then instantiate one of the clients (which have different type signatures, so TypeScript immediately shows red underlines). Truly divine intellect.

    Just use node-fetch, saves a lot of headaches.

    But does it come with leftpad :thonking:



  • @Benjamin-Hall said in WTF Bites:

    :trwtf:

    Insert "I found your problem right there" meme

    Yeah, wasn't a fan of it back then either.



  • @cvi said in WTF Bites:

    @Benjamin-Hall said in WTF Bites:

    :trwtf:

    Insert "I found your problem right there" meme

    Yeah, wasn't a fan of it back then either.

    It was my first demonstration of literal (if digital) spaghetti code. Little wires everywhere.


  • Trolleybus Mechanic

    @Kamil-Podlesak said in WTF Bites:

    Just use node-fetch, saves a lot of headaches.

    I somehow incorrectly thought that node-fetch can't do streams and will use up memory.



  • @Benjamin-Hall said in WTF Bites:

    It was my first demonstration of literal (if digital) spaghetti code. Little wires everywhere.

    Same. Even after desperately trying to clean things up by packing the little wires into bigger wires ("structs") and putting things into squares and stuff, it was still spaghetting all over the place.


  • Notification Spam Recipient

    @Arantor said in WTF Bites:

    @Tsaukpaetra fputcsv predates RFC4180 but if memory serves, you should just be able to change the escape character to " and that would do the job.

    It is so rare for me to have to write correct CSV though, virtually everyone else I’ve worked with fucks up if you give them proper CSV files.

    Yeah. I've unilaterally (because I'm the only one developing this) decided that JSON will be the exchange format. For fun.


  • Considered Harmful

    @Tsaukpaetra said in WTF Bites:

    JSON will be the exchange format. For fun.

    Fair, but what about the other stuff?


  • Notification Spam Recipient

    @Gribnit said in WTF Bites:

    @Tsaukpaetra said in WTF Bites:

    JSON will be the exchange format. For fun.

    Fair, but what about the other stuff?

    What other stuff? All hail the JSON! 🎣



  • @LaoC said in WTF Bites:

    Yeah, it's a difficult text to OCR.

    The problems already begin with the scan. It was reduced to black&white with an apparently simple threshold, and that just discarded a lot of information that would be available in a full-color scan.

    It might have even made it worse for the OCR, because there are a lot of black blots that were actually just very slightly darker than their surroundings and that the OCR should be ignoring, and some missing bits that were just a little too light and that the OCR could still make out if it had at least full grayscale.
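
    A toy Python illustration of that effect, with made-up 8-bit gray values (0 = black, 255 = white) and an assumed fixed threshold of 128; whatever Google's pipeline really does is unknown.

```python
def binarize(pixels, threshold=128):
    """Map 8-bit grayscale values to pure black (0) or pure white (255)."""
    return [0 if p < threshold else 255 for p in pixels]


# Made-up sample: clean paper, a faint stain, a faded stroke, solid ink.
sample = {"paper": 200, "stain": 120, "faded stroke": 140, "ink": 30}
result = dict(zip(sample, binarize(sample.values())))
print(result)
```

    The stain (only slightly darker than the threshold) becomes as black as real ink, and the faded stroke (only slightly lighter) vanishes into the paper. In grayscale the two were 20 levels apart; after thresholding they are indistinguishable from ink and paper respectively.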

    This is a digital copy of a book that was prcscrvod for gcncrations on library shclvcs bcforc it was carcfully scannod by Google as pari of a projcct

    That's the easy part, the part that was obviously written by Google, then printed out, put on a wooden table and photographed back in to be 0CRd.

    Why do they do this to themselves? The preamble, at least on my screen, is so badly kerned that the letters almost overlap each other. It's no wonder the OCR has trouble with it.



  • @cvi said in WTF Bites:

    @topspin said in WTF Bites:

    My software writes out numbers in C locale, like any sane computer-parsable file should.

    I really first ran into locales as a student many years ago. Lab had us "write" some software in labview to orchestrate measurements across a few devices. So, I sketched out the core of the program on one of the workstations, tested it, and moved stuff to the under-powered lab computer. Connected all different devices. Hit run.

    Result: cacophony.

    Modern frameworks tend to do locale-specific processing only upon explicit request. But the way it was done back then in the standard C library is rather unfortunate: you have a global (threads? what? no, sir, we didn't have that back then) switch that switches all the standard formatting and parsing functions to use a specific locale (with the option to select the neutral locale or the environment's default locale).

    Even the C++ library does a bit better, having a global default you can set, but allowing you to set it per-stream. And eventually the C library grew versions of the functions that take a locale as an argument. But not all systems got those updates, so e.g. the GNU standard C++ library jumps through some hoops and has some limitations on those.


    Of late, the most :wtf: moment with locales was Xerces. We were doing some major XML handling refactoring, which involved switching parsers to Xerces-C. Since XML can have its encoding declared, Xerces uses some transcoders internally to handle the options. But it has neither a runtime nor a compile-time option to tell it that you always want UTF-8 internally, even though that's what many modern programs do. Instead it looks at the locale settings. Fortunately for us (because the device, not having any UI, does not have the locale data) it does not process it the intended way, but rather looks at whether the LANG environment variable has a .UTF-8 suffix. So I ended up concluding that a global init of setenv("LANG", "C.UTF-8", 1); is the right workaround. Even though initializing the locale subsystem in the C library with that wouldn't actually work.


  • Trolleybus Mechanic

    @Tsaukpaetra said in WTF Bites:

    I've unilaterally (because I'm the only one developing this) decided that JSON will be the exchange format. For fun.

    JSON is no fun. If you make a typo, it can't be parsed at all and someone has to fix it. CSV can achieve miracles.

    For example, a few years ago I tried to fix a weird bug - some customers had their accounts randomly deactivated, deleted, or something. It turns out that the customer database was synchronized with another subsidiary (a very, very big thing with millions of customers) by means of dropping a CSV in some directory, from which it was fetched.
    Apparently at some point long ago there was a problem with the synchronization crashing, because some of the addresses were too long and didn't fit in the database column. What did the other subsidiary do? They loaded the CSV again (in a wrong way) and truncated the column, losing the closing quote. Now, the loader (written in PHP, don't know by whom) didn't check if the data was 'rectangular'; it just iterated over fields and skipped to the next row when the current row was "finished". So when such a truncated row happened, it was treating everything up to the next quote as one field. Then it tried to push wrong CSV columns into wrong database columns, and sometimes succeeded, resulting in the customers appearing thousands of times with nonsense names like 'Mr Warsaw Mazovia' etc.
    Fixing this at the source proved impossible (that was a big company, with many managers), and skipping data wouldn't satisfy the bosses, so I eventually applied some regex-based heuristics to fix the quotes, which worked 99.9% on the given data. Then I got the f- out from there fast.
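
    The check the loader was missing, as a minimal Python sketch (the real loader was PHP; `load_rectangular` and `MalformedCsvError` are hypothetical names). It parses the whole file first and refuses to return anything unless every row has the expected number of fields, so nothing reaches the database from a broken file.

```python
import csv
import io


class MalformedCsvError(ValueError):
    """Raised before any row is handed to the caller."""


def load_rectangular(text: str, expected_columns: int) -> list:
    """Parse the whole file first; return rows only if every row is valid."""
    rows = []
    # strict=True makes the reader raise on a stray quote instead of
    # quietly continuing in its default lenient mode.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        for row in reader:
            if len(row) != expected_columns:
                raise MalformedCsvError(
                    f"line {reader.line_num}: expected {expected_columns} "
                    f"fields, got {len(row)}"
                )
            rows.append(row)
    except csv.Error as exc:
        raise MalformedCsvError(f"line {reader.line_num}: {exc}") from exc
    return rows


good = 'id,name,city\r\n1,"Kowalski, Jan",Warsaw\r\n'
print(load_rectangular(good, expected_columns=3))
```

    A file with a lost closing quote, or a row that is simply too short, raises `MalformedCsvError` instead of producing creatively named customers.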


  • Discourse touched me in a no-no place

    @sebastian-galczynski said in WTF Bites:

    CSV can achieve miracles.

    I didn't know that Codethulhu dealt in miracles.



  • @sebastian-galczynski sometimes succeeding is bad. It should either predictably and consistently succeed, or it should predictably and consistently fail.


  • Trolleybus Mechanic

    @LaoC said in WTF Bites:

    ¹ Not exactly walking distance from my place 20 years ago but same island.

    The Filipino dude is probably just a hosting company. The producer of the tablet was based in Prutting, Bavaria.


  • Trolleybus Mechanic

    @Arantor said in WTF Bites:

    it should predictably and consistently fail.

    And, most importantly, early and loudly.



  • @sebastian-galczynski hell yes. Ideally before any data changing has occurred.


  • Considered Harmful

    @sebastian-galczynski said in WTF Bites:

    early and loudly.

    @Override 
    public String getMessage() {
       throw this;
    }
    


  • @HardwareGeek said in WTF Bites:

    No, different job. This is for a "Computer System Validation" consultant

    "Computer System Validation" can be abbreviated as
    CSV
    :there's-your-problem.jpg:


  • Notification Spam Recipient

    @sebastian-galczynski said in WTF Bites:

    you make a typo

    Who the shit is manually typing out your CSV files?!?!


  • I survived the hour long Uno hand

    @Tsaukpaetra said in WTF Bites:

    @sebastian-galczynski said in WTF Bites:

    you make a typo

    Who the shit is manually typing out your CSV files?!?!

    Probably an “AI” from Southeast Asia.



  • Linux firewall comes with a bug:

    Linux is so secure.
    🐧



  • Wanna keep your files secure on a thumb drive?
    What about the "Verbatim Keypad Secure"?
    OK, then read
    https://www.kiratas.com/2022/06/08/verbatim-encrypting-usb-stick-insecure-expert-reveals-vulnerabilities-2/
    Of course, Verbatim just ignores these facts.
    :surprised-pikachu:



  • @BernieTheBernie Another good reason to RIIR


  • Considered Harmful

    @Bulb the Rust compiler proves cryptosystems now?


  • Trolleybus Mechanic

    @Tsaukpaetra said in WTF Bites:

    Who the shit is manually typing out your CSV files?!?!

    I have no idea what produced those broken files. But now I'm sometimes typing JSON myself to test some functionality, because the divine intellect bootcamp grads didn't finish some of the more complex forms, but after some persuasion they at least added a textarea connected to JSON.parse in those places.



  • @sebastian-galczynski I have to manually type JSON to do our database changelogs (that get translated into SQL via liquibase). And it's a particularly obnoxious, super-verbose format. And the parser is insanely picky, including rejecting the file if there is the tab character anywhere. It even has an error message to use spaces instead.



  • @sebastian-galczynski said in WTF Bites:

    JSON is no fun. If you make a typo, it can't be parsed at all and someone has to fix it. CSV can achieve miracles.

    JSON streaming parsers do exist.
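
    Not a true streaming parser, but a minimal Python sketch of the same idea using only the standard library: `json.JSONDecoder.raw_decode` reads one value at a time from a string of concatenated JSON values, so everything before the typo is still recovered. The helper names are made up; dedicated incremental parsers (third-party libraries such as ijson) go further and work on data that hasn't fully arrived yet.

```python
import json


def iter_json_values(text: str):
    """Yield successive JSON values from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def salvage(text: str):
    """Return (values parsed before the first error, error position or None)."""
    good = []
    try:
        for value in iter_json_values(text):
            good.append(value)
    except json.JSONDecodeError as exc:
        return good, exc.pos
    return good, None


values, error_pos = salvage('{"a": 1} [2, 3] {"c": ')
print(values, error_pos)
```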

