Curl inconsistently mutates uploaded files



  • Situation: I have a mixed binary/text file (a self-extracting archive style installer with a text script and then a compressed binary tarball appended). I need to POST it to a specific server endpoint, which checksums it and stores it for a client to download automatically. The client checks the checksum against what it was told to expect.

    For some installers, doing

    curl -H "Content-Type: application/octet-stream" -X POST <server> --data-binary @"$filename"
    

    works perfectly--everyone agrees on the checksums at every stage, including the one I compute locally ahead of time.

    For other installers, the exact same command produces mangled files. Specifically, the checksum reported back to the caller on upload matches the pre-upload checksum, but the checksum of the file on disk on the server and the checksum on download (manually or by the clients) are completely different, and the file no longer decompresses properly.
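
    A minimal sketch of comparing the checksum at each stage, assuming SHA-256 (the thread does not say which hash the server uses). The names file_sha256 and compare_stages are made up for illustration.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in binary mode, reading in chunks so a large installer is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def compare_stages(paths):
    """Take {stage label: path}; return ({stage label: hex digest}, True if every stage agrees)."""
    digests = {label: file_sha256(path) for label, path in paths.items()}
    return digests, len(set(digests.values())) == 1
```

    Called with something like compare_stages({"local": "installer.run", "downloaded": "installer.downloaded"}) (hypothetical file names), it shows which stage is the first to disagree.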

    The installers are being made by the same versions of the packaging script, using the same command execution. The underlying binaries that are being packaged differ, but I can't figure out any reason why that should matter once it's compressed and packaged.

    Help?


  • Banned

    @Benjamin-Hall when you do binary diff, is it just about newline conversion or something else?



  • @Gustav said in Curl inconsistently mutates uploaded files:

    @Benjamin-Hall when you do binary diff, is it just about newline conversion or something else?

    I must admit that my binary diff skills are rusty. I know that it's changing quite a lot, but I'm not sure what I'm looking for.

    Any suggestions on how I could check that?



  • The other thought is that the transport is munging the file by smuggling in the HTTP/1.1 chunked encoding boundaries - binary comparison as above is the place to start.



  • Doing cmp -l <original> <downloaded> produces a huge list where every byte past a certain offset (probably the start of the binary) is different, and it does not appear to be just a simple shift. Again, I'm not at all sure what I'm looking for.

    The text portions, both under diff and under visual inspection, appear to be identical. The differences only start in the binary portion.
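
    One way to make the cmp -l output easier to read is to collapse it into contiguous differing ranges. This is only a sketch; diff_ranges and summarize are hypothetical helper names, and the byte-by-byte loop will be slow on very large files.

```python
def diff_ranges(original: bytes, other: bytes):
    """Return (start, end) offsets, end exclusive, for each contiguous run of differing bytes."""
    ranges = []
    start = None
    common = min(len(original), len(other))
    for i in range(common):
        if original[i] != other[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:          # the run extends to the end of the shorter input
        ranges.append((start, common))
    return ranges


def summarize(original_path, other_path):
    """Load both files in binary mode and report their sizes plus the differing ranges."""
    with open(original_path, "rb") as f:
        a = f.read()
    with open(other_path, "rb") as f:
        b = f.read()
    return {"sizes": (len(a), len(b)), "ranges": diff_ranges(a, b)}
```

    Identical sizes with a few large differing runs would point toward blocks being reordered or overwritten; a size change would point toward bytes being inserted or converted in transit.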



  • @Benjamin-Hall

    Have you tried tarring (or any other archive) the entire project and making the entire thing binary?



  • It sounds like the text/binary combination might be getting quietly “converted” from ISO-8859-something to UTF-8. Which sucks.
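
    That hypothesis can be tested directly. A sketch, assuming the conversion in question is ISO-8859-1 to UTF-8 specifically (other ISO-8859 variants would need their own codec name); both function names are invented for illustration.

```python
def non_ascii_count(data: bytes) -> int:
    """Count bytes >= 0x80; each one becomes two bytes under an ISO-8859-1 to UTF-8 conversion."""
    return sum(1 for b in data if b >= 0x80)


def looks_like_latin1_to_utf8(original: bytes, received: bytes) -> bool:
    """True if `received` is exactly `original` reinterpreted as ISO-8859-1 and re-encoded as UTF-8."""
    return original.decode("latin-1").encode("utf-8") == received
```

    A quick size check is often enough: such a conversion grows the file by non_ascii_count(original) bytes, so a download that is the same size as the original cannot be the result of it.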



  • We've narrowed it down to what seems to be the server code that does the checksum and stores the file. It uses the file APIs in an unsafe manner (not awaiting the file.write calls), producing what seem to be interleaved chunks under some conditions.

    We ruled out any kind of text conversion stuff--the different bytes are in the middle of the binary part only, are consecutive, and don't follow any particular pattern.
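
    A simulation of that failure mode, not the actual server code (the thread does not show it or name its stack). SlowSink is a made-up stand-in for an async file API whose writes take varying time to land; the delays are chosen so that later chunks finish first.

```python
import asyncio


class SlowSink:
    """Hypothetical async sink: each write lands after its own simulated I/O delay."""

    def __init__(self):
        self.data = bytearray()

    async def write(self, chunk: bytes, delay: float):
        await asyncio.sleep(delay)  # simulated I/O latency
        self.data += chunk


async def store_unawaited(chunks):
    sink = SlowSink()
    # Bug pattern: every write is started without waiting for the previous one to finish.
    tasks = [
        asyncio.create_task(sink.write(c, 0.01 * (len(chunks) - i)))
        for i, c in enumerate(chunks)
    ]
    await asyncio.gather(*tasks)  # all bytes arrive, but in completion order
    return bytes(sink.data)


async def store_awaited(chunks):
    sink = SlowSink()
    for i, c in enumerate(chunks):
        # The next write starts only after this one has landed.
        await sink.write(c, 0.01 * (len(chunks) - i))
    return bytes(sink.data)


chunks = [b"AAAA", b"BBBB", b"CCCC"]
print("unawaited:", asyncio.run(store_unawaited(chunks)))
print("awaited:  ", asyncio.run(store_awaited(chunks)))
```

    If the hash is computed over the chunks as they arrive rather than over what was written, this would also be consistent with the upload response reporting the correct checksum while the stored file differs, though that part is a guess.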



  • @Benjamin-Hall said in Curl inconsistently mutates uploaded files:

    We've narrowed it down to what seems to be the server code that does the checksum and stores the file. It uses the file APIs in an unsafe manner (not awaiting the file.write calls), producing what seem to be interleaved chunks under some conditions.

    We ruled out any kind of text conversion stuff--the different bytes are in the middle of the binary part only, are consecutive, and don't follow any particular pattern.

    And yeah, confirmed. We uploaded a similar-sized plain text file. It's interleaving chunks in the middle once it gets to a certain size (probably what doesn't fit in a single syscall buffer or whatever).
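
    A plain-text probe is easiest to read back if every line is numbered and fixed-width, so any reordering shows up at an exact offset. A sketch along those lines; make_probe and out_of_order are hypothetical names and the 12-byte record size is an arbitrary choice.

```python
def make_probe(num_lines: int) -> bytes:
    """Build a payload of sequentially numbered lines, each exactly 12 bytes (11 digits + newline)."""
    return b"".join(b"%011d\n" % i for i in range(num_lines))


def out_of_order(payload: bytes, width: int = 12):
    """Return (byte offset, expected line number, bytes found) for every record that is not in place."""
    problems = []
    for n, offset in enumerate(range(0, len(payload), width)):
        record = payload[offset:offset + width]
        if record != b"%011d\n" % n:
            problems.append((offset, n, record))
    return problems
```

    Upload make_probe(N) for a large enough N, download it again, and run out_of_order on the result. The first reported offset shows roughly where the interleaving starts, which may hint at the buffer size involved.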


  • Notification Spam Recipient

    I remember having this issue on a client-side upload/download tool using the .NET HttpClient library. Fun stuff.


  • Trolleybus Mechanic

    @Benjamin-Hall Did you end up adding a sleep before the hasher ran to do a quick verification?

