Stolen by the byte faerie



  • I just spent about half an hour tracking down a very strange bug. I was porting my application to Windows for the first time (previously tested in UNIX only). My application extracts data from binary files, and displays it in a window, writes to a file, etc.

    For some reason, in Windows it kept crashing on one file. After some bug hunting, the file seemed to suddenly be missing a byte.

    In my hex editor, I saw this (addresses and some values changed, I don't remember them at the moment):

    2000 FA
    2001 FA
    2002 13
    2003 20
    2004 30
    ...

    But when I put some printfs into my program to output what it saw in those bytes, I got this:


    2000 FA
    2001 FA
    2002 20
    2003 30

    In my estimation, the sky was falling. How could a byte just disappear? All I was doing to get the data was a simple fread!

    I decided the best way to solve this mystery was to create a tiny secondary program. If it had the bug, I only had 10 lines to look through rather than 5000. If it didn't have the bug, I knew the issue wasn't in my file loading code.

    As I was coding it, I got to FILE* in = fopen("thefile.dat", "r

    And then I realized...wasn't there another flag that could be put in fopen under Windows? Something about binary files? I looked it up, and it seems that without that flag Windows will drop carriage returns from any files read in. And sure enough, when I looked it up, 13 is the ASCII value for a carriage return.

    -------------------------------

    tl;dr version: I forgot the binary flag in fopen, and hair-tugging ensued
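For anyone landing here later, a minimal sketch of the fix in C: pass "b" in the fopen mode so the runtime does no newline translation. The helper names `write_bytes` and `read_bytes` are made up for illustration and are not from the original program.

```c
#include <stddef.h>
#include <stdio.h>

/* Write len bytes to path in binary mode. Returns 0 on success, -1 on error. */
int write_bytes(const char *path, const unsigned char *data, size_t len)
{
    FILE *out = fopen(path, "wb");   /* "b": no newline translation */
    if (out == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, out);
    if (fclose(out) != 0 || written != len)
        return -1;
    return 0;
}

/* Read up to cap bytes from path in binary mode.
   Returns the number of bytes read, or (size_t)-1 if the file cannot be opened. */
size_t read_bytes(const char *path, unsigned char *buf, size_t cap)
{
    FILE *in = fopen(path, "rb");    /* "rb", not "r" */
    if (in == NULL)
        return (size_t)-1;
    size_t n = fread(buf, 1, cap, in);
    fclose(in);
    return n;
}
```

Writing the bytes FA FA 0D 20 30 with `write_bytes` and reading them back with `read_bytes` should give back all five bytes, 0D included, on UNIX and Windows alike.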



  • Oops, sorry, that "2002 13" should be "2002 0D"



  • @bobday said:


    it seems that without that flag Windows will drop carriage returns from any files read in. And sure enough, when I looked it up, 13 is the ASCII value for a carriage return.


    I put the b flag in when using binary files in unix - it doesn't do anything but it can save some hair pulling later...

    Windows new lines are represented by CR LF (0x0d 0x0a) so when reading a text file in Windows this gets converted to '\n' (0x0a). If it didn't do this you'd end up with lots of extra CR bytes when reading text files and be posting that on here as a WTF...

    Of course the real WTF is that Windows needs 2 bytes to represent a new line...
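To make the conversion concrete, here is a rough model in C of what text mode does to a buffer on input: each CR LF pair collapses to a single LF. This is only a sketch of the behaviour described above, not the code of any actual C runtime; the name `crlf_to_lf` is hypothetical, and the choice to leave a lone CR alone is an assumption (the runtime in the original post apparently dropped it).

```c
#include <stddef.h>

/* Collapse each CR LF pair in buf[0..len) to a single LF, in place.
   A CR that is not followed by LF is kept (an assumption of this sketch).
   Returns the new length. */
size_t crlf_to_lf(unsigned char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0D && i + 1 < len && buf[i + 1] == 0x0A)
            continue;   /* skip the CR; the LF is copied on the next pass */
        buf[out++] = buf[i];
    }
    return out;
}
```

Either way, the result is shorter than the file on disk whenever a pair is found, which is harmless for text and fatal for binary data with fixed offsets.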



  • @gremlin said:

    Of course the real WTF is that Windows needs 2 bytes to represent a new line...

    IIRC that's because a new line on DOS was "move to start of line" (0x0D) followed by "move down one line" (0x0A). Saves one control code (== space in the original 128-character ASCII table) by wasting storage space.



  • This all dates back to teletype style terminals and similar "character only" printers.

    It could be useful to return the printhead to the start of the line without rolling the paper up (called a carriage return because it was the same as the action of moving a typewriter carriage back to the left). This allowed you to overtype for things such as strikeout or underline. You could even do a good bold by printing the same thing several times with a carriage return between each.

    If you wanted to roll the paper up you used a line feed.

    If you wanted to do both you would do carriage return followed by line feed (or the other way around if you wanted it).

    When CRT terminals were used the carriage return (CR) command allowed you to reprint a line, so you could for example print a percentage done, do more processing, then do CR and a new percentage done.

    It was decided by some developers that there was little or no point in a line feed (LF) that didn't do an implied CR, so LF alone was enough between lines of text.

    Other developers thought it should remain as it always had been.

    For a while there were a number of different LF-only, CR-only, LFCR and CRLF "standards" in use.

    These days there only seem to be two that get used - LF only in *nix and CRLF in Windoze.



  • @GettinSadda said:

    These days there only seem to be two that get used - LF only in *nix and CRLF in Windoze.
    IIRC Macs use CR only.



  • @hetas said:

    @GettinSadda said:
    These days there only seem to be two that get used - LF only in *nix and CRLF in Windoze.
    IIRC Macs use CR only.


    Classic Mac OS, yes.  Mac OS X uses LF, forward-slashes, case-sensitivity, and other conventions imposed by God to Moses in his covenant.

        -dZ.



  • Well, that's a WTF (I mean: forgetting to open in binary mode). But I must admit I had always thought that in text mode only the whole pair (13,10) is replaced by 10; simply killing all 13s seems a bit odd to me.

    Speaking of \r - it's really useful for printing debug/progress info, like in:

    for (int i = 0; i < 100; i++) { printf("\r%d percent done ", i); fflush(stdout); }

    Still, I find no use for new-line without carriage-return...



  • @qbolec said:

    Still, I find no use for new-line without carriage-return...


    Oh silly, it's for saving bandwidth... I mean, if you want to scroll 10 lines, you don't need 10 "CRLF", only 10 LF and one CR

    Okay, I don't know if there really is a reason. :-)



  • @gremlin said:

    @bobday said:

    it seems that without that flag Windows will drop carriage returns from any files read in. And sure enough, when I looked it up, 13 is the ASCII value for a carriage return.


    I put the b flag in when using binary files in unix - it doesn't do anything but it can save some hair pulling later...


    I was actually using open/mmap in UNIX, because it's faster than malloc + fread.
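In case it helps anyone, roughly what the open/mmap approach looks like on a POSIX system. This is a sketch under assumptions, not the code from my application: `map_file_readonly` and `unmap_file` are names invented here, and empty files are simply treated as a failure.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. On success returns a pointer to its bytes and
   stores the size in *size_out; returns NULL on any failure or empty file. */
const unsigned char *map_file_readonly(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file_readonly. */
void unmap_file(const unsigned char *data, size_t size)
{
    munmap((void *)data, size);
}
```

Since the mapping exposes the file's bytes directly and never goes through stdio, there is no text mode to trip over, which is probably why I never hit this on UNIX.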



  • Actually, I believe that OS X is case-insensitive.  "cd /tmp" and "cd /TMP" in the terminal will get you to the same place.



  • @danielpitts said:

    @qbolec said:
    Still, I find no use for new-line without carriage-return...


    Oh silly, it's for saving bandwidth... I mean, if you want to scroll 10 lines, you don't need 10 "CRLF", only 10 LF and one CR

    Nice try but with the LF-with-implicit-CR you'd only need 10 bytes for that; so it saves you an additional byte.



  • Hahaha, this reminds me of way back in the day when I was writing dos apps.  We had a game that had large bitmaps as the background image.  Basically the foreground would scroll by fast, made out of tiles, but the background was one large bitmap scrolling slowly.

    For the longest time the background would randomly go wonky - it would actually skew along a seam as if the bitmap was losing bytes at random.  Certain colors would make it bend more than others.  A blank or monocolor image with a black border wouldn't skew at all.  Something with a picture in it would only skew around the picture.  It was very perplexing.

    Eventually we found the missing 'rb', so whenever it found a 13 or a 10 (which turned out to be a certain shade of red, since it was a 256-color palette) it would drop that byte, causing the column to shift left one in the bitmap, and wrapping it around when it got to the end of the line...



  • @bobday said:


    I was actually using open/mmap in UNIX, because it's faster than malloc + fread.


    You can do that in Windows too, though it is less straightforward (everything in the Win32 API is more complicated than its Unix equivalent). For read-only access, you do:

    HANDLE hFileHandle = CreateFile(FilePath, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    DWORD Size = GetFileSize(hFileHandle, NULL);
    HANDLE hFileMapping = CreateFileMapping(hFileHandle, NULL, PAGE_READONLY, 0, 0, NULL);
    void *pData = MapViewOfFile(hFileMapping, FILE_MAP_READ, 0, 0, 0);

    I omitted error checking and handling here; it's not difficult to do that, it's just tedious.



  • @bobday said:

    I was actually using open/mmap in UNIX, because it's faster than malloc + fread.

    Then you should be using the equivalent memory-mapping functions in Windows, which are CreateFileMapping and MapViewOfFile.

    If you google around, you can even find various compatibility modules that implement a Unix-style mmap for Windows by wrapping the Win32 functions in the more familiar interface.

