Who wants to help me with a C#.net Out Of Memory exception??!!??!??!?



  • Here's the scenario:

    1) I got a big-ass IIS log file

    2) Every line of the log file has a long complex query string

    3) I gotta shove some bits of the query string into a database

    So, what I did is write code that looks like this (inStream is a StreamReader of the log file):

    string line;
    while ((line = inStream.ReadLine()) != null)
    {
        LogFileEntry entry = new LogFileEntry(line);
        logFile.Add(entry);
    }

    LogFileEntry is a class that basically has a member for every column of the log file, plus this code:

    // Key/value pairs stored in query string
    queryParams = HttpUtility.ParseQueryString("?" + csUriQuery);

    The problem is that after about 250k records, .net spits out an OutOfMemory exception. (According to Task Manager, there's plenty of memory.) I thought the problem was that HttpUtility.ParseQueryString() does a shitload of Substring operations, which causes crazy memory fragmentation, and what .net's really complaining about isn't that it ran OUT of memory but that it ran out of CONTIGUOUS memory.

    So in a stupid attempt to resolve that, I tried running GC.Collect() every 1000 times through the loop. That not only made the program slower than snot, but it still bailed on me at exactly the same point with an OutOfMemory exception.

    Any ideas?

    EDIT: It's actually an IIS log file; I don't work on the International Space Station.



  • Arrested Development is quite funny isn't it?



  • What is logFile.Add() doing?  The only thing I can see from the code you posted is that maybe that is holding on to all the instances of LogFileEntry that you create, and that's where all the memory is going.  Failing that, you can grab the trial version of ANTS memory profiler and see what's going on.



  • Hard to diagnose with the given information, but:

    Is it the same line that barfs every time? Is there anything interesting about that line? Does it run cleanly by itself (i.e., if the log file had just that line)?

    When you have a long-running program with a memory leak, the process tends to get progressively slower as it goes on. Does your program do that?



  • You're probably running into the 2GB limit on .NET objects, i.e. your logFile list. (OutOfMemoryException is poorly named. It should be called something like MemoryAllocationException, because that's what it is 99% of the time.)

    My suggestion to fix the problem would be: batch it. Figure out the rough number of lines you can parse without running out of memory, then wrap a for loop around your code. At the end of the for loop, write whatever you need to the DB, then Clear() your list. You'll have to rearchitect things a bit to handle line numbers instead of just reading line-by-line, but it shouldn't be too major.
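
    A rough sketch of the batching idea, under some assumptions: BatchedLogLoader, Load, and the parse/flush callbacks are made-up names for illustration, and the actual DB write is whatever you already have. This version keeps the reader open across batches, so it doesn't need to track line numbers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of any library.
public static class BatchedLogLoader
{
    // Reads lines from `reader`, parses each one with `parse`, and passes the
    // parsed entries to `flush` in batches of at most `batchSize`.
    // Returns the total number of lines processed.
    public static int Load<T>(TextReader reader, Func<string, T> parse,
                              Action<IReadOnlyList<T>> flush, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        int total = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            batch.Add(parse(line));
            total++;
            if (batch.Count == batchSize)
            {
                flush(batch);   // e.g. write this batch to the DB
                batch.Clear();  // drop the references so the GC can reclaim the entries
            }
        }
        if (batch.Count > 0) flush(batch);  // final, partial batch
        return total;
    }
}
```

    The same list instance is reused for every batch, so the flush callback must finish with it (or copy it) before returning.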

    BTW, I second ANTS for memory profiling. Often a single run is good enough to make you go "OF COURSE that's why the memory isn't being freed!"



  • I had a similar problem with a C++ program reading a mondo XML file. The final solution was for the producer to break the xml file into several smaller files, but compiling with the large address aware flag worked as a temporary solution.

    Surely something similar exists for C#?



  • @Sutherlands said:

    What is logFile.Add() doing? The only thing I can see from the code you posted is that maybe that is holding on to all the instances of LogFileEntry that you create, and that's where all the memory is going.

    Yeah, that's exactly what I want, I need to do some processing on the list later on, then shove it into a DB.

    @boomzilla said:

    Hard to diagnose with the given information, but:

    Is it the same line that barfs every time? Is there anything interesting about that line? Does it run cleanly by itself (i.e., if the log file had just that line)?

    Sorry I should have mentioned, I vetted the log files-- they are uncorrupted. The barfing happens on a random line, but always "near" the same number of records.

    @The_Assimilator said:

    You're probably running into the 2GB limit on .NET objects, i.e. your logFile list. (OutOfMemoryException is poorly named. It should be called something like MemoryAllocationException, because that's what it is 99% of the time.)

    JEBUS that's got to be what it is. Wow. WHAT THE FUCK.

    1) Why is there a 2 GB limit on objects, even on a 64-bit OS with 8 GB of RAM?!
    2) Why is it reported as an OutOfMemory exception?!
    3) Why the fuck does the documentation for the OutOfMemory exception not even HINT that this is a possible cause!?

    FAIL.

    Thanks for the help. This stumped a lot of people before I brought it here.



  • Yikes, I didn't realize that you were keeping a giant array / list of the entries. I had assumed that you were reading / analyzing / storing / forgetting each line as you went. At this point I guess you'll have to break it up somehow. At 250K objects, you can't be storing much before you hit that limit. In fact, just an array of 64-bit pointers would be hitting it, which is probably what you have (plus whatever overhead the container has, of course). Looks like there were some links to workarounds that break stuff up for you on that StackOverflow page.

    @blakeyrat said:

    1) Why is there a 2 GB limit on objects, even on a 64-bit OS with 8 GB of RAM?!

    Sounds like they're using a long signed integer to manage memory. Unfortunately, on Windows, a long int is the same size on 32 bits as on 64 bits. And assuming the CLR is implemented in C++, there's no easy way to write portable code that goes from 32 bits to 64 bits. C99 at least has portable datatypes that make this easy, but MSVC doesn't include that, and they aren't standard C++ anyways. No doubt, someone thought, "2GB objects should be enough for anybody."

    @blakeyrat said:

    2) Why is it reported as an OutOfMemory exception?!

    I'd imagine that's as much as can be diagnosed at the point where it happens: Try to allocate memory, get a null pointer...must be out of memory! It's probably not worth trying to investigate the other causes.

    @blakeyrat said:

    3) Why the fuck does the documentation for the OutOfMemory exception not even HINT that this is a possible cause!?

    They don't call this out explicitly (which makes some sense, as it's an implementation detail), but they do hint at it:

    Reading large data sets into memory.

    Just be glad you weren't using the compact framework:

    To prevent the exception, avoid programming large methods that consume 64 or more kilobytes of memory.


  • @boomzilla said:

    Yikes, I didn't realize that you were keeping a giant array / list of the entries. I had assumed that you were reading / analyzing / storing / forgetting each line as you went.

    Yeah, unfortunately, I have to "look back" to do the "analyzing". If I see a particular value, I need to go back and say, "wait a minute, did I see that before?"

    What I did to fix it yesterday was simply pull out merely the fields I *know* I need, but now that I have that StackOverflow link I'll go back and fix it properly using their code. Otherwise it'll just fail a month down the road when we're looking at a different server's data.
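
    A sketch of that kind of quick fix, not the actual code: parse the query string, copy out only the keys you care about, and let the full collection go out of scope. QueryFields and Extract are placeholder names; HttpUtility.ParseQueryString is the same call used in LogFileEntry above.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Web;

// Hypothetical helper, named for illustration only.
public static class QueryFields
{
    // Parses `csUriQuery` and returns only the keys listed in `wanted`.
    // Keys that are absent from the query string are omitted from the result.
    public static Dictionary<string, string> Extract(string csUriQuery, params string[] wanted)
    {
        NameValueCollection all = HttpUtility.ParseQueryString(csUriQuery);
        var kept = new Dictionary<string, string>(wanted.Length);
        foreach (string key in wanted)
        {
            string value = all[key];  // null when the key is not present
            if (value != null) kept[key] = value;
        }
        return kept;  // `all` is now unreferenced and eligible for collection
    }
}
```

    Each LogFileEntry would then hold on to the small dictionary instead of the whole parsed collection.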

    Thanks for the help, all.



  • @blakeyrat said:

    Yeah, unfortunately, I have to "look back" to do the "analyzing". If I see a particular value, I need to go back and say, "wait a minute, did I see that before?"

    That sounds like the sort of thing that would be better with a hash or dictionary type of lookup, especially given the number of lines you're looking at. Maybe you're doing that already, along with an index into the array...
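
    Something along these lines, as a sketch only: FirstSeenIndex and Record are invented names, and it assumes the "value" you look back for is a string and that you want the index of the entry where it first showed up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup, named for illustration only.
public sealed class FirstSeenIndex
{
    private readonly Dictionary<string, int> firstIndex =
        new Dictionary<string, int>(StringComparer.Ordinal);

    // If `value` was recorded earlier, returns the index it was first seen at.
    // Otherwise records it at `index` and returns -1.
    // `value` must not be null (Dictionary rejects null keys).
    public int Record(string value, int index)
    {
        int earlier;
        if (firstIndex.TryGetValue(value, out earlier)) return earlier;
        firstIndex[value] = index;
        return -1;
    }

    // Number of distinct values recorded so far.
    public int Count { get { return firstIndex.Count; } }
}
```

    Each lookup is a hash probe instead of a scan back through the list, and only the distinct values stay in memory.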



  • Normally I'd just shove this all into a DB and have it do the grunt work, but unfortunately I need to parse out url parameters, which is not something SQL is strong at.



  • Don't know if you've solved your problem or not, but have you looked at LogParser 2.2? I haven't used it in years, but from what I can remember it was pretty good at querying IIS logs.



  • @blakeyrat said:

    Normally I'd just shove this all into a DB and have it do the grunt work, but unfortunately I need to parse out url parameters, which is not something SQL is strong at.

    Why not do both?

    Read, parse, and store one line at a time as part of the load.

    Then get the DB to do the grunt work.

