Load bearing stack-overflow



  • This is something of a self-WTF. OK, more like a bug I can't nail down. We've all heard of the load-bearing print statement, the load-bearing breakpoint, etc. But I seem to have created a load-bearing stack overflow. Without overflowing a certain stack (the OpenGL matrix stack), my program refuses to run properly.



  • If you're overflowing the matrix stack, chances are you're not using it properly. What are you doing with it?



  • [quote user="DaBookshah"]This is something of a self-WTF. OK, more like a bug I can't nail down. We've all heard of the load-bearing print statement, the load-bearing breakpoint, etc. But I seem to have created a load-bearing stack overflow. Without overflowing a certain stack (the OpenGL matrix stack), my program refuses to run properly.[/quote]

    Then you probably have an uninitialised variable somewhere, or perhaps you have retained a pointer to some heap memory after freeing it, or used some malloced heap memory without clearing it first?
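    One way to test the "uninitialised heap memory" theory is to make the garbage deterministic. The sketch below is only an illustration: `poisoned_malloc`, `still_poisoned`, `demo_poison` and the `0xCD` fill byte are made-up names and values, not anything from the original program. The idea is that if the program's behaviour stops changing from run to run (or changes in a new way) once every allocation is filled with a known pattern, then something is probably reading memory before writing it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xCD  /* arbitrary, easy to spot in a debugger */

/* Hypothetical helper: behaves like malloc, but fills the new block with
 * POISON_BYTE so that "uninitialised" reads see the same garbage every run. */
static void *poisoned_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        memset(p, POISON_BYTE, size);
    return p;
}

/* Returns 1 if every byte of the block still holds POISON_BYTE
 * (a write of POISON_BYTE itself would go unnoticed), 0 otherwise. */
static int still_poisoned(const void *p, size_t size)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < size; i++)
        if (bytes[i] != POISON_BYTE)
            return 0;
    return 1;
}

/* Example: a 4x4 float matrix that the program forgot to fill in. */
static void demo_poison(void)
{
    size_t size = 16 * sizeof(float);
    float *matrix = poisoned_malloc(size);
    assert(matrix != NULL);

    assert(still_poisoned(matrix, size));   /* nothing has written to it yet */
    matrix[0] = 1.0f;                       /* first real initialisation */
    assert(!still_poisoned(matrix, size));

    free(matrix);
}
```

    Swapping `malloc` for a helper like this in a debug build does not fix anything; it only makes a read-before-write easier to notice.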

     



  • As I said, the stack overflow makes it work. No Dave, not because of any of those reasons.



  • Oops, now my previous post decides to show up, finally. Anyway.....

    I don't think either of you understand the WTF here. The WTF is that I wrote an OpenGL program, and deliberately introducing a stack overflow makes it work PERFECTLY. Removing the stack overflow results in weird behaviour.



  • Sounds like you need to break your program (by fixing the overflow) then find out what's *really* wrong.

    If the overflow makes it work perfectly it sounds like somewhere you're pushing more than you pop.  Even my kids know that pushing more than you can pop makes a mess on the floor.

    My guess would be a function prototype in one file has more parameters than the actual function does, and by some stroke of luck that function gets called when the stack is nearly full - you'd have a harder job finding it otherwise.  Good luck!
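    The "pushing more than you pop" point can be sketched with a toy model. Nothing below is the OpenGL API; `toy_stack`, `toy_push`, `toy_draw_object` and the depth of 4 are invented for illustration. The model only imitates the rule, as I understand the OpenGL specification, that a push onto a full matrix stack is ignored and records an overflow error. Under that assumption, a perfectly balanced push/pop pair behaves differently once the stack is full: the push does nothing, but the matching pop still happens.

```c
#include <assert.h>

/* Tiny on purpose; real OpenGL guarantees at least 32 for the modelview stack. */
#define TOY_STACK_DEPTH 4

enum toy_error { TOY_NO_ERROR, TOY_STACK_OVERFLOW, TOY_STACK_UNDERFLOW };

struct toy_stack {
    int values[TOY_STACK_DEPTH]; /* one int stands in for a 4x4 matrix */
    int top;                     /* index of the current "matrix" */
    enum toy_error error;        /* first error recorded so far */
};

static void toy_init(struct toy_stack *s, int initial)
{
    s->values[0] = initial;
    s->top = 0;
    s->error = TOY_NO_ERROR;
}

/* Duplicate the current entry, unless the stack is full:
 * then record an error and ignore the push. */
static void toy_push(struct toy_stack *s)
{
    if (s->top == TOY_STACK_DEPTH - 1) {
        if (s->error == TOY_NO_ERROR)
            s->error = TOY_STACK_OVERFLOW;
        return;
    }
    s->values[s->top + 1] = s->values[s->top];
    s->top++;
}

/* Drop the current entry, unless it is the only one. */
static void toy_pop(struct toy_stack *s)
{
    if (s->top == 0) {
        if (s->error == TOY_NO_ERROR)
            s->error = TOY_STACK_UNDERFLOW;
        return;
    }
    s->top--;
}

static void toy_set(struct toy_stack *s, int value) { s->values[s->top] = value; }
static int toy_current(const struct toy_stack *s)   { return s->values[s->top]; }
static int toy_depth(const struct toy_stack *s)     { return s->top + 1; }

/* A "balanced" draw routine: save, transform, draw, restore. */
static void toy_draw_object(struct toy_stack *s, int transform)
{
    toy_push(s);
    toy_set(s, transform);
    /* ... drawing would happen here ... */
    toy_pop(s);
}
```

    With room on the stack, `toy_draw_object` leaves the depth unchanged. Called on a full stack, the same routine leaves it one entry shallower and overwrites the caller's current entry along the way, which is one way an overflow could mask (or cause) a mismatch elsewhere. In real code, checking `glGetError()` after the suspect calls would be the first thing to try.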



  • [quote user="DaBookshah"]
    I don't think either of you understand the WTF here. The WTF is that I wrote an OpenGL program, and deliberately introducing a stack overflow makes it work PERFECTLY. Removing the stack overflow results in weird behaviour.[/quote]

       No, it doesn't make it work "perfectly"; it makes it work "by sheer good luck".  The fact is that you don't actually know what is going on here, so you can't make such claims.  The only difference between overflowing the matrix stack and not overflowing it is that the contents of a large chunk of memory are different in the two cases.  Hence if the subsequent behaviour of the program varies, it must depend on those contents, and behave differently as a consequence of what has been written there during the overflow.

      How big is your code?  Can you post it?  Can you reduce it to a short testcase?

     



  • The OpenGL matrix stack is distinct from the stack used for holding variables. Anybody posting about uninitialized variables needs to re-read the first post.



  • [quote user="DaBookshah"]
    I don't think either of you understand the WTF here. The WTF is that I wrote an OpenGL program, and deliberately introducing a stack overflow makes it work PERFECTLY. Removing the stack overflow results in weird behaviour.[/quote]

    Reminds me of a bug I tracked down about 15 years ago in AIX 3.1, which also curiously featured SGI's GL (the precursor to OpenGL).

    I was working on a module for a scientific visualization package that read a particular data file format.  The renderer for this package ran on a number of devices, including the "Sabine" graphics card for the RS/6000.  Sabine was basically a rebranded SGI graphics subsystem, and it ran GL, so the device-dependent portion of the renderer would open a GL window (on the X desktop) with the "openwin" function and then make a bunch of GL calls to draw in it.

    Problem was, when you read a data file in this particular format and then tried to process and render that data, the openwin call would fail.  Get the data into the program in any other fashion and you were fine.  The file-processing code itself was some public-domain C code, and it did a lot of memory allocation and freeing, so on a hunch I commented out all the calls to free (with a macro, natch) in case the underlying problem was heap corruption.  Bingo - openwin worked.

    OK - so there was clearly a dup-free or a buffer overrun or some such thing in the file processor... but I desk-checked the whole thing and couldn't find it.  I instrumented the code and logged every single allocation and free, then processed the log with an awk script, which verified that they were all fine.  So then I wrote a C program that read the log, performed all the same allocations and frees, touched the first several bytes of each allocated area, and then called openwin.  And openwin failed.

    I eventually cut this down (using binary search, of course) to a C program that allocated 11 areas, touched the first 16 bytes of each, freed 3 of them, and called openwin, which would fail.  (To this day, this remains my personal favorite cut-down demonstration of a bug.)  I sent that off to AIX support.

    I got a response the next day: there was a bug in the AIX C runtime implementation of free, and under the right conditions it would stomp some memory that the GL library used.  Apparently prerelease testing of both the C runtime and the GL library failed to catch it.

    So in that case, deliberately introducing a bug (a memory leak) did make a program work correctly.
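    The log-checking step in the story above (done there with an awk script) can be sketched in C. This is not the original script: the log format, `struct log_entry`, `check_log` and the fixed `MAX_LIVE` limit are all assumptions made for the example. It walks a list of allocation and free events and counts frees of addresses that are not currently live (double or stray frees), allocations that return an address still in use, and blocks never freed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIVE 1024  /* assumed upper bound on simultaneously live blocks */

enum log_op { LOG_ALLOC, LOG_FREE };

struct log_entry {
    enum log_op op;
    unsigned long addr;  /* address returned by malloc or passed to free */
};

struct log_report {
    int bad_frees;   /* free of an address that was not live */
    int bad_allocs;  /* allocation returned an address that was still live */
    int leaks;       /* blocks still live at the end of the log */
};

/* Linear search is fine for a sketch; returns the index or -1. */
static int find_live(const unsigned long *live, size_t count, unsigned long addr)
{
    for (size_t i = 0; i < count; i++)
        if (live[i] == addr)
            return (int)i;
    return -1;
}

/* Returns 0 on success, -1 if more than MAX_LIVE blocks were live at once. */
static int check_log(const struct log_entry *log, size_t n, struct log_report *out)
{
    unsigned long live[MAX_LIVE];
    size_t count = 0;

    out->bad_frees = 0;
    out->bad_allocs = 0;
    out->leaks = 0;

    for (size_t i = 0; i < n; i++) {
        int idx = find_live(live, count, log[i].addr);

        if (log[i].op == LOG_ALLOC) {
            if (idx >= 0) {
                out->bad_allocs++;
            } else {
                if (count == MAX_LIVE)
                    return -1;
                live[count++] = log[i].addr;
            }
        } else if (log[i].addr != 0) {   /* free(NULL) is legal, so skip it */
            if (idx < 0)
                out->bad_frees++;
            else
                live[idx] = live[--count];  /* remove by swapping in the last entry */
        }
    }

    out->leaks = (int)count;
    return 0;
}
```

    A clean report from a checker like this only says the program's own calls were consistent. As the story shows, that is exactly the situation in which the bug turns out to be in the allocator itself.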



  • [quote user="DaveK"]

      
    No, it doesn't make it work "perfectly"; it makes it work "by sheer good luck".  The fact is that you don't actually know what is going on here, so you can't make such claims.

     [/quote]

     That's called programming by coincidence.
     



  • [quote user="DaBookshah"]Oops, now my previous post decides to show up, finally. Anyway.....

    I don't think either of you understand the WTF here. The WTF is that I wrote an OpenGL program, and deliberately introducing a stack overflow makes it work PERFECTLY. Removing the stack overflow results in weird behaviour.[/quote]

    I've done a little bit of OpenGL work recently - what exactly do you mean by "weird behavior" when you fix the stack overflow? I'm definitely in the camp that thinks that if things appear to be working in an error state, then they aren't working for the reasons you think they are.

    If possible, try your app on different platforms (I do my OpenGL work on Windows and OS X) to see if both exhibit the same issues. Sometimes there is enough of a difference in compilers / implementation to help you track down the errors. 



  • [quote user="DaBookshah"]This is something of a self-WTF. OK, more like a bug I can't nail down. We've all heard of the load-bearing print statement, the load-bearing breakpoint, etc. But I seem to have created a load-bearing stack overflow. Without overflowing a certain stack (the OpenGL matrix stack), my program refuses to run properly.[/quote]

    Go back to your previous code, from before you started this latest feature, and verify that the bug is not there. Then add your changes back in until it breaks.  The problem code should be easy to find, since it is limited to the changes you have just made.

    Note: The above does not apply if you have a rogue pointer (C or C++).

     



  • [quote user="bugmenot"]The OpenGL matrix stack is distinct from the stack used for holding variables. Anybody posting about uninitialized variables needs to re-read the first post.[/quote]

      No, YOU need to grasp the concept that there can be variables on the heap as well as on the cpu stack, and that they can be equally uninitialised, and that expanding the matrix stack until it overflows can cause a lot of heap to be sbrk'd and then filled with one kind of data rather than sbrk'd sometime later and filled with a different kind of uninitialised data.

      Show me where I said that these variables are on the stack, rather than in dynamically-allocated storage or the .data or .bss segments.  You can't, because I didn't; you incorrectly assumed that I was talking about the cpu stack.  You know what they say about assuming.



  • [quote user="Cloaked User"][quote user="DaveK"]

       No, it doesn't make it work "perfectly"; it makes it work "by sheer good luck".  The fact is that you don't actually know what is going on here, so you can't make such claims.

     [/quote]

     That's called programming by coincidence.
     

    [/quote]

     :-)  My personal favourite term for it is "Cargo-cult programming"



  • [quote user="DaveK"]

    [quote user="bugmenot"]The OpenGL matrix stack is distinct from the stack used for holding variables. Anybody posting about uninitialized variables needs to re-read the first post.[/quote]

      No, YOU need to grasp the concept that there can be variables on the heap as well as on the cpu stack, and that they can be equally uninitialised, and that expanding the matrix stack until it overflows can cause a lot of heap to be sbrk'd and then filled with one kind of data rather than sbrk'd sometime later and filled with a different kind of uninitialised data.

    [/quote]

    You need to grasp the concept that the OpenGL matrix stack might not even be in main memory at all.

