debugging a crash in someone else's code
-
I have a plugin for a third-party closed-source program. It crashes on Windows and Wine for a specific input, but not on the native Linux version.
The crash looks like this:
Unhandled exception: page fault on read access to 0x0000005b in 32-bit code (0x00b76e68).
Register dump:
 CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
 EIP:00b76e68 ESP:02c4dbf0 EBP:15a3c7a0 EFLAGS:00010286( R- -- I S - -P- )
 EAX:e8a796be EBX:5cd1c0d8 ECX:00000007 EDX:00000000
 ESI:00000000 EDI:00000000
Stack dump:
0x02c4dbf0: 1b048e90 02c4dca8 e08e8a72 5d46f388
0x02c4dc00: 1b049f78 5d585858 e8a796be 00000004
0x02c4dc10: 00000000 00000009 ffffffff ffffffff
0x02c4dc20: 00000000 5d61a518 5d61a51c 5d61a51c
0x02c4dc30: 5a63f000 7bcbe000 5d61a4e8 5d61a4ec
0x02c4dc40: 5d61a4ec 00000000 02c4dde8 15a3cc28
Backtrace:
=>0 0x00b76e68 in [redacted] (+0x776e68) (0x15a3c7a0)
0x00b76e68: movl 0x54(%ecx),%edx
How would I go about figuring out what's causing the crash? Both the program and the plugin are insanely complex.
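One thing the dump already shows on its own: the faulting instruction reads from 0x54(%ecx), and ECX holds 7, so the address being read is 0x5b, which is exactly the page-fault address. In other words, ECX looks like it should be a pointer but contains a small integer. Below is a minimal Python sketch of that arithmetic. The helper names are made up for illustration, and the module base it prints is only what the backtrace offset implies, not something confirmed elsewhere.

```python
# Values copied from the crash dump above.
ECX = 0x00000007
DISPLACEMENT = 0x54          # from "movl 0x54(%ecx),%edx"
FAULT_ADDRESS = 0x0000005b   # "page fault on read access to 0x0000005b"
EIP = 0x00b76e68
MODULE_OFFSET = 0x776e68     # the "+0x776e68" in the backtrace line


def effective_address(base: int, displacement: int) -> int:
    """Address read by a disp(%reg) operand in 32-bit code."""
    return (base + displacement) & 0xFFFFFFFF


def module_base(eip: int, offset: int) -> int:
    """Load address implied by an instruction pointer and its module offset."""
    return eip - offset


address = effective_address(ECX, DISPLACEMENT)
print(hex(address))                           # address the instruction tried to read
print(address == FAULT_ADDRESS)               # does it match the reported fault?
print(hex(module_base(EIP, MODULE_OFFSET)))   # implied load address of the module
```

If the first two lines agree, the question becomes where the bogus value in ECX came from, rather than what the faulting instruction is doing.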
-
Raymond's in the middle of a series dealing with debugging, starts here: https://blogs.msdn.microsoft.com/oldnewthing/20160608-00/?p=93615.
Anything there of use?
-
@ben_lubar Do you have the source to the plugin? Or is it also closed-source?
-
@ben_lubar Is there a specific reason you're not reporting this to the plugin author(s) and waiting for them to fix it?
-
@pydsigner said in debugging a crash in someone else's code:
@ben_lubar Is there a specific reason you're not reporting this to the plugin author(s) and waiting for them to fix it?
It's made by CDCK Inc.? ‹/snark›
-
@blakeyrat said in debugging a crash in someone else's code:
Do you have the source to the plugin?
Yes, it's here: https://github.com/BenLubar/df-ai
-
@ben_lubar Just a couple days ago you told me that wasn't a plugin, it was just a DLL loader hack. Plugin implies the program it's modifying has some sort of API.
So you know which action of yours causes the crash, does that help narrow it any?
-
@blakeyrat said in debugging a crash in someone else's code:
So you know which action of yours causes the crash
That's the problem: I don't, because as far as I've been able to determine, the crash doesn't happen after any specific function call in my plugin.
-
@ben_lubar Does it happen if your plugin is not involved at all?
-
@blakeyrat said in debugging a crash in someone else's code:
@ben_lubar Does it happen if your plugin is not involved at all?
No, but my plugin is driving all the input to the program, so it's very unlikely that it would not be related to my plugin.
-
@ben_lubar Right; but can you replicate the input without the plugin being involved?
What I'm getting at here is, did you find a bug in their code that occurs at a specific input, or do you have a bug in your code that stomps all over a data structure somewhere?
-
Ok, so someone in #dfhack introduced me to cl-linux-debug's browse-addr function.
The EBX register points to a pile of camel fat. The EBP register points to the RENDER_FAT reaction. So at this point I'm pretty sure something changed in the data structure for kitchens.
-
@ben_lubar said in debugging a crash in someone else's code:
So at this point I'm pretty sure something changed in the data structure for kitchens.
Well at least it's not changed in the data structure for raisins?
-
Ok, I'm able to reproduce the crash on a fresh install of the program with default plugins and an empty dfhack.init file. I can't reproduce the crash with all plugins unloaded.
Edit: reported:
-
@ben_lubar said in debugging a crash in someone else's code:
Ok, so someone in #dfhack introduced me to cl-linux-debug's browse-addr function.
The EBX register points to a pile of camel fat. The EBP register points to the RENDER_FAT reaction. So at this point I'm pretty sure something changed in the data structure for kitchens.
it took me a while to accept you meant what you said
-
@fbmac said in debugging a crash in someone else's code:
@ben_lubar said in debugging a crash in someone else's code:
Ok, so someone in #dfhack introduced me to cl-linux-debug's browse-addr function.
The EBX register points to a pile of camel fat. The EBP register points to the RENDER_FAT reaction. So at this point I'm pretty sure something changed in the data structure for kitchens.
it took me a while to accept you meant what you said
Clearly new to Dwarf Fortress, I see
-
@ben_lubar can you not just add an unhandled exception handler so that fatal application errors can go to the handler where you've conveniently added writing the error stack trace to disk, which includes a line number and file of the offending code?
It's like 8 lines of code.
-
@Matches said in debugging a crash in someone else's code:
@ben_lubar can you not just add an unhandled exception handler so that fatal application errors can go to the handler where you've conveniently added writing the error stack trace to disk, which includes a line number and file of the offending code?
It's like 8 lines of code.
I'm sure that's possible to do with SEGFAULTs on executables with no symbols.
-
@ben_lubar said in debugging a crash in someone else's code:
Ok, so someone in #dfhack introduced me to cl-linux-debug's browse-addr function.
The EBX register points to a pile of camel fat. The EBP register points to the RENDER_FAT reaction. So at this point I'm pretty sure something changed in the data structure for kitchens.
I believe I've just had a minor seizure. Anyway, since you have the memory map, why not just directly update the variable that contains the output of the render_fat reaction? Also, consider converting to ntfs. It's easier to render camel ntfs than camel fat.
-
@drurowin said in debugging a crash in someone else's code:
the variable that contains the output
Yeah, I'll just put the result of the play_a_video_game function into the variable. That makes sense.
-
@ben_lubar Well doesn't it like give you some other raw material? Just bump up the raw material or resource you need.
-
@drurowin said in debugging a crash in someone else's code:
@ben_lubar Well doesn't it like give you some other raw material? Just bump up the raw material or resource you need.
It's not running my code when it crashes. In fact, it's not running any part of the code I can touch when it crashes. Something somewhere in DFHack is corrupting some value that eventually causes the rendering of fat to dereference an invalid pointer.
-
@ben_lubar Bad pointers can be a complete ass to hunt down. Is it possible to run things with a memory debugging tool like valgrind or efence? Those can tell you a great deal even without source, though it helps if you've got a test case that can trigger the problem rapidly as they've got a lot of overhead. (I don't know the state of availability on Windows, and I try to make my own code not require techniques like that, but they're very good indeed when you need them…)
-
@dkf said in debugging a crash in someone else's code:
I don't know the state of availability on Windows, and I try to make my own code not require techniques like that, but they're very good indeed when you need them…
There's Dr. Memory, which is supposed to have similar features to the vanilla valgrind (i.e., not cachegrind). I don't have as much experience with it as with valgrind, though, so YMMV.
But, yeah, one of these could tell you whose memory is being stomped on, and possibly how. After that... GLHF.
-
@cvi said in debugging a crash in someone else's code:
cachegrind
BTW, that's a very nice tool despite being a PITA to work with. It's possible to chisel away quite a bit of performance trouble with the help of the detailed metrics it produces.
-
@ben_lubar said in debugging a crash in someone else's code:
@drurowin said in debugging a crash in someone else's code:
@ben_lubar Well doesn't it like give you some other raw material? Just bump up the raw material or resource you need.
It's not running my code when it crashes. In fact, it's not running any part of the code I can touch when it crashes. Something somewhere in DFHack is corrupting some value that eventually causes the rendering of fat to dereference an invalid pointer.
So just don't run the code that renders fat, just update the variable that render_fat updates with whatever value you need it to have. Using a Minecraft example, if you know the address that stores the number of Dark Oak Wood Planks you have, and the code to craft Dark Oak Wood Planks from Dark Oak Wood Logs crashes, why not just directly update the Dark Oak Wood Planks value and skip the crashing code?
-
@drurowin said in debugging a crash in someone else's code:
So just don't run the code that renders fat
It's an [AUTOMATIC] reaction, so it doesn't get run by my code.
@drurowin said in debugging a crash in someone else's code:
why not just directly update the Dark Oak Wood Planks value and skip the crashing code?
Because I don't feel like intercepting a thing that's being intercepted and crashing because of the interception is a good idea.
Anyway, the workaround is to run "unload eventful".
-
Checking linux vtables:
VTable size mismatch: active_script_varst (active_script_varst) - expected 7, found 1
VTable size mismatch: script_varst (script_varst) - expected 1, found 0
VTable size unchecked: ui_build_selector
VTable size unchecked: renderer
Checking windows vtables:
Argument size mismatch: 0043e6f0 item(itemst)::addImprovementFromJob #86 - expected 44, found 52 bytes.
Argument size mismatch: 00b6c7b0 reaction_product(reaction_productst)::produce #2 - expected 36, found 44 bytes.
VTable size unchecked: layer_object
VTable size unchecked: active_script_varst
VTable size unchecked: build_req_choicest
VTable size unchecked: script_varst
VTable size unchecked: ui_build_selector
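Both argument size mismatches in the Windows output above are off by the same amount. Here is a small Python sketch that pulls the expected and found sizes out of those two lines. The lines are pasted in as a literal, and the helper name and regular expression are my own, not part of any DFHack tool.

```python
import re

# The two "Argument size mismatch" lines from the output above.
REPORT = """\
Argument size mismatch: 0043e6f0 item(itemst)::addImprovementFromJob #86 - expected 44, found 52 bytes.
Argument size mismatch: 00b6c7b0 reaction_product(reaction_productst)::produce #2 - expected 36, found 44 bytes.
"""

PATTERN = re.compile(
    r"Argument size mismatch: (?P<addr>[0-9a-f]+) (?P<method>\S+) #(?P<slot>\d+)"
    r" - expected (?P<expected>\d+), found (?P<found>\d+) bytes\."
)


def argument_mismatches(report: str) -> list[tuple[str, int, int, int]]:
    """Return (method, expected, found, found - expected) per mismatch line."""
    rows = []
    for match in PATTERN.finditer(report):
        expected = int(match["expected"])
        found = int(match["found"])
        rows.append((match["method"], expected, found, found - expected))
    return rows


for method, expected, found, delta in argument_mismatches(REPORT):
    print(f"{method}: expected {expected}, found {found}, delta {delta:+d}")
```

Both methods come out 8 bytes larger than expected. One possible reading is that each takes two more 4-byte arguments in the Windows binary than the structure definitions describe, and reaction_product::produce is the sort of call a RENDER_FAT reaction would go through. That is an interpretation of the numbers, not a confirmed diagnosis.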
-
@ben_lubar Maybe the binaries have been compiled with different compilers?
-
@ben_lubar said in debugging a crash in someone else's code:
@drurowin said in debugging a crash in someone else's code:
So just don't run the code that renders fat
It's an [AUTOMATIC] reaction
Patch the image in memory so the whole reaction is replaced with the 0x90 (NOP) opcode.
@drurowin said in debugging a crash in someone else's code:
why not just directly update the Dark Oak Wood Planks value and skip the crashing code?
Because I don't feel like intercepting a thing that's being intercepted and crashing because of the interception is a good idea.
You don't necessarily even need to go to the trouble of gathering the fat and reaching the point where the offending code runs. If you have the memory map, just update whatever resources you need manually.