How *not* to segfault (or, the height of posixly-incorrect rudeness)
-
So, we have been integrating $vendorlibrary (note: to protect the guilty, said vendor's identity has been redacted) into our codebase at $work for a couple of raisins, my $project one of them. This had been going along at a bit of a fitful pace, until one of the architects alerted me to a critical ticket in our bugtracker saying that an application task using one of the things that integrates with $vendorlibrary was hanging in the test environment. After reviewing the ticket with the dev who triaged it, it became fairly clear that the library was causing SIGSEGV to be mishandled, turning what should have been a clean coredump and restart into a hung process that was getting SEGV's over a thousand times a second(!), and making debugging the underlying SEGV far harder than necessary. After some fiddling and debate with one of the people at $work who supports $vendorlibrary, I decided to stop screwing around and create an MWE for the problem:
$ cat mwe.cpp
#include <vendorlibrary.h>

int main()
{
    char* die = reinterpret_cast<char*>(-1);
    *die = '\0';
    return *die;
}
$ gcc -I/path/to/vendorlibrary -L/path/to/vendorlibrary -Wl,-rpath,/path/to/vendorlibrary -lvendor -o mwe mwe.cpp
$ ./mwe
This conclusively ruled out an API usage error and pointed a smoking gun straight at the vendor: the backtrace I obtained by sending SIGQUIT to the MWE matched what we saw after the signal in the test environment, and running strace on it showed the same thousand-segfaults-a-second that we saw with the live hang.
(Feel free to post if you've seen something like this before; while I'm not particularly looking for help, and $work will be filing a support ticket with the vendor on this for sure, knowing what sort of thing we're up against beyond "dodgy SEGV handler" would be nice.)
-
@tarunik said in How *not* to segfault (or, the height of posixly-incorrect rudeness):
getting SEGV's over a thousand times a second(!)
-
My guess: some global initialization startup code in the library configures a signal handler, and that signal handler itself crashes and thus invokes itself. I would be surprised if the header include is necessary.
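To illustrate that guess, here is a minimal sketch of how a library could end up owning SIGSEGV before main() even starts, with no call from the application. Everything in it is a hypothetical stand-in (fake_vendor_handler, FakeVendorInit, vendor_handler_is_installed); it is not the vendor's actual code, only the general static-initializer pattern being guessed at.

```cpp
#include <cassert>
#include <signal.h>

// Hypothetical stand-in for whatever handler the library installs.
extern "C" void fake_vendor_handler(int) { /* body irrelevant for this sketch */ }

namespace {
// A global object whose constructor runs during static initialization,
// i.e. before main(). A shared library can do the same in its own globals,
// so merely linking against it is enough to trigger this.
struct FakeVendorInit {
    FakeVendorInit() {
        struct sigaction sa{};
        sa.sa_handler = fake_vendor_handler;
        sigemptyset(&sa.sa_mask);
        int rc = sigaction(SIGSEGV, &sa, nullptr);
        assert(rc == 0);
        (void)rc;
    }
};
FakeVendorInit g_fake_vendor_init;
}  // namespace

// Asks the kernel for the current SIGSEGV disposition and reports whether
// it is the handler installed by the static initializer above.
bool vendor_handler_is_installed() {
    struct sigaction cur{};
    if (sigaction(SIGSEGV, nullptr, &cur) != 0) return false;
    return cur.sa_handler == fake_vendor_handler;
}
```

Calling vendor_handler_is_installed() as the first statement of main() would already report true, which is consistent with the MWE never calling into the library at all.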
-
@LB_ said in How *not* to segfault (or, the height of posixly-incorrect rudeness):
and that signal handler itself crashes
Or it tries to restart the code that SEGVd at the point exactly where it crashed.
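A rough sketch of that failure mode, assuming Linux: when a SIGSEGV handler simply returns, the kernel resumes at the faulting instruction, which faults again, and so on. The names here (returning_handler, run_refault_experiment, kMaxFaults) are made up for illustration, and the fault cap plus exit code 42 exist only so the demo terminates; a real handler without a cap would spin forever, much like the hang described above.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t g_faults = 0;
static const int kMaxFaults = 1000;  // arbitrary cap so the demo ends

// Hypothetical handler that "handles" SIGSEGV by doing nothing useful and
// returning, which restarts the faulting instruction.
extern "C" void returning_handler(int) {
    if (++g_faults >= kMaxFaults) {
        _exit(42);  // async-signal-safe way out once the point is made
    }
}

// Runs the experiment in a child process and returns its exit code
// (42 means the handler was re-entered kMaxFaults times), or -1 on error.
int run_refault_experiment() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct sigaction sa{};
        sa.sa_handler = returning_handler;
        sigemptyset(&sa.sa_mask);
        int rc = sigaction(SIGSEGV, &sa, nullptr);
        assert(rc == 0);
        (void)rc;
        volatile char* die = reinterpret_cast<volatile char*>(-1);
        *die = '\0';  // faults; handler returns; the same store faults again
        _exit(0);     // only reached if the store somehow succeeded
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running the child under strace should show a stream of SIGSEGV deliveries from the same address, similar in shape to what was reported for the hung task.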
-
@dkf said in How *not* to segfault (or, the height of posixly-incorrect rudeness):
@LB_ said in How *not* to segfault (or, the height of posixly-incorrect rudeness):
and that signal handler itself crashes
Or it tries to restart the code that SEGVd at the point exactly where it crashed.
Like the old VB directive
On Error Resume Error
-
@LB_ my suspicion is that the signal handler is reinstalling itself improperly (they're using the signal API in single-shot mode for some unknown reason, or at least that's what strace told me about what they were doing). We've hacked around it for now using
signal(SIGSEGV, SIG_DFL);
, and they escalated it to their devs within a couple days of us reporting it to them, but we haven't heard a peep since.
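For anyone wanting to see the effect of that hack in isolation, here is a small sketch, again assuming Linux. fake_vendor_init and fake_vendor_handler are hypothetical stand-ins for whatever the library does at startup; the only line taken from our actual workaround is the signal(SIGSEGV, SIG_DFL) call.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a handler that returns and so never lets the
// process die.
extern "C" void fake_vendor_handler(int) {}

// Hypothetical stand-in for the library's startup code.
void fake_vendor_init() { signal(SIGSEGV, fake_vendor_handler); }

// Crashes a child process after applying the workaround. Returns the signal
// that killed the child, 0 if it exited normally, or -1 on error.
int crash_after_workaround() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        fake_vendor_init();
        signal(SIGSEGV, SIG_DFL);  // the workaround: restore default action
        volatile char* die = reinterpret_cast<volatile char*>(-1);
        *die = '\0';  // now terminates the process instead of looping
        _exit(0);     // only reached if the store somehow succeeded
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With the default disposition restored, the child is killed by SIGSEGV on the first fault (and dumps core if the environment allows it), which is the behaviour we wanted back. The catch is that the reset has to happen after the library installs its handler, and it only holds as long as the library does not reinstall it later.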
-
@tarunik said in How *not* to segfault (or, the height of posixly-incorrect rudeness):
we haven't heard a peep since.
Typical. Since they built it up it's probably proving difficult to unwind and do it slightly more right...