Don't test...not even in production!

stillwater

I once made an XSLT file that processed some financial information and was mission-critical and sufficiently complex that it offered me a level of job security I haven’t seen since.

dkf

@Bulb said in Don't test...not even in production!:

A debugger already starts to be pretty useless when the application crashes on a value that was set to an obviously invalid value while handling some unrelated event earlier, in a different thread, and you start out without even knowing which event.

I remember one time I was trying to debug some code that was smashing the return address of a function. That was levels of "fun".

These days, most of my stuff is either impossible to debug because I can't launch it in a useful mode (it needs to talk to a very picky service I don't control during startup), or is deep in the guts of Spring (technically debuggable, but you really don't want to go there), or requires a JTAG connection to hardware in a locked room 80 km away.

Bulb

@dkf said in Don't test...not even in production!:

I remember one time I was trying to debug some code that was smashing the return address of a function.

That's what Valgrind or Dr. Memory (if you are on Windows) are for. Unless you run out of memory trying to use them, as on my previous project. It was technically possible to work around that by providing swap to the device over NFS, which took a lot of work, and then Valgrind ran … but wasn't getting enough debug information to be useful anyway. Arm and debug information tend not to like each other.

dkf

@Bulb In my case, I narrowed it down to which function was failing fairly quickly (with printfs, because running gdb on the coredumps was an exercise in futility and the fault was a long way into the run), but it took an absolute age to spot what the actual fault was. It worked on the systems at work but failed at home, because work was a big-endian UltraSPARC and home was little-endian Linux, and this code is really bad but type-correct:

char GetNextNonWhitespace(FILE *f) {
    char c = '\0';
    fscanf(f, "%1s", &c);
    return c;
}

The fscanf() call writes two characters, one for the character it read and one for the string terminator, even though the target buffer is only one character wide. What does that second character overwrite? In this case, it was part of the return address on one platform and nothing of any importance at all on the other...
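For comparison, a minimal corrected sketch (not dkf's actual fix): the `" %c"` format skips leading whitespace via the space directive and then stores exactly one `char` with no terminator, so a single `char` is a valid target. It also checks the `fscanf()` return value, which the original ignored; returning `'\0'` on EOF is an assumption chosen to mirror the original's initial value.

```c
#include <stdio.h>

/* " %c": the leading space skips any whitespace (spaces, tabs, newlines),
   then %c stores exactly one char and writes no '\0' terminator, so a
   single char variable is a safe destination. */
char GetNextNonWhitespace(FILE *f) {
    char c;
    if (fscanf(f, " %c", &c) != 1) {
        /* EOF or read error: fall back to '\0', like the original's
           initial value (an assumption about the desired behaviour). */
        return '\0';
    }
    return c;
}
```

With input `"  \n\tab"`, successive calls return `'a'`, `'b'`, and then `'\0'` at end of file.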

Gribnit

@dkf C: "A razor is made safer by sharpening it."

PleegWat

@dkf I once had a return address overwrite that not only happened well into a run of a multithreaded application (ruling out Valgrind) but also only occurred on a customer's system (multi-gigabyte coredumps don't travel well).

I ended up writing a debug build that would log every function entry and return, as well as signal handler invocations, into a per-thread ring buffer. At that point our segfault handler could just print it out, and it was pretty straightforward once I knew where it came from.
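One way to build something like this trace (a sketch under assumptions, not necessarily how PleegWat did it): GCC's and Clang's `-finstrument-functions` flag makes the compiler call `__cyg_profile_func_enter()` and `__cyg_profile_func_exit()` around every instrumented function, and a C11 `_Thread_local` ring buffer gives each thread its own log with no locking. `TRACE_RING_SIZE`, `trace_record()`, and `trace_snapshot()` are hypothetical names; a signal handler would call `trace_record(TRACE_SIGNAL, NULL, signo)` on entry.

```c
#include <stddef.h>

/* Hypothetical capacity; the oldest records are overwritten on wrap. */
#define TRACE_RING_SIZE 256

enum trace_kind { TRACE_ENTER, TRACE_EXIT, TRACE_SIGNAL };

struct trace_entry {
    enum trace_kind kind;
    const void *fn;  /* function address for ENTER/EXIT, else NULL */
    int signo;       /* signal number for SIGNAL, else 0 */
};

/* One ring per thread, so recording never needs a lock. */
static _Thread_local struct {
    struct trace_entry entries[TRACE_RING_SIZE];
    size_t written;  /* total records ever written by this thread */
} trace_ring;

__attribute__((no_instrument_function))
void trace_record(enum trace_kind kind, const void *fn, int signo) {
    trace_ring.entries[trace_ring.written % TRACE_RING_SIZE] =
        (struct trace_entry){kind, fn, signo};
    trace_ring.written++;
}

/* Copy the calling thread's most recent records, oldest first, into out.
   Returns how many were copied. */
__attribute__((no_instrument_function))
size_t trace_snapshot(struct trace_entry *out, size_t max) {
    size_t count = trace_ring.written < TRACE_RING_SIZE
                       ? trace_ring.written : TRACE_RING_SIZE;
    if (count > max)
        count = max;
    size_t start = trace_ring.written - count;
    for (size_t i = 0; i < count; i++)
        out[i] = trace_ring.entries[(start + i) % TRACE_RING_SIZE];
    return count;
}

/* Hooks called on every function entry/exit when the debug build is
   compiled with -finstrument-functions. Marked no_instrument_function
   so they do not recurse into themselves. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)call_site;
    trace_record(TRACE_ENTER, this_fn, 0);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)call_site;
    trace_record(TRACE_EXIT, this_fn, 0);
}
```

Because a synchronous SIGSEGV is delivered to the thread that faulted, calling `trace_snapshot()` from the segfault handler yields the crashing thread's own history. The handler should format the entries with `write(2)` rather than stdio, and the raw addresses can be mapped back to function names afterwards with `addr2line` or similar.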

Bulb

@dkf said in Don't test...not even in production!:

this code is really bad but type-correct

The "%1s" specifier constrains the type of the argument, so I would probably not call it type-correct, but the line is very blurred in C.
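Bulb's point can be made concrete: the width in `%1s` limits how many characters are read, but `%s` always appends a terminating null character, so the destination must hold at least width + 1 = 2 chars. A sketch that keeps `%1s` but gives it a properly sized array (the function name `GetNextNonWhitespaceBuf` is hypothetical):

```c
#include <stdio.h>

/* "%1s" skips leading whitespace, reads at most one non-whitespace char,
   and then ALWAYS writes a terminating '\0': two bytes in total. */
char GetNextNonWhitespaceBuf(FILE *f) {
    char buf[2] = {'\0', '\0'};  /* width 1 + room for the terminator */
    if (fscanf(f, "%1s", buf) != 1) {
        return '\0';  /* EOF or read error */
    }
    return buf[0];
}
```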

Bulb

@PleegWat said in Don't test...not even in production!:

but also only occurring on a customer's system (multi-gigabyte coredumps don't travel well)

That's why there are tools like minicoredumper or … I can't remember the name, but I think Red Hat has a crash-reporting tool that just creates backtraces from the coredump as it is generated and discards the dump afterwards.

PleegWat

@Bulb We've got a segfault handler, which runs in-process. It prints the addresses from the stack frame in hex form, as well as certain fields from working objects (either strings, or integers in hex form). Coredumps are explicitly disabled by default for multiple reasons.

Bulb

@PleegWat Yeah, we had that too. I had to fix the in-process handler to re-raise the signal so turning the coredumps back on for debugging even worked.
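A sketch of the kind of in-process handler described here, assuming POSIX `sigaction()` with `SA_SIGINFO`: it prints the fault address in hex using only async-signal-safe calls, then restores the default action and re-raises the signal, so that if core dumps are turned back on the kernel still writes one. `format_hex()`, `crash_handler()`, and `install_crash_handler()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Format v as "0x..." in lowercase hex without snprintf, which is not
   async-signal-safe. out needs room for 2 + 2 * sizeof(uintptr_t) chars.
   Returns the number of chars written (no terminator). */
static size_t format_hex(uintptr_t v, char *out) {
    char digits[2 * sizeof(uintptr_t)];
    size_t n = 0;
    do {
        digits[n++] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    out[0] = '0';
    out[1] = 'x';
    for (size_t i = 0; i < n; i++)
        out[2 + i] = digits[n - 1 - i];
    return 2 + n;
}

static void crash_handler(int sig, siginfo_t *info, void *ucontext) {
    static const char prefix[] = "fatal signal, fault address ";
    char buf[64];
    size_t len = 0;
    (void)ucontext;
    for (size_t i = 0; prefix[i] != '\0'; i++)
        buf[len++] = prefix[i];
    len += format_hex((uintptr_t)info->si_addr, buf + len);
    buf[len++] = '\n';
    ssize_t ignored = write(STDERR_FILENO, buf, len);  /* async-signal-safe */
    (void)ignored;
    /* ...print other diagnostics (stack addresses, object fields) here... */

    /* Restore the default action and re-raise. The signal is blocked while
       this handler runs, so it is delivered when the handler returns and
       the process dies by signal, dumping core if core dumps are enabled. */
    signal(sig, SIG_DFL);
    raise(sig);
}

int install_crash_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```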

But newer (as in, not ancient) Linux kernels support setting the coredump path to an executable that gets launched with the coredump piped into it, and such a tool can produce a backtrace and even send out a bug notification automatically, for any and every process on the system. That makes a lot of sense on servers.
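The kernel side of this is `/proc/sys/kernel/core_pattern`: per the core(5) man page, since Linux 2.6.19, if its first character is `|` the rest of the line is run as a program with the core image on its standard input, and specifiers such as `%p` (PID), `%s` (signal number), and `%e` (executable name) are expanded into its arguments. Below is a skeleton of the helper half only; the install path and `core-summary` name in the comment, and the `consume_core()`/`format_report()` helpers, are hypothetical, and real tools such as systemd-coredump do far more (symbolizing, storage, notification).

```c
#include <stdio.h>

/* Hypothetical pipe helper. It would be installed with something like
     echo '|/usr/local/bin/core-summary %p %s %e' > /proc/sys/kernel/core_pattern
   (path and program name are made up), so argv[1..3] are PID, signal
   number and executable name, and the core image arrives on stdin. */

/* Drain the core image from `in` and return its size in bytes. A real
   tool would feed it to a symbolizer here to extract a backtrace. */
long long consume_core(FILE *in) {
    char chunk[65536];
    long long total = 0;
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0)
        total += (long long)n;
    return total;
}

/* Build the one-line report to log or send instead of keeping the dump.
   Returns snprintf's result (the untruncated length). */
int format_report(char *out, size_t size, const char *pid, const char *sig,
                  const char *exe, long long core_bytes) {
    return snprintf(out, size,
                    "crash: exe=%s pid=%s signal=%s core=%lld bytes\n",
                    exe, pid, sig, core_bytes);
}
```

A `main()` for it would call `consume_core(stdin)`, pass `argv[1]`, `argv[2]`, and `argv[3]` to `format_report()`, and hand the line to syslog or a bug tracker; discarding the image after summarizing is what keeps multi-gigabyte dumps off the disk.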

boomzilla

@Gribnit said in Don't test...not even in production!:

@Kamil-Podlesak said in Don't test...not even in production!:

Try suggesting XSLT

XSLT is easy.

error

@boomzilla said in Don't test...not even in production!:

@Gribnit said in Don't test...not even in production!:

@Kamil-Podlesak said in Don't test...not even in production!:

Try suggesting XSLT

XSLT is easy.

I've done unspeakable things with XSLT.

Zecc

@error Too bad you can't talk about them.

robo2

@Zecc if it is about xslt, then it is better left untold.

Or I would feel obliged to state that I quite like xslt for processing xml, and that would ruin any reputation I might have.

Kamil Podlesak

@Gribnit said in Don't test...not even in production!:

@Kamil-Podlesak said in Don't test...not even in production!:

Try suggesting XSLT

XSLT is easy.

Well, I would agree; I had some fun implementing date sorting in XSLT 1... but surprisingly few people find this kind of stuff enticing.

JBert

@robo2 said in Don't test...not even in production!:

Or I would feel obliged to state that I quite like xslt for processing xml, and that would ruin any reputation I might have.

Yeah, I don't mind XSLT 2 so much, it's just that you will get XML thrown at you.

Steve_The_Cynic

@Bulb said in Don't test...not even in production!:

Gdb is the only debugger where this ever worked for me, and only on the right platforms. That is, I don't think I ever got it to work when connecting to an ARM target.

Way far back, 1992 or so, I got hardware watchpoints to work with Turbo Debugger/386 for DOS. Saved my sanity, too, when a variable got corrupted near the beginning of my program but the program gagged on the corrupted value near the end.

Where hardware breakpoints fail the most is when they seem to work, but generate too many breaks to be useful.

dkf

@Steve_The_Cynic said in Don't test...not even in production!:

Where hardware breakpoints fail the most is when they seem to work, but generate too many breaks to be useful.

That's part of the same reason you don't want to debug into Spring if you can help it. You can, but you probably won't learn anything, because it's all a huge pile of mess inside and you'll have so many false positives that you'll never find the real problem. And the real problem might be that some code never got run in the first place, long before your breakpoint ever gets triggered.

Gribnit

@dkf said in Don't test...not even in production!:

And the real problem might be that some code never got run in the first place, long before your breakpoint ever gets triggered.

To this point, tools that visualize the wiring graph are useful. But these aren't debuggers, as such.

Bulb

@Steve_The_Cynic said in Don't test...not even in production!:

Where hardware breakpoints fail the most is when they seem to work, but generate too many breaks to be useful.

That's where the commands come in handy. You set the commands on the breakpoint to print a backtrace and the values that are probably interesting and a continue. And then search the backtraces for one that looks wrong.
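When gdb isn't an option at all (the picky-service startup, the locked JTAG room), roughly the same print-a-backtrace-and-continue idea can be compiled into the program. This sketch assumes glibc's `<execinfo.h>` (`backtrace()` and `backtrace_symbols_fd()`) and funnels every write to the suspect variable through a setter; `suspicious_value`, `value_is_valid()`, `set_suspicious_value()`, and `get_suspicious_value()` are hypothetical names.

```c
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical variable that later turns up holding garbage. Atomic so
   writes from several threads are not themselves a data race. */
static _Atomic int suspicious_value;

/* Hypothetical validity rule for illustration: must be non-negative. */
static int value_is_valid(int v) { return v >= 0; }

/* Route every write through here. On an invalid write, dump the writer's
   stack to stderr and carry on, like a breakpoint whose commands are a
   backtrace followed by "continue". Returns 1 if it logged, else 0. */
int set_suspicious_value(int v) {
    suspicious_value = v;
    if (value_is_valid(v))
        return 0;
    void *frames[32];
    int depth = backtrace(frames, 32);
    fprintf(stderr, "invalid value %d written; writer's stack:\n", v);
    fflush(stderr);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    return 1;
}

int get_suspicious_value(void) { return suspicious_value; }
```

Linking with `-rdynamic` gives function names instead of bare addresses in the output. Unlike a hardware watchpoint, this only catches writes that go through the setter, so a stray pointer smashing the variable directly will still slip past.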

Zecc

@dkf said in Don't test...not even in production!:

That's part of the same reason you don't want to debug into Spring if you can help it.

checks calendar

I mean, five months is a long time to be debugging.

stillwater

@Zecc said in Don't test...not even in production!:

I mean, five months is a long time to be debugging

It took me about 10 seconds to get that joke.

Bulb

@Zecc said in Don't test...not even in production!:

@dkf said in Don't test...not even in production!:

That's part of the same reason you don't want to debug into Spring if you can help it.

checks calendar

I mean, five months is a long time to be debugging.

Alternatively you can go do it in Australia, but I guess that's a long way for you.

dkf

@Bulb said in Don't test...not even in production!:

Alternatively you can go do it in Australia, but I guess that's a long way for you.

Just find some guy called Bruce to do it for you.

Arantor

@dkf or indeed, Michael, as long as he’s ok with being called Bruce to avoid confusion.

dkf

@Arantor Yep. Our Bruce is called Steve.

Luhmann

@dkf
Strange name for a lady

Zecc

@Luhmann Tyler? Nah, dude just looks like a lady.

HardwareGeek

@Zecc said in Don't test...not even in production!:

Nah, dude just looks like a lady.

That thread is .

Luhmann

@HardwareGeek
Don't mix the streams

Steve_The_Cynic

@Luhmann said in Don't test...not even in production!:

@HardwareGeek
Don't mix the streams

But do stream the mixes?

Zecc

@Steve_The_Cynic Song of the Day is

Gribnit

@Luhmann said in Don't test...not even in production!:

@HardwareGeek
Don't mix the streams

Indeed. The prohibition against passing water into the river is included among the earliest known laws.