So, CI build failed...
-
Our CI team is very special. They managed to build the most unreliable build system imaginable, where tests fail mainly because of environment problems (cannot connect to SVN, no space left on device, random timeouts, processes dying without notice). But this... This error is a whole new level, even for them:
11:27:45 /jenkins_dev/workspace/[redacted]/scripts/chroot.sh: line 10: 18083 Segmentation fault sudo mount --bind "${dir}" "${dirname}"
-
Remind me, how does one get mount to segfault?
-
I have no goddamn idea.
This one is even better:
12:12:10 /jenkins_dev/workspace/[redacted]/scripts/chroot.sh: line 10: 18726 Segmentation fault mkdir -p "$dirname"
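For anyone who wants their CI wrapper to tell "the tool crashed" apart from "the tests failed", here is a minimal sketch. It assumes the step is launched from Python via `subprocess`, which is not how the scripts above work; `describe_exit` is a hypothetical helper name, not part of any real CI tool.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Classify a subprocess-style return code (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated
        # signal number, e.g. -11 for SIGSEGV on Linux.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"
```

Something like `describe_exit(subprocess.run(cmd).returncode)` would then flag a segfault as an environment problem rather than a test failure. Note that if the command goes through a shell, the shell usually reports 128 plus the signal number instead of a negative value, so this sketch would need adjusting for that case.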
-
It's easy to do if you mess up your shared libraries. For example, [g]libc version mismatches... Given that these things are executed in a (presumably clean) environment by sudo, this seems unlikely but maybe they've mixed and matched binaries from different OS versions....
-
Lack of RAM?
-
Lack of RAM for mkdir? What are they running this on? An asthmatic hamster?
-
-
They managed to build the most unreliable build system imaginable, where tests fail mainly because of environment problems
why?!
just WHY?!
that's not a thing that should be.
just use jenkins, or travis, or any one of the dozen or so pre built CI solutions!
-
We ARE using Jenkins.
-
Lack of RAM for mkdir? What are they running this on? An asthmatic hamster?
Pretty much.
-
-
Because we have a very... special build system. Dozens of shell scripts, hundreds of makefile lines...
-
Why?
-
Probably because Linux is a piece of shit. Would be my guess.
-
Lack of RAM for mkdir?
Either that or they're doing something strange with mounted filesystems. Please, let that not be the case!
-
The file system lurks ever below, Cthulhu-like, until the stars are right and the poets startle awake from barely-remembered nightmares.
-
Why?
Still, we have it better than the guys downstairs. Our build system needs a /sdkroot directory at the root of the filesystem, filled with all the dependencies (paths there can go 20 directories deep), but theirs is so dependent on the environment that they can only compile their code on specially designated servers. And they have two separate build systems, one for each of the two target platforms. And the guys from Germany working on the same project as those poor souls have another two separate build systems. Each of the four is exactly as fucked up. Yay for giant international corporations!
-
Because we have a very... special build system. Dozens of shell scripts, hundreds of makefile lines...
...... that is not a sane build system
theirs is so dependent on the environment that they can only compile their code on specially designated servers.
that is.... i think they're summoning an elder god.... that's never good.
-
Introduce Yocto Linux. Based on OpenEmbedded, it's a fantastic build system for Linux-based OS images. I'm working on switching my current employer's client over to it now.
-
We use Yocto too at our company, but much lower in the software stack than my project is. I don't have enough friends in other divisions to know how bad it is, though.
-
I'm almost wondering if said asthmatic hamster has a case of bad RAM...
-
-
Remind me, how does one get mount to segfault?
Custom /sbin/mount.myfilesystem binary. At least that's how I'd do it. Oh, you didn't ask for instructions? My bad ...
-
Fair enough. Now, pray tell, how does one segfault mkdir?
-
how does one segfault mkdir?
Having thought about it for a while, the easiest way will be to have a corrupted (or incompatible) C library. Or something like that. Given that there's a chroot involved, that's actually quite easily done. Fuck, that's nasty…
-
Yeah, it's quite possible. But weird considering the builds were going on just fine until yesterday.
-
But weird considering the builds were going on just fine until yesterday.
Not if they upgraded the C library in one place but not the other.
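A quick way to test that theory would be to compare the library files on the host with the ones inside the chroot. This is only a sketch: `mismatched_files` is a made-up helper, and the paths you feed it (host root, chroot root, library locations) are assumptions you would have to fill in for the actual machine.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mismatched_files(host_root, chroot_root, rel_paths):
    """List the relative paths that differ between the two trees.

    A path counts as mismatched if it is a regular file in only one tree,
    or if it exists in both with different contents. Entries in rel_paths
    must be relative (no leading slash).
    """
    mismatched = []
    for rel in rel_paths:
        host_file = Path(host_root) / rel
        chroot_file = Path(chroot_root) / rel
        if host_file.is_file() != chroot_file.is_file():
            mismatched.append(rel)
        elif host_file.is_file() and sha256_of(host_file) != sha256_of(chroot_file):
            mismatched.append(rel)
    return mismatched
```

A difference is not proof of anything by itself, since a chroot may carry its own library versions on purpose. It only tells you whether the two trees have drifted apart since the builds last worked.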
-
Our company employs the most extreme interpretation of "if it ain't broken, don't fix it" policy.
-
Our company employs the most extreme interpretation of "if it ain't broken, don't fix it" policy.
If you suggest altering anything that someone thinks isn't broken, Genghis Khan and his Golden Horde come thundering through a time portal armed with rocket-propelled pepper spray canisters to apply a thorough chastisement (and manicure)?
-
Even worse - if you suggest changing something, nothing happens. At all.
-
Segmentation fault sudo mount --bind "${dir}" "${dirname}"
That screams HARDWARE FAULT to me.
When you get segfaults from system binaries, the most likely problem is hardware, so the first thing to do should be to run memtest86+ for a day or two and a full surface disk check on the box. If that does not reveal something, then reinstalling the system is the other thing to try, but a hardware problem is really the most likely reason.
Bad memory would be more likely to crash the compiler, as that is what uses the largest amount of memory in the build process, so I am slightly more inclined to think that this is a disk or disk controller returning bad data that affects some particular commands. But it can be either. Force them to run comprehensive diagnostics on the box.
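If nobody will run real diagnostics, a crude smoke test is to read the same binary several times and check that the bytes come back identical each time. The sketch below assumes nothing about the box; `reads_are_stable` is a hypothetical name, and it is no substitute for memtest86+ or a surface scan, because repeated reads may be served from the page cache rather than the disk.

```python
import hashlib


def reads_are_stable(path, passes=5, chunk_size=1 << 20) -> bool:
    """Hash a file several times and report whether every pass agreed."""
    digests = set()
    for _ in range(passes):
        hasher = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large binaries do not have to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hasher.update(chunk)
        digests.add(hasher.hexdigest())
    # A single distinct digest means all passes returned the same bytes.
    return len(digests) == 1
```

A `False` result on something like the mkdir binary would be strong evidence of bad RAM or a bad disk path. A `True` result proves very little.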
-
Force them to run comprehensive diagnostics on the box.
I can't. Those aren't our machines, and the owners are too far to the side on the organization chart.
-
Those aren't our machines, and the owners are too far to the side on the organization chart.
TR
But they are working for your project, aren't they? And they are building the deliverables, aren't they? So the project manager should have some power over them. Especially once you can argue that the builds cannot be trusted, because the build machine is probably broken. That can be escalated to higher managers. Yes, I worked in a large company and I know it is sometimes a pain to get a request through the bureaucracy. But this seems like an issue that, with a pinch of exaggeration, might work even there.
-
But they are working for your project, aren't they?
No. They run the servers. They're almost like external contractors. The issue is being worked on by our CI guys, who might be trying to escalate it. I don't know the details, nor do I care.
-
I actually quite like it. It's a very clean, well-designed system. And, it works fantastically well with almost any VCS.