No thread about the GitLab fuckup yet?
-
And straight from the horse's mouth:
https://docs.google.com/document/d/1GCK53YDcBWQveod9kfzW-VCxIABGiryG7_z_6jHdVik/pub
Everything that could be fucked up, was fucked up. It's a miracle they didn't physically blow up the servers while they were at it.
-
Should have used Oracle.
-
-
@Maciejasjmj said in No thread about the GitLab fuckup yet?:
And straight from the horse's mouth
Removed a user for using a repository as some form of CDN, resulting in 47 000 IPs signing in using the same account (causing high DB load)
-
SH: It looks like pg_dump may be failing because PostgreSQL 9.2 binaries are being run instead of 9.6 binaries.
Which idiot upgraded the fucking system and didn't pay attention to YOUR FUCKING DATABASE TOOLING BEING UPDATED? Fuck knows if startup scripts even work.
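A cheap guard against exactly that failure mode is to compare the client and server versions before dumping anything. A rough sketch, assuming standard PostgreSQL client tools on the PATH and a reachable server (the database name is made up):

```bash
#!/bin/bash
set -euo pipefail

# Major.minor version the server reports (e.g. "9.6")
server_version=$(psql -At -c "SHOW server_version" | cut -d. -f1,2)

# Major.minor version of the pg_dump binary on the PATH (e.g. "9.2")
client_version=$(pg_dump --version | grep -oE '[0-9]+\.[0-9]+' | head -n1)

if [ "$server_version" != "$client_version" ]; then
    echo "pg_dump $client_version does not match server $server_version, refusing to dump" >&2
    exit 1
fi

pg_dump --format=custom --file=/backups/db.dump mydb
```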
-
TL;DR: The backup system I "designed" (hacked together) for a small web development company while in college is more reliable and provides better reporting than that of gitlab.com.
"Professionals" at work.
-
@boomzilla said in No thread about the GitLab fuckup yet?:
Should have used
OracleTFS.
-
I copied this to my new boss and team with "This is the best reason I've ever seen to run regular disaster drills"
-
@Yamikuronue said in No thread about the GitLab fuckup yet?:
I copied this to my new boss and team with "This is the best reason I've ever seen to run regular disaster drills"
Indeed. I mean, how can you even set up a backup system and then not even verify once that it actually works?
Our backups to S3 apparently don’t work either: the bucket is empty
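Even a dumb listing check would have caught the empty bucket. A minimal sketch, assuming the AWS CLI is configured; the bucket name and address are made up:

```bash
#!/bin/bash
set -eu

BUCKET=s3://example-backups/db/

# Newest object under the prefix, as "date time size key"; empty if the bucket has nothing.
latest=$(aws s3 ls "$BUCKET" --recursive | sort | tail -n 1)

if [ -z "$latest" ]; then
    echo "No backups found under $BUCKET" | mail -s "S3 backup check failed" sysadmin@example.com
    exit 1
fi

# Complain if the newest backup is more than two days old.
latest_date=$(echo "$latest" | awk '{print $1}')
if [[ "$latest_date" < "$(date -d '2 days ago' +%F)" ]]; then
    echo "Newest backup is from $latest_date: $latest" | mail -s "S3 backups are stale" sysadmin@example.com
    exit 1
fi
```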
-
@Rhywden Maybe I'm being charitable, but I assume it worked when they started, and they have a process to delete old backups... and somewhere along the way it stopped working, and deleted all the old backups. Which is why I said regular drills -- just because shit worked three years ago doesn't mean it still works.
-
That Google Doc mentioned in the last tweet notes: "This incident affected the database (including issues and merge requests) but not the git repos (repositories and wikis)."
So some solace there for users because not all is lost.
WHAT? That's so, so much worse.
Git repos and actual files are strewn all over people's computers. Those can be replaced. Issues and chats and tickets are actually irreplaceable. When you pay gitlab, you are paying them to take care of those, not the actual code.
Reading that log, it's clear they were NOT ready to take on the responsibility of running this self-hosted service as opposed to letting Amazon handle it. They did the basics, but obviously haven't had enough time to test everything properly.
It's sad. I'd really like to see someone challenge github's monopoly.
The only redeeming light is that they are airing all their screwups into the open. If they survive this, it will turn them into a much more capable hosting company.
-
@Onyx said in No thread about the GitLab fuckup yet?:
Which idiot upgraded the fucking system and didn't pay attention to YOUR FUCKING DATABASE TOOLING BEING UPDATED? Fuck knows if startup scripts even work.
They are bundling all this stuff into their stupid "omnibus" package. It's a pain in the ass managing all these services individually.
-
@Yamikuronue said in No thread about the GitLab fuckup yet?:
and somewhere along the way it stopped working
Which means they ran a shell script without set -e or checking the return values of the relevant commands. Or nobody gets notified when the script itself fails. Both of which are unacceptable; no capable sysadmin would ever write scripts that don't notify anyone when they fail.
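For the return-value variant, even something this crude beats silence (backup_db.sh and the addresses are placeholders):

```bash
#!/bin/bash

# Run the actual backup and look at its exit status instead of throwing it away.
if ! /usr/local/bin/backup_db.sh > /var/log/backup.log 2>&1; then
    # Somebody has to hear about the failure, or the backups silently rot for years.
    mail -s "backup failed on $(hostname)" sysadmin@example.com < /var/log/backup.log
    exit 1
fi
```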
-
@Yamikuronue said in No thread about the GitLab fuckup yet?:
I copied this to my new boss and team with "This is the best reason I've ever seen to run regular disaster drills"
Good idea.
Sending this to mine.
-
@asdf Agreed. I've done it, but I've never claimed it was a good idea to make me do anything sysadmin-y :D
-
@cartman82 They could give this guy/gal a proper title. Like "Master of Disaster".
-
@loopback0 said in No thread about the GitLab fuckup yet?:
Removed a user for using a repository as some form of CDN, resulting in 47 000 IPs signing in using the same account (causing high DB load)
Fucking nazi mods always finding reasons to ban people. There's no rule against sharing your repository with 47,000 people!
-
@Yamikuronue
Redirecting root's mail to a mailing list which all sysadmins are subscribed to is literally the first thing I did after installing Linux when I set up our dev server back then. ;) And even if you forgot to do that, you should make sure your backup script reports failures somehow. A simple:

```bash
#!/bin/bash
set -e

# Mail the sysadmins if the script exits before reaching the end.
function cleanup {
    echo "Backup script aborted" | mail -s "backup failed" sysadmin@gitlab.com
}
trap cleanup EXIT

# actual script here

# Reached the end without errors, so cancel the failure notification.
trap - EXIT
```
Would be a start.
-
@asdf Yeah, in my case it was deploy scripts; I wrapped the actual deploy script in a second script (for valid reasons I won't get into now), and forgot to return the inner return value. So the deploy "succeeded" with a screen full of errors. Whoops. Thankfully, people noticed pretty quick. Backup scripts I won't touch, for this exact reason.
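In sketch form (made-up file names), the fix is just propagating the inner status:

```bash
#!/bin/bash
# Outer wrapper around the real deploy script.

# ... pre-deploy housekeeping ...

./deploy_inner.sh
status=$?

# ... post-deploy housekeeping ...

# Forget this line and the wrapper exits 0 even when the inner deploy blew up.
exit $status
```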
-
@Yamikuronue said in No thread about the GitLab fuckup yet?:
forgot to return the inner return value
Oh, yeah, that's a classic. Everyone who's ever touched Shell scripts has made that mistake once or twice. ;)
-
@RaceProUK said in No thread about the GitLab fuckup yet?:
That's actually the most fascinating part of this for me. The Google Doc and the YouTube live stream offered a level of transparency in emergency response I've never seen before, which is impressive for their first major incident response (that we know of, obviously). Once GitLab gets their shit together, that kind of makes me want to use their services because I know if something goes wrong, there won't be a disingenuous cover-up.
-
@heterodox said in No thread about the GitLab fuckup yet?:
Once GitLab gets their shit together, that kind of makes me want to use their services because I know if something goes wrong, there won't be a disingenuous cover-up.
You have a point, but I'd be more concerned about the fact they had five backup strategies, all of which failed. And no-one noticed until it was too late.
-
@RaceProUK said in No thread about the GitLab fuckup yet?:
they had five backup strategies, all of which failed
They didn't all fail; some of them weren't even set up.
-
@loopback0 said in No thread about the GitLab fuckup yet?:
some of them weren't even set up
...
...
...
Is it possible to facepalm so hard you bend time?
-
@RaceProUK As said by The Register...
The world doesn't contain enough faces and palms to even begin to offer a reaction to that sentence.
-
@loopback0 I think "weren't even set up" falls pretty heavily under the definition of "failed"
-
@Vault_Dweller It's a bit like that question "If a tree falls and there's no-one to hear it, does it make a sound?"
-
I now have a bet with a coworker that they'll survive a year. He thinks they'll go under because of this incident.
-
@anonymous234 said in No thread about the GitLab fuckup yet?:
There's no rule against sharing your repository with 47,000 people!
-
@Vault_Dweller said in No thread about the GitLab fuckup yet?:
@loopback0 I think "weren't even set up" falls pretty heavily under the definition of "failed"
It's hard to fail if you're not even attempting something.
-
@loopback0 It's a backup strategy, i.e. there was a strategy. Whether it was implemented is another matter.
-
@Vault_Dweller Or rather, the fact that it wasn't implemented was the point of failure
-
Rails
Mmmmm-hmmmm.
@Rhywden said in No thread about the GitLab fuckup yet?:
how can you even set up a backup system and then not even verify once that it actually works?
I worked for a place once that outsourced its backups. Paid hyoooooge money for those backups, we did.
-
@Yamikuronue said in No thread about the GitLab fuckup yet?:
Backup scripts I won't touch, for this exact reason.
I run my backup scripts by hand and eyeball their progress spew.
This approach is of course in no way scalable; I can get away with it because I'm only backing up the one VM host.
It's quite comforting to keep a really close eye on the backup process. Found a failing source drive once just because backup was running slower than I'd come to expect. SMART and RAID logs showed nothing untoward. Did read-speed tests on all the drives in the set individually (hurrah for software RAID), replaced the one running at a quarter of the speed it should, and slapped it into service at my house as a secondary backup; two months later it started reallocating sectors.
-
@Maciejasjmj Well not everything. Apparently the git repos were fine?
-
@RaceProUK said in No thread about the GitLab fuckup yet?:
It's a bit like that question "If a tree falls and there's no-one to hear it, does it make a sound?"
More like "if a tree falls but nobody had ever actually bothered to plant it", surely?
-
What a bunch of gits!
-
@RaceProUK said in No thread about the GitLab fuckup yet?:
You have a point, but I'd be more concerned about the fact they had five backup strategies, all of which failed. And no-one noticed until it was too late.
You only know that because of the transparency, though. The amount of detail you'd get from most other companies is a single, bland statement: "Due to an unscheduled outage in production, about six hours of issues and pull requests were lost, but not any of your files! So you're all good. If you have a paid account or something and you really want some form of compensation, then fine, open a ticket with our support team and we'll figure something out (i.e., ignore you)."
@Yamikuronue said in No thread about the GitLab fuckup yet?:
He thinks they'll go under because of this incident.
It's possible, but I hope they don't.
@flabdablet said in No thread about the GitLab fuckup yet?:
This approach is of course in no way scalable; I can get away with it because I'm only backing up the one VM host.
Right. That'd be my counterargument to, "Why didn't anyone notice... x, y, or z?" Because they have a pretty large project as a GitHub competitor and appear to be, as 99% of companies are, operating on a shoestring Ops budget. (I think there were notations in the Google Doc from two, maybe three ops by initials? It's hardly a booming department.) Things fall through the cracks; as a lot of the opshugs said (what a cute concept), these things happen everywhere.
@JazzyJosh said in No thread about the GitLab fuckup yet?:
@Maciejasjmj Well not everything. Apparently the git repos were fine?
Yeah, those were stored on the filesystem, obviously, and not in Postgres.
-
@Maciejasjmj I'm thinking “thank god”. We use their stuff, but in our own deployment because it has some things in it we need to keep confidential by law and it's easier to do things ourselves than figure out if some random company out there is doing it right. At the very least, it means we own our disasters instead of borrowing someone else's… ;)
-
@tufty said in No thread about the GitLab fuckup yet?:
95% of the data had been backed up to /dev/null
To be fair, you can write a shitload of data to /dev/null before it fills up.
-
@Rhywden I'm no DBA, but it seems like making a script that fetches the backups every week, restores them to a temporary database and then runs some sanity checks on that (e.g. make sure the number of rows is approximately equal to the production database's) would be an obvious first line of defense.
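In outline, something like this would do it. This is a rough sketch with assumed names; the dump path, the scratch database, the production database and the "projects" table are all stand-ins:

```bash
#!/bin/bash
set -euo pipefail

SCRATCH_DB=backup_verify
DUMP=/backups/latest.dump

# Restore the most recent dump into a throwaway database.
dropdb --if-exists "$SCRATCH_DB"
createdb "$SCRATCH_DB"
pg_restore --dbname="$SCRATCH_DB" "$DUMP"

# Compare a cheap metric against production: row count of a key table.
prod_rows=$(psql -At -d gitlabhq_production -c "SELECT count(*) FROM projects")
restored_rows=$(psql -At -d "$SCRATCH_DB" -c "SELECT count(*) FROM projects")

# Allow some drift, but scream if the restored copy is suspiciously small.
if [ "$restored_rows" -lt $(( prod_rows * 9 / 10 )) ]; then
    echo "Restored backup has $restored_rows rows in projects vs $prod_rows in production" \
        | mail -s "backup verification failed" sysadmin@example.com
    exit 1
fi
```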
...actually, screw this. Databases and backups should (in most cases) be a solved problem by now. It should be an automatic, foolproof, one-click process.
-
@anonymous234 said in No thread about the GitLab fuckup yet?:
It should be an automatic, foolproof, one-click process.
and yet somehow it so very rarely is.
-
@flabdablet Is there anything that is? Every so often we get a thread moaning about how "It's $current_year, why isn't this solved‽"
-
@boomzilla said in No thread about the GitLab fuckup yet?:
Is there anything that is?
Clicking a button is a one-click process. Does that count?
-
@flabdablet It's almost like EVERYTHING RELATED TO COMPUTERS IS COMPLETE SHIT.
-
@flabdablet said in No thread about the GitLab fuckup yet?:
To be fair, you can write a shitload of data to /dev/null before it fills up.
It's very quick too. Pity it's a write-only medium
-
@cartman82 said in No thread about the GitLab fuckup yet?:
It's sad. I'd really like to see someone challenge github's monopoly.
Has Bitbucket ever had a major catastrophe like this?
-
@Jaloopa said in No thread about the GitLab fuckup yet?:
Pity it's a write-only medium
I had a backup system like that. Back when I got my first computer, I had a cassette tape backup system (on MSDOS). Never tested restore. You can guess what happened when I needed it... (My backup now is manual - copy data files to multiple places. OS/Programs can be reinstalled)
-
@RaceProUK said in No thread about the GitLab fuckup yet?:
@boomzilla said in No thread about the GitLab fuckup yet?:
Is there anything that is?
Clicking a button is a one-click process. Does that count?
No: We've had threads (and I know I've started at least one of them) about buttons that didn't look like buttons so that you didn't know you could click them.
-
@Jaloopa said in No thread about the GitLab fuckup yet?:
It's very quick too
Infinite write capacity, read speed that's literally off the charts - what's not to like?