How do you fuck up data sanitization this badly?

DoctorJones

TLDR

Mozilla has a data sanitization process that was failing. Somehow the failure condition generated an archive file containing developer account details (including 76,000 email addresses) and exposed them on a public-facing web server.

Filed under: I can only imagine the sort of WTF-ness in the sanitization process that would lead to sensitive information being dumped on a public-facing web server...

PJH

### Mozilla Exposes Email Addresses of 76,000 Developers and 4,000 Password Hashes
Mozilla, the foundation behind the popular Firefox Web Browser, warned on Friday that it had mistakenly exposed information on almost 80,000 members of its Mozilla Developer Network (MDN) as a result of a botched data sanitization process.

That's nearly 100,000! which is nearly 1,000,000!!!!eleventy!

And I thought the BBC was bad with their numbers.

The discovery was made around June 22 by one of Mozilla’s Web developers, Stormy Peters, Director of Developer Relations at Mozilla, said in a security advisory posted to the Mozilla Security Blog on Friday.
“Starting on about June 23, for a period of 30 days, a data sanitization process of the Mozilla Developer Network (MDN) site database had been failing, resulting in the accidental disclosure of MDN email addresses of about 76,000 users and encrypted passwords of about 4,000 users on a publicly accessible server,” Peters wrote.

Something's not right with those dates...

PJH

@SecurityWeek said:

June 22

I think, from the OP, that this should read July.

Post dated 1st Aug
... The issue came to light ten days ago ...

delfinom

Wouldn't be surprised if their database consisted of 1 large super table.

DoctorJones

@delfinom said:

super table

He sounds like an appropriate hero for Discourse.

http://psu-rebot.org/media/blogs/rebot/psu-trustees/super-table-logo.jpg

dtech

It's not that hard to understand how they fucked up right?
They dump the db, remove sensitive information in a data sanitization step, dump the file on a publicly accessible web server.
Data sanitization step breaks and voilà, one publicly accessible database with emails and hashes.
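
For what it's worth, the dump / sanitize / publish flow described above can be made to fail closed, so a broken sanitization step stops the export instead of publishing the raw dump. What follows is only a rough sketch under assumptions: the field names in `SENSITIVE_FIELDS`, the `export_dump` helper, and the `publish` callback are all hypothetical and say nothing about how Mozilla's actual process was built.

```python
# Hypothetical column names; a real schema would need its own list.
SENSITIVE_FIELDS = {"email", "password"}


class SanitizationError(Exception):
    """Raised when a supposedly sanitized dump still holds sensitive fields."""


def sanitize(rows):
    """Return copies of the rows with the sensitive fields removed."""
    return [
        {key: value for key, value in row.items() if key not in SENSITIVE_FIELDS}
        for row in rows
    ]


def verify_clean(rows):
    """Check the sanitizer's output independently instead of trusting it."""
    for row in rows:
        leaked = SENSITIVE_FIELDS.intersection(row)
        if leaked:
            raise SanitizationError(
                f"sensitive fields still present: {sorted(leaked)}"
            )


def export_dump(rows, publish, sanitizer=sanitize):
    """Sanitize, verify, and only then hand the result to publish()."""
    clean = sanitizer(rows)
    verify_clean(clean)  # raises before anything reaches the public server
    publish(clean)
```

The point of the separate `verify_clean` step is that the publish call is unreachable unless the output has been checked, so a sanitizer that silently does nothing results in an error rather than a public file.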

PJH

@dtech said:

It's not that hard to understand how they fucked up right?

By keeping the password hashes when they were essentially useless for their intended purpose?

Yup.

dtech

How do you get that?
I think you're interpreting the statement wrong:

According to Peters, the encrypted passwords were salted hashes and they by themselves cannot currently be used to authenticate with the MDN

This is because the salted hashes cannot easily be transformed back into the original passwords needed for authentication (that's the purpose of salted hashes), not because they are useless to store. You still need them server-side while authenticating.

What you do is check if this holds true:
hash(given_password,salt) == stored_salted_hash
where stored_salted_hash = hash(registration_password,salt)
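
As a runnable illustration of that check, here is a minimal sketch in Python. It is an assumption-laden example, not MDN's or Django's actual code: PBKDF2-HMAC-SHA256 from the standard library is swapped in as the hash function, and the iteration count, salt length, and the `register` / `verify` helper names are all made up for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this example


def hash_password(password: str, salt: bytes) -> bytes:
    """hash(password, salt): derive a salted hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(registration_password: str):
    """Create a fresh random salt and the salted hash to store server-side."""
    salt = os.urandom(16)
    return salt, hash_password(registration_password, salt)


def verify(given_password: str, salt: bytes, stored_salted_hash: bytes) -> bool:
    """Check hash(given_password, salt) == stored_salted_hash in constant time."""
    candidate = hash_password(given_password, salt)
    return hmac.compare_digest(candidate, stored_salted_hash)
```

Usage would look like `salt, stored = register("hunter2")` at sign-up and `verify(attempt, salt, stored)` at login. The server only ever keeps the salt and the hash, never the password itself.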

PJH

@dtech said:

How do you get that?

Because the MDN passwords, as they were, are no longer being used.

Edit - pressed enter too soon..

https://news.ycombinator.com/item?id=8123781

MDN has been using persona for a while now, meaning that most accounts don't have passwords in the database. But older accounts still had the SHA256 salted hash that Django creates.

I'm well aware of how salted hashes work - I'm pointing out that

@dtech said:

hash(given_password,salt) == stored_salted_hash

isn't happening any more

DoctorJones

@dtech said:

dump the file on a publicly accessible web server

Ah yes, that vital step that is ever present in any data sanitization routine that's worth its salt.

Filed under: well, it's no fun if you don't have a bit of risk