A Very Specific Number Ate My Homework
-
The way we store student coding activity is in a table that until today had a 32-bit index. What this means is that the database table could only store 4 billion rows of coding activity information.
-
-
@boomzilla said in A Very Specific Number Ate My Homework:
The way we store student coding activity is in a table that until today had a 32-bit index. What this means is that the database table could only store 4 billion rows of coding activity information.
And we didn't notice we had almost 4 billion rows in that table until it failed.
-
@HardwareGeek said in A Very Specific Number Ate My Homework:
And we didn't notice we had almost 4 billion rows in that table until it failed.
To be fair, without an error calling attention to it, how often do you check your database for things like that?
-
4 billion rows should be enough for anyone!
-
@masonwheeler said in A Very Specific Number Ate My Homework:
how often do you check your database for things like that?
Well, in the little database stuff I've done, I've never had to deal with tables anywhere near that big, by several orders of magnitude, so, um, point to you, I guess.
-
@HardwareGeek They forgot to make their database WEBSCALE, I guess.
-
@masonwheeler said in A Very Specific Number Ate My Homework:
@HardwareGeek said in A Very Specific Number Ate My Homework:
And we didn't notice we had almost 4 billion rows in that table until it failed.
To be fair, without an error calling attention to it, how often do you check your database for things like that?
Regularly, because my very first attempt at a guestbook showed what happens when you use a
tinyint
as the unique index.I'm not sure what I was thinking. Then again, it made damn sure that the database didn't grow too big.
-
@boomzilla
IMO, If they had a competent database person, they would have recognized the death of the database was imminent from the size of the key and the # of people saving. There's really no excuse; I've noticed that sort of problem several times while working in databases. It should be one of the top things to check!
-
Sure, but they saved a whopping (232 - 1)*4 bytes = 17GB.
That's like, $0.42 per month on AWS prices.
-
@darkmatter said in A Very Specific Number Ate My Homework:
@boomzilla
IMO, If they had a competent database person, they would have recognized the death of the database was imminent from the size of the key and the # of people saving. There's really no excuse; I've noticed that sort of problem several times while working in databases. It should be one of the top things to check!40 million students * 100 exercises.
-
@masonwheeler said in A Very Specific Number Ate My Homework:
To be fair, without an error calling attention to it, how often do you check your database for things like that?
Never, but I use a database that doesn't have that limitation. :p
-
@anonymous234 said in A Very Specific Number Ate My Homework:
Sure, but they saved a whopping (232 - 1)*4 bytes = 17GB.
A fan of hardware sizes, I see.
-
@dkf said in A Very Specific Number Ate My Homework:
@masonwheeler said in A Very Specific Number Ate My Homework:
To be fair, without an error calling attention to it, how often do you check your database for things like that?
Never, but I use a database that doesn't have that limitation. :p
Which one?
-
@Dreikin said in A Very Specific Number Ate My Homework:
Which one?
SQLite. It doesn't use a fixed number of bytes for integers. (You might have a problem if you put more than 64 bits worth of values in your table, but that's a Shit-Ton of Data anyway.) I also use language bindings that don't magically restrict things to 32-bits either.
Since you asked. :)
-
@dkf I think oracle is limited to 10^20 for integers.
-
@PleegWat said in A Very Specific Number Ate My Homework:
@dkf I think oracle is limited to 10^20 for integers.
correct, you need an advanced enterprise license for any number bigger than that
-
@PleegWat (2^31)-1 for INTEGER, (2^63)-1 for LONGINTEGER. IIRC NUMBER is 10^125.
-
@dkf said in A Very Specific Number Ate My Homework:
You might have a problem if you put more than 64 bits worth of values in your table, but that's a Shit-Ton of Data anyway.
Guess who else thought "well, that's bigger than I'll ever realistically need".
-
@uschwarz said in A Very Specific Number Ate My Homework:
@dkf said in A Very Specific Number Ate My Homework:
You might have a problem if you put more than 64 bits worth of values in your table, but that's a Shit-Ton of Data anyway.
Guess who else thought "well, that's bigger than I'll ever realistically need".
263 nanoseconds is 300 years. If you're recording a data point every nanosecond continuously for 300 years, you'll run out of IDs. To put that another way, if every living person on Earth entered a data point once per second, it would take 36½ years for you to run out of IDs.
If you're still worried about running out of IDs, switch to 128-bit IDs. Assuming you always leave the top bit zero, entering a data point every nanosecond would take 390000000000 times the age of the universe to run out, and just the IDs would take up 2,475,880,078,570,760,549,798,248,448 TiB of disk space.
-
@ben_lubar said in A Very Specific Number Ate My Homework:
If you're still worried about running out of IDs, switch to
128-bit IDs.UUIDsIt's a little-known fact that UUIDs actually guarantee uniqueness by making use of the quantum properties of the space-time. Whenever you generate a colliding UUID, the universe it's generated in is instantly destroyed, thus preserving their uniqueness.
A draft release of Multiverse Unique Identifiers is available, but is not considered production-ready due to the limited availability of spare multiverses.
-
@ben_lubar said in A Very Specific Number Ate My Homework:
263
Assuming you always leave the top bit zero
But ... why?
-
@cvi said in A Very Specific Number Ate My Homework:
But ... why?
Avoids having to specify if the values are signed or not.
-
@dkf said in A Very Specific Number Ate My Homework:
Avoids having to specify if the values are signed or not.
But if you're using them as unique indices that doesn't matter? I mean, as long as it's unique, whether you pretend that the value is -1 or UINT*_MAX (or something else) doesn't matter. After all, it just needs to be an unique value. No point in throwing away half of your id space.
-
@cvi
Well yes, but 36½ years from now, @ben_lubar will be retired on a tropical island with Internet 100x faster than Milwaukee PC, and his poor successor can worry about updating the index to wrap-around to negative numbers so that the table doesn't run out of space.
-
@izzion said in A Very Specific Number Ate My Homework:
on a tropical island with Internet 100x faster than Milwaukee PC
A tin can and a (very) wet string?
-
@izzion it's okay, the PC that database is running on will succumb to the y2.038k bug well before it runs out of index space.
-
@HardwareGeek said in A Very Specific Number Ate My Homework:
@izzion said in A Very Specific Number Ate My Homework:
on a tropical island with Internet 100x faster than Milwaukee PC
A tin can and a (very) wet string?
I believe what he was saying was that the tropical island didn't have internet access.
-
@ben_lubar that does it, I'm setting the seed value for all new 32bit integer keys to -2,147,483,648 to double the time it will work.
-
@darkmatter said in A Very Specific Number Ate My Homework:
@ben_lubar that does it, I'm setting the seed value for all new 32bit integer keys to -2,147,483,648 to double the time it will work.
But how will you save error rows to the table?
-
@darkmatter said in A Very Specific Number Ate My Homework:
@ben_lubar that does it, I'm setting the seed value for all new 32bit integer keys to -2,147,483,648 to double the time it will work.
All our bots are given negative IDs for distinguishment. Because the only thing different in our game (from the game-state's viewpoint) is that bots have no strings attached to their actors.
Filed under: There are no strings on me!
-
@cvi said in A Very Specific Number Ate My Homework:
@dkf said in A Very Specific Number Ate My Homework:
Avoids having to specify if the values are signed or not.
But if you're using them as unique indices that doesn't matter? I mean, as long as it's unique, whether you pretend that the value is -1 or UINT*_MAX (or something else) doesn't matter. After all, it just needs to be an unique value. No point in throwing away half of your id space.
Half of the ID space isn't much compared to the difference between 31 and 63 bits, and it's definitely less than the difference between 31 and 127 bits.
-
@ben_lubar said in A Very Specific Number Ate My Homework:
Half of the ID space isn't much compared to the difference between 31 and 63 bits
Half of 264 is larger than 263−231.
-
@Dreikin said in A Very Specific Number Ate My Homework:
@ben_lubar said in A Very Specific Number Ate My Homework:
Half of the ID space isn't much compared to the difference between 31 and 63 bits
Half of 264 is larger than 263−231.
It is larger, but still on the same magnitude, since it's exactly 263 (and 231 is way smaller than 263).
-
@darkmatter said in A Very Specific Number Ate My Homework:
well before it runs out of index space
More to the point, if you're going to store anything else in those tables as well as IDs, you're going to need some more disk space.
-
@Dreikin sounds like a problem for captain boolean column. by that I mean adding a char(32) field to hold 32 booleans stored as integers in a string made to look like binary bitmask for enterprisey flag expansion.
-
@darkmatter
Wouldn't an enterprisey bitmask wind up only fitting 30 booleans into a CHAR(32)? And to be truly enterprisey, shouldn't it be NCHAR(32)?Filed under: 0x000100010001000100010001000100
-
@RaceProUK Is the sideways wang intentional?
-
@izzion said in A Very Specific Number Ate My Homework:
Wouldn't an enterprisey bitmask wind up only fitting 30 booleans into a CHAR(32)? And to be truly enterprisey, shouldn't it be NCHAR(32)?
Accept nothing less than JSONX columns.
-
@izzion i'm just mocking a real thing that happened long ago on a system I work with now (though not for much longer). It gets better too...
They originally had a 16 character field into which they stored binary strings like so:
'0001001001010010'Eventually they came up with more than 16 things to store, so they doubled the size to 32 bits... but half of the code used the flags field as a string and thus used string indexing, so they couldn't store the binary directly into the field anymore, they had to split it into two 16 bit sections and add the most significant 16 bits on the right side of the lsbs. Which led to all their docs referencing the 32 bits in the database as being numbered thusly:
15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0,31,30,29.....18,17,16
requiring reading it out, splitting in half to attach the back to the front and then casting to binary....while in certain code it was referenced like this: 0,1,2,3,4,5...31 with substring.
Of course, they could have updated the string versions to reference the correct bit, but it was easier to fix on the data-save/read layer than the business logic layer.
-
@darkmatter That is the most boring thing I have ever read.
-
@lucas1 TL;DR version - the binary bitmasks were jacked up by usage as strings by devs that don't know how to code.
-
@darkmatter Were there tits of fannies involved?
-
@lucas1 no - but the database actually supported a BINARY field type... which made the storage of a binary cast to string in a char(16) field about a million times dumber than the regular kind of dumb that it looks on first glance.
If they'd used the binary type in the first place, it would not only have taken less space, it would have avoided people thinking it was a string to use.
-
@darkmatter I think the lesson to learn here is that if they'd used VARCHAR(MAX) for their key, they could have stored 2GB worth of binary in ascii form, yielding a key space of 2^2147483648 - which would have kept them going for a very very long time?
So maybe the join times get a little slow... throw more processor & hdd scaling at it!
-
@lucas1 said in A Very Specific Number Ate My Homework:
@darkmatter That is the most boring thing I have ever read.
You should read what you write before posting it, then.