Integer.MAX_VALUE Overflow



  • When I arrived on-site back in late 2009, I poked around a lot to see what was what. After clawing my eyes out at the (lack of) architecture that had grown out of this now 14-year-old, organically grown application :pause while the tremors abate: I discovered something that I thought was odd.

    In our legacy Oracle-based database application, we use int/Integer for surrogate keys on most of our record types.

    Aside: I am a retired military "network/system administrator" type, although I grew up through the programming side of computers. During the cutover from 1999 to 2000, I was in a NOC that oversaw the communications network for Southwest Asia (Saudi, Iran, Iraq, Kuwait ... the whole lot). We were pinging hard about any Y2K issues that might bite us in the butt. But, for once, the foresight of leadership to test and fix all systems for date issues prevailed, and we had maybe one problem with one non-critical system somewhere in the backwater of the theater of operations. Suffice it to say, a bit of paranoia about these kinds of things crept into my psyche, especially because of my programming background.

    I immediately noticed that, in the database, *_IDs were specced out as NUMBER -- essentially the maximum storable by Oracle (38 decimal digits, I think?). Regardless, effectively no limit. But in code, we're limited to Integer.MAX_VALUE (0x7fffffff), or about 2 billion IDs ... remember, IDs, not necessarily active records (i.e., count(*_ID) != max(*_ID) ... in fact, the two differ by something like a factor of 100). For some reason, though, at the time I was fixated on the COUNT(), not the MAX(). I monitored COUNT() on a monthly basis and thought we were okay.

    For some reason, yesterday, I decided to go back and look at this, and realized I needed to be looking at MAX(), not COUNT(). I ran queries to find the MAX() *_ID per month for the past 15 months, dropped the results into Excel, and drew a trendline.

    We've got roughly 18 months until we roll over.

    I told my boss.  Her eyes get big. 

    "What's the fix?" I suggest changing the *_ID fields in the Java objects to be long/Long vs int/Integer which should cover us for the forseeable future (probably another 10-13 years).

    "How long will it take?"  To change the baseline code? A day or so (change it at the data holding object level and fix the ripples as they spiral out.)  However, there are other systems who use our .jar files as libraries for database and object access. I don't know how this will affect them.



  • Ick.

    I am curious as to why MAX and COUNT are so divergent. The main way I can think of this happening is if your app is doing hard deletes of records instead of soft deletes.



  • @this_code_sucks said:

    I am curious as to why MAX and COUNT are so divergent

    They use 2 checksum digits. Duh!

    I've actually seen code that used checksum digits in identity columns that were never user-visible, so there was no way for an error to be introduced in the first place. Every id and foreign-key id property would verify those digits whenever an id was assigned -- meaning every time it was read from the database. Eek.
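
    If you've never seen the pattern, picture something like this (purely hypothetical; the checksum rule is made up) running on every single id pulled from the database:

        // Hypothetical sketch: the last two digits of every id are a "checksum"
        // of the rest, re-verified each time an id is assigned from a result set.
        static long verifyId(long rawId) {
            long base  = rawId / 100;        // the id proper
            long check = rawId % 100;        // the two checksum digits
            if (check != base % 97) {        // invented checksum rule
                throw new IllegalStateException("Corrupt id: " + rawId);
            }
            return base;
        }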


  • BINNED

    @zelmak said:

    "What's the fix?" I suggest changing the *_ID fields in the Java objects to be long/Long vs int/Integer which should cover us for the forseeable future (probably another 10-13 years).

    It took 15 years to fill up a 32-bit int. Filling up a 64-bit long in just another 13 years would mean your data (minus deletes) is growing exponentially.
    Do you really think that is the case? 2^63 is a really large number.
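
    To put rough numbers on it (a quick back-of-the-envelope sketch; the class name and round figures are mine):

        // Current id burn rate vs. the rate needed to exhaust a signed 64-bit
        // long in 13 years, using the figures from this thread.
        public class Headroom {
            public static void main(String[] args) {
                double secondsPerYear = 365.25 * 24 * 3600;
                double currentRate = Integer.MAX_VALUE / (15 * secondsPerYear); // ~4.5 ids/sec over 15 years
                double neededRate  = Long.MAX_VALUE / (13 * secondsPerYear);    // ids/sec to burn a long in 13 years
                System.out.printf("now: %.1f ids/sec, needed: %.2e ids/sec (about %.0e times more)%n",
                        currentRate, neededRate, neededRate / currentRate);
            }
        }

    That last factor comes out around five billion; your input rate would have to grow by that much, on average, over the next 13 years.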



  • @zelmak said:

    "How long will it take?"  To change the baseline code? A day or so (change it at the data holding object level and fix the ripples as they spiral out.)  However, there are other systems who use our .jar files as libraries for database and object access. I don't know how this will affect them.

    Oops ... way underestimated the ripples ... more like a week or so ...



  • @this_code_sucks said:

    Ick.

    I am curious as to why MAX and COUNT are so divergent. The main way I can think of this happening is if your app is doing hard deletes of records instead of soft deletes.

    You'll love this.

    (a) Data rarely gets deleted.

    (b) Valid data entering the system receives an ID from an Oracle sequence. This data can be stored in the normal tables for full processing, or it can be dropped into filtered tables for trending and analysis (mostly based on whether we've seen like data before -- we're trying to find stuff we haven't seen before). The ratio of filtered versus operational is like 99 to 1. It might be more, but ...

    (c) Filtered data stays on the live operational server for a year. After a year, it gets manually rolled off into a second database, accessible by NONE of our application code for ... a long time. I'm pretty sure these tables are partitioned by 'receive date'.

    (d) Deleted data (remember, this happens rarely) will likewise get dropped into a parallel set of tables; however, to better pack the partitioning scheme (by ID of all things), every item entering the deleted side of things gets a new ID from a different Oracle sequence.

    Recently (within the last year or so), the powers-that-be wanted a tool that could do a union query between operational data, filtered data (both on and off the operational system), and deleted items so they could do full-up trending. How do you deal with potentially identical IDs in that union query, I ask?
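
    To illustrate why that's a problem (table names below are invented): once the deleted side draws from its own sequence, the raw id can't identify a row on its own, so any such union has to drag a discriminator along.

        // Sketch of the union in question; the same numeric id can legitimately
        // appear on both the live and the deleted side.
        String unionSql =
            "SELECT 'LIVE'     AS SOURCE, ITEM_ID, RECEIVE_DATE FROM OPERATIONAL_ITEMS " +
            "UNION ALL " +
            "SELECT 'FILTERED' AS SOURCE, ITEM_ID, RECEIVE_DATE FROM FILTERED_ITEMS " +
            "UNION ALL " +
            "SELECT 'DELETED'  AS SOURCE, ITEM_ID, RECEIVE_DATE FROM DELETED_ITEMS";
        // Downstream, only the pair (SOURCE, ITEM_ID) is a safe identity.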

     



  • @topspin said:

    @zelmak said:

    "What's the fix?" I suggest changing the *_ID fields in the Java objects to be long/Long vs int/Integer which should cover us for the forseeable future (probably another 10-13 years).

    It took 15 years to fill up a 32-bit int. Filling up a 64-bit long in just another 13 years would mean your data (minus deletes) is growing exponentially.
    Do you really think that is the case? 2^63 is a really large number.

    You should be able to see the answer to this in my previous post. Suffice it to say, we're (a) getting more data per hour than last year, (b) better/proper validation is letting more data in, (c) we're letting more and different types of information into the system now.



  • @zelmak said:

    You should be able to see the answer to this in my previous post. Suffice it to say, we're (a) getting more data per hour than last year, (b) better/proper validation is letting more data in, (c) we're letting more and different types of information into the system now.

    It's still a humongous increase in input you'd need to fill this up in 13 years. As I see it, about 2^32 times as much as you generated over the past 15 years. It might just be me, but that seems like an unreasonable increase, even if all of the above is true ... even if you let ALL data in without deletion, validation, or anything.


  • Considered Harmful

    @zelmak said:

    @topspin said:

    @zelmak said:

    "What's the fix?" I suggest changing the *_ID fields in the Java objects to be long/Long vs int/Integer which should cover us for the forseeable future (probably another 10-13 years).

    It took 15 years to fill up a 32-bit int. Filling up a 64-bit long in just another 13 years would mean your data (minus deletes) is growing exponentially.
    Do you really think that is the case? 2^63 is a really large number.

    You should be able to see the answer to this in my previous post. Suffice it to say, we're (a) getting more data per hour than last year, (b) better/proper validation is letting more data in, (c) we're letting more and different types of information into the system now.

    So if you inserted a billion records per second, it would still take over 584 years to exhaust 64 bits. You must collect quite a bit of data.
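
    Quick sanity check on that figure (jshell-style snippet):

        // 2^64 ids at a billion inserts per second:
        double seconds = Math.pow(2, 64) / 1_000_000_000.0;
        System.out.printf("%.1f years%n", seconds / (365.25 * 24 * 3600));  // ~584.5 years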



  • @joe.edwards said:

    So if you inserted a billion records per second, it would still take over 584 years to exhaust 64 bits. You must collect quite a bit of data.


    It's only 61 and a half bits after Java^Wtaxes


  • Considered Harmful

    Another way to wrap your head around the magnitude of the number is 2^63 = (2^31)(2^32).

    That is to say, since the capacity you are in danger of filling up is 2,147,483,648, you'd have to fill that capacity another 4,294,967,295 times before you ran out. Is your data volume really in danger of increasing another four-billionfold?

    Edit: Adjusted for the sign bit, since we're using signed integers.



  • @zelmak said:

    You should be able to see the answer to this in my previous post. Suffice it to say, we're (a) getting more data per hour than last year, (b) better/proper validation is letting more data in, (c) we're letting more and different types of information into the system now.
     

    I think you might be underestimating the size of 2^63.



  • @dhromed said:

    @zelmak said:

    You should be able to see the answer to this in my previous post. Suffice it to say, we're (a) getting more data per hour than last year, (b) better/proper validation is letting more data in, (c) we're letting more and different types of information into the system now.
     

    I think you might be underestimating the size of 2^63.

     

    Ah ... right ... what was I thinking?

    Jeez, this place has really muddled my brain. Or I'm getting old.

    Or something.

    We should be good for the foreseeable future then, eh?



  • Snoofle is totally fucked even with 64 bits, though.



  • @zelmak said:

    @dhromed said:

    @zelmak said:

    You should be able to see the answer to this in my previous post. Suffice it to say, we're (a) getting more data per hour than last year, (b) better/proper validation is letting more data in, (c) we're letting more and different types of information into the system now.
     

    I think you might be underestimating the size of 2^63.

     

    Ah ... right ... what was I thinking?

    Jeez, this place has really muddled my brain. Or I'm getting old.

    Or something.

    We should be good for the foreseeable future then, eh?

    Remember, drinking and converting ints to longs don't mix.

     



  • @zelmak said:

    Ah ... right ... what was I thinking?

    Jeez, this place has really muddled my brain. Or I'm getting old.

    Or something.

    We should be good for the foreseeable future then, eh?

     

    Oh, ok. I was about to ask if you work at the LHC.

     



  • @D-Coder said:

    Remember, drinking and converting ints to longs don't mix.
     

    I like a long drink.



  • @dhromed said:

    I like a long drink.
     

    Not too keen on your floater, mind.



  •  If it floats, you should see a doctor.



  • @dhromed said:

     If it floats, you should see a doctor.

    Hope floats, shit doesn't.

