Today I ran into a problem when running an SQL INSERT query against a table that had columns like the following:
database.customer:
cu_id: int(11)
cu_otherID: int(11)
cu_name: varchar(255)
...
Some insertions would work and others did not even though the data was coming from the same source. I managed to narrow the problem down to the cu_otherID field. I noticed that in a successful query, the value was inserted as-is (for example, the value 1005423522) but in an unsuccessful query the value inserted was not the same as the value from the source data (for example, the value 5442231112). For some reason, the cu_otherID for the unsuccessful queries was always 2147483647.
You see, the thing about integers is that they take up a fixed amount of space in memory; 32 bits in this case. The first bit is known as the sign bit, which determines whether the value is positive or negative. When the sign bit is used, the integer is known as a signed integer and has a maximum value of 2^31 - 1 = 2147483647. When the sign bit is not used, the integer is called an unsigned integer, and the maximum value of a 32-bit unsigned integer is 2^32 - 1 = 4294967295.
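To see those limits in action, here's a rough sketch (the table and values are made up for illustration, and it assumes the server is not running in strict SQL mode, in which case MySQL clamps out-of-range values to the column maximum instead of rejecting the insert):

CREATE TABLE int_demo (
  signed_col   INT,           -- signed 32-bit: -2147483648 to 2147483647
  unsigned_col INT UNSIGNED   -- unsigned 32-bit: 0 to 4294967295
);

INSERT INTO int_demo VALUES (5442231112, 5442231112);

SELECT signed_col, unsigned_col FROM int_demo;
-- signed_col comes back as 2147483647 (clamped to the signed maximum)
-- unsigned_col comes back as 4294967295 (clamped to the unsigned maximum)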
Here's an illustration of a 32-bit integer in memory:
So, back to the issue. It's obvious now that the insertion is putting in the maximum int value for cu_otherID because the value from the source data is above the signed max. Making the column unsigned would raise the ceiling, but it still wouldn't cover many of the values. In this case, I opted to change the cu_otherID field to varchar like so:
ALTER TABLE customer MODIFY cu_otherID varchar(255);
Now the values in the cu_otherID column can be strings of up to 255 characters. No worries about limits there!
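As a quick sanity check (a sketch; it assumes the other customer columns are nullable or have defaults), the previously problematic value now goes in untouched:

INSERT INTO customer (cu_otherID) VALUES ('5442231112');
SELECT cu_otherID FROM customer WHERE cu_otherID = '5442231112';
-- Returns '5442231112' exactly as provided, no clamping to 2147483647.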
That would prevent the integer overflow problem but if you ever need to do a JOIN on cu_otherID the performance will suffer. Did you try a BIGINT? That can hold the number of seconds that the universe has existed with 500 billion years to spare :)
I thought about that. Would there be a massive over-allocation of storage for all of those BIGINTs since the ids wouldn't be more than 12 digits long?
A BIGINT is 8 bytes. Storing each number as a varchar means that each digit is going to take up a byte, which in your examples would be 10 bytes -- so you'd actually save space by using the BIGINT.
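For anyone who wants to take the BIGINT route from this thread, the change would be something along these lines (a sketch using the same table and column names from the post; an unsigned BIGINT tops out at 18446744073709551615, far beyond a 12-digit ID):

-- 8 bytes per value, and JOINs on cu_otherID stay integer comparisons.
ALTER TABLE customer MODIFY cu_otherID BIGINT UNSIGNED;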
Also, the illustration represents the format of a floating point number in memory, not an integer. ;-)
Chris,
Thanks for the info. It's been years since I've dug down into the low-level details of memory.