Wednesday, November 21, 2007

Calculating the next power of 2

A fairly common problem in systems programming is to locate the "next highest power of 2," since many data structures are much more efficient that way. Memory pages on a 32-bit OS are usually 4096 bytes long (212), and textures on graphics cards usually need to have power-of-2 dimensions.

So, suppose I have some object of a certain size (947, for example), and I need to find the smallest power-of-2 that's large enough to hold my object (in this case, 210 or 1024). We'll start with a fairly simple algorithm:

int val = 947, powof2 = 1;
// Double powof2 until >= val
while( powof2 < val ) powof2 <<= 1;

This will work, and if you're in a hurry, this is probably what will get used. If you're just calculating the size for a texture, then this is unlikely to be performance-critical code, so it may not be worth your time to optimize this. This might become a bottleneck if you're writing a memory manager, however; how can you do this more quickly?

This code has a few weaknesses. Firstly, it uses a loop, which means that the code has to run a test and then split in two possible directions after every iteration. Conditional branches of this sort aren't a huge deal on a modern processor, but they're still slower than avoiding branches altogether. Secondly, the loop can iterate quite a few times--up to 31, in fact. Thirdly, if the value stored in 'val' is over 231, and if you're using unsigned integers, then this can become an infinite loop. Hardly desirable!

It turns out that there's a fairly slick algorithm to solve this problem. The basic idea is to take your number (947 again: 0000 0011 1011 0011), and use bit operations to convert it to a solid string of 1's with the same number of digits (in this case, 0000 0011 1111 1111). Then, add 1, and you'll have the next power of 2 (0000 0100 0000 0000, or 1024). To convert any number into a solid string of 1's, we'll use right-shift and or repeatedly:

val = (val >> 1) | val;
val = (val >> 2) | val;
val = (val >> 4) | val;
val = (val >> 8) | val;
val = (val >> 16) | val;

Why do we do it five times? Well, that's because val is a 32-bit integer, and every time you iterate, you're doubling the number of digits you fill. After five steps, you've filled 25 digits, and you're done. If you're on a 64-bit architecture (26) you'll need to repeat 6 times. The steps on 947 look like:

0000 0011 1011 0011
0000 0011 1111 1011
0000 0011 1111 1111
... with the remaining steps staying at this value.

There's one small problem with this algorithm: it doesn't work properly for the powers of two themselves. It will give the next highest power of 2. This is pretty easy to fix, though: Just subtract one from the value before you begin, and you'll get the right answer in every case. Here's the final algorithm:

val = ... // Get input
val--;
val = (val >> 1) | val;
val = (val >> 2) | val;
val = (val >> 4) | val;
val = (val >> 8) | val;
val = (val >> 16) | val;
val++; // Val is now the next highest power of 2.

One final thought: What does it do on zeroes? It turns out that if you pass the algorithm a zero or any value greater than 231, it will return zero (try it if you're not sure why). Unless you're certain you will never pass either of these values, you should handle having zero as a return value.

That's all for today. Happy bit-twiddling!

7 comments:

Clemens said...

Hey,

another possibility is to first convert the number to a double, then doing

double vald = (double)val;
int next_pow = (int)pow(ceil(log2(vald)),2);

The idea is: We compute the 2-base logarithm of our number, say it's x. Then we now pow(2,x) will give the number itself. Rounding x to the next integer with ceil(x), pow will give the next integer power of 2. Yay :)

Unknown said...

@Clemens

Thank you, thats what im looking for :D

Sayantan said...

This algorithm doesn't work if the input number is already a power of 2.

E.g. if input number is 32, i would expect 64 as the output. But it always returns 32.

Nate Fries said...

Sayantan, use the unfinished one to get the results you want.

Nate Fries said...

Also, to explain his final notes to those not familiar with how computers store numbers and/or do math:

I'll convert math down to the bit-level. I assume that you know C's bitwise operators. I'm using an imaginary primitive to represent bits and an imaginary macro to get a bit from any integer. For those with no imagination, I have included macros to make them real.

#if (YOU_HAVE_NO_IMAGINATION)
#define bit int
#define getbit(var, num) (((var)&1<<(num))>>(num))
#endif

bit subbits(bit a, bit b, bit* carry)
{
/* The difference between the bits */
bit diff = (a ^ b);
/* Factor in the carry */
bit result = (diff ^ *carry);
/* Calculate the next carry */
*carry = ((~a & b) | (~diff & carry));
return result;
}

byte subbytes(byte a, byte b)
{
/* carry is intially zero */
bit carry = 0;
byte result = 0;
/* each bit in the byte */
for(int i = 0; i < 8; ++i)
{
/* get the corresponding bit */
bit ba = getbit(a, i);
bit bb = getbit(b, i);
/* do the subtraction */
result |= subbits(a, b, &carry) << i;
}
return result;
}

bit addbits(bit a, bit b, bit* carry)
{
/* The difference between the bits */
bit diff = (a ^ b);
/* Factor in the carry */
bit result = (diff ^ *carry);
/* Calculate the next carry */
*carry = ((a & b) | (diff & carry));
return result;
}

byte addbytes(byte a, byte b)
{
/* carry is intially zero */
bit carry = 0;
byte result = 0;
/* each bit in the byte */
for(int i = 0; i < 8; ++i)
{
/* get the corresponding bit */
bit ba = getbit(a, i);
bit bb = getbit(b, i);
/* do the subtraction */
result |= addbits(a, b, &carry) << i;
}
return result;
}

If you build and run these functions (assuming of course they build; haven't tested, but it should be suitable as example still), you'll notice that adding 1 to 11111111 will produce 0 and subtracting 1 from 0 will produce 11111111. You'll also notice that the variable named "carry" is ignored after the 8th bit (i==7). Computer hardware does the same thing, it ignores the carry bit (in the case of intel processors, its actually a bit in the EFLAGS register). When you do math in your head, you can carry beyond the original number of digits in the number. The computer is not capable of carrying beyond the number of bits in the number. In Java, I believe the result would be an overflow exception. In C and C++, however, the number's value silently shifts from 0 to 11111111 or vice-verse.

...perhaps the code snippet is not required for the explanation? Oh well. :P

Wallacoloo said...

@Clemens - You can use bit shifting to avoid the call to pow. int next_pow = 1 << ceil(log2((double)val));

Adam said...

Just a note to @Clemens -- Yes, that method should work, but how fast is it? You're going to take a speed hit for moving into and out of the floating point registers. I haven't checked, but I'm guessing the method in the post is going to win on speed.

Also: This is a very, very late reply (four years later!)