Wednesday, November 21, 2007

Calculating the next power of 2

A fairly common problem in systems programming is to locate the "next highest power of 2," since many data structures are much more efficient that way. Memory pages on a 32-bit OS are usually 4096 bytes long (212), and textures on graphics cards usually need to have power-of-2 dimensions.

So, suppose I have some object of a certain size (947, for example), and I need to find the smallest power-of-2 that's large enough to hold my object (in this case, 210 or 1024). We'll start with a fairly simple algorithm:

int val = 947, powof2 = 1;
// Double powof2 until >= val
while( powof2 < val ) powof2 <<= 1;

This will work, and if you're in a hurry, this is probably what will get used. If you're just calculating the size for a texture, then this is unlikely to be performance-critical code, so it may not be worth your time to optimize this. This might become a bottleneck if you're writing a memory manager, however; how can you do this more quickly?

This code has a few weaknesses. Firstly, it uses a loop, which means that the code has to run a test and then split in two possible directions after every iteration. Conditional branches of this sort aren't a huge deal on a modern processor, but they're still slower than avoiding branches altogether. Secondly, the loop can iterate quite a few times--up to 31, in fact. Thirdly, if the value stored in 'val' is over 231, and if you're using unsigned integers, then this can become an infinite loop. Hardly desirable!

It turns out that there's a fairly slick algorithm to solve this problem. The basic idea is to take your number (947 again: 0000 0011 1011 0011), and use bit operations to convert it to a solid string of 1's with the same number of digits (in this case, 0000 0011 1111 1111). Then, add 1, and you'll have the next power of 2 (0000 0100 0000 0000, or 1024). To convert any number into a solid string of 1's, we'll use right-shift and or repeatedly:

val = (val >> 1) | val;
val = (val >> 2) | val;
val = (val >> 4) | val;
val = (val >> 8) | val;
val = (val >> 16) | val;

Why do we do it five times? Well, that's because val is a 32-bit integer, and every time you iterate, you're doubling the number of digits you fill. After five steps, you've filled 25 digits, and you're done. If you're on a 64-bit architecture (26) you'll need to repeat 6 times. The steps on 947 look like:

0000 0011 1011 0011
0000 0011 1111 1011
0000 0011 1111 1111
... with the remaining steps staying at this value.

There's one small problem with this algorithm: it doesn't work properly for the powers of two themselves. It will give the next highest power of 2. This is pretty easy to fix, though: Just subtract one from the value before you begin, and you'll get the right answer in every case. Here's the final algorithm:

val = ... // Get input
val--;
val = (val >> 1) | val;
val = (val >> 2) | val;
val = (val >> 4) | val;
val = (val >> 8) | val;
val = (val >> 16) | val;
val++; // Val is now the next highest power of 2.

One final thought: What does it do on zeroes? It turns out that if you pass the algorithm a zero or any value greater than 231, it will return zero (try it if you're not sure why). Unless you're certain you will never pass either of these values, you should handle having zero as a return value.

That's all for today. Happy bit-twiddling!

Breaking return-by-value recursive relationships

One of the classic problems in C++ is when you have two tightly interlocking classes, and you want to do something like this:

class A
{
B getB();
};

class B
{
A getA();
};

There are plenty of cases where something like this might come up. I ran into it because I was working with a topological mesh--I wanted Edges to return adjacent Faces, and Faces to return adjacent Edges. The problem with the above situation is that it's impossible. C++ won't allow you to return an undefined type by value, because it doesn't know the size of the undefined type and therefore can't compute the function signature. The classic solution is usually to do it by reference:

class A;
class B;

class A
{
B &getB();
};

class B
{
A &getA();
};

That works in cases where A and B refer to something which exists in some third location, but if A and B are constructed by the function itself, you're stuck again. So the third option is to return a pointer:

class A;
class B;

class A
{
B *getB();
}

class B
{
A *getA();
}

B *A::getB() { return new B; }
A *B::getA() { return new A; }

So far so good--except that now the caller of the get functions owns the memory, and we've taken a (potentially enormous) performance hit by touching the heap. We could alleviate the problem of memory ownership by using smart pointers, although that would likely double the performance hit, and is frankly an absurdly complicated solution for what ought to be a really simple problem.

The good news is that there exists a better way. The bad news is that it uses templates, which means that the code looks a lot like punctuation soup.

The basic idea is to have both A and B be specializations of a single template. Template specializations are interesting beasts, because there is no requirement that they have anything in common (as opposed to, say, class inheritance, which always packages everything from the base class into the child classes). The template parameter will be a dummy integer, and will be used purely to give A and B different type names. I'll use some typedefs to keep things readable, but they are optional. Here's how it looks:

template< int T > class BaseTemplate {};
typedef BaseTemplate< 0 > A;
typedef BaseTemplate< 1 > B;
// A
template<> class BaseTemplate< 0 >
{
public:
BaseTemplate() {} // A constructor
B getB();
}

// B
template<> class BaseTemplate< 1 >
{
public:
BaseTemplate() {} // B constructor
A getA();
}

inline B A::getB() { return A(); }
inline A B::getA() { return B(); }

This code will work! So, why does it work? The reason has to do with how templates are compiled. Templates delay the creation of function signatures until you actually use the template somewhere. This means that neither getA() nor getB() will have their signatures analyzed until after both classes A and B have already been fully declared. That's the magic of this method.

Unfortunately, there are a few inconvenient caveats associated with this trick. You can't define getB() inside the class, because B hasn't been declared yet. Attempting to declare it in the class will result in an error from using an undefined type. You can avoid these issues by always declaring all the member functions at the end of the header file, after all your interlocking classes are defined. Secondly, since this is a template, you must include all the member function definitions in the header file. If your class is complex, that can result in some rather ugly header file dependencies. You can work around that by having your member functions call through to a worker function as necessary:

B A_getB( A &a );
inline B A::getB() { return A_getB(*this); }

The worker function can be defined in a separate file, which can help break the header-file dependency loops. You'll probably need to friend the worker functions as well. You may have quite a lot of typing on your hands if you go this way.

So, is it worth the effort? That's going to depend on your project. In my case, I loved the results:

return my_edge.next().opposite().face(0).getVert(0);

This calls "next" on my_edge, which returns an edge; it then calls "opposite," which gets the edge on the other side, and then "face" which gets a face next to the edge. Finally, it gets vertex 0 from that face. Faces can return edges and vertices, edges can return faces and vertices, and vertices can return faces and edges. It's much nicer.

Inaugural Snippet

About every week or so, I run across interesting programming snippets while looking for solutions to problems at work. These are usually in C++, but other amusing languages might show up as well. Some of them are pretty cool, so I figured it would be more fun to blog them than to just sit on them. For a starting post, though, it seems most appropriate to do something like this:


#include <iostream>

int main()
{
std::cout << "Hello, Snippets World!" << std::endl;
return 0;
}