Dithering Algorithm

Lamborghini Dithered

Lamborghini Dithered

An implementation of the Floyd-Steinberg dithering algorithm for very low-end Android devices, whose memory requirements were pretty strict. The phone couldn’t handle all the graphics in the 32-bit 8888 color format, and had to be scaled down to the 16-bit 1555 format. The graphics didn’t look as great as the artists had intended, therefore some form of dithering had to be applied. The maximum image size for the phone was 1024×512. The Floyd-Steinberg algorithm is a very well known dithering algorithm, with many variations that try different combinations of the error propagation matrix. I implemented the simplest one for speed. These versions are written in Java, as they targeted the Android platform, but a C++ port is coming.

I tackled the naive approach first, very flexible, very clean, and very slow. The lowest phone in our testbed, the Galaxy Ace, had to load ~900 images, dither them and load them into video memory. Therefore I inlined everything, removing all convenience function calls. This version uses a double integer array with the indices laid out as [RGBA component][pixel].

For the second approach I tried inverting the indices, to something like [pixel][RGBA component]. It is interesting that with smaller images, such as 800×480, there was a speedup. However, with bigger resolutions such as 1600×1200, performance suffers.

The third approach removes the ifs in the error propagation matrix by padding the destination array with one extra line at the bottom and the sides. The error is written into the matrix and simply discarded later, therefore mitigating the conditional performance penalty.

The last approach uses scanline alternation for a memory and performance optimization. Instead of loading a very big scratch array, it allocates two lines (plus the padding) and alternates between the current and next line. After each line processed, the result is written into the final buffer. There is a small performance advantage, and a very obvious big memory improvement.

Because OpenGL requires POT textures, some textures end up being pretty big. The biggest textures in the program were 512×512 and 1024×512.

For a 512×512 image,

Old 47 ms
Inverted40 ms
If removal33 ms
Scanline30 ms

For a 1024×512 image,

Old54 ms
Inverted67 ms
If removal38 ms
Scanline35 ms

For a 1920×1200 image,

 Approach Time
 Old 104 ms
Inverted107 ms
If removal74 ms
Scanline 70 ms

At 2560×1600,

Old145 ms
Inverted153 ms
If removal111 ms
Scanline99 ms

Note how performance degrades as images grow in size between the first two approaches, when fiddling with the order of the array indices, possibly related to how the Java VM handles them internally. The last two approaches employ a simple 1D array in order to avoid these performance discrepancies. The algorithm becomes a little more involved, but performance becomes more predictable.

All in all, the last algorithm performed ~30% faster than the first one, consuming a negligible amount of memory (2 * (imageWidth + padding) as opposed to imageWidth * imageHeight).

The Eclipse project and source code can be found here.

Comments are closed.