The Art Of Packing Data

The packing of data is good practice for many reasons, including disk space and efficient RAM or cache access. If we know the meaning of data we can often narrow down the range and precision, making informed decisions as to the amount of bytes we need. I was inspired once by this article and here’s my take on the topic. We’ll explore common ways of packing certain kinds of data common in videogames, their possible implementation and rationale; worthy of note is that this is not an article about compression. I’ll be using HLSL syntax but this will look very familiar to C++ and can be ported easily to any other language.

Normalized Data

This is the simplest type of data to pack so we’ll start here. Normalized data ranges from 0 to 1. You can easily normalize data by shifting and dividing by its maximum value. This mostly applies to colors or bounded values (think a shadow or transparency term) and sometimes normalized vectors, although there are better methods as we’ll see later. The D3D12 formats for this kind of data are the _UNORM class, such as R8G8B8A8_UNORM or R16G16_UNORM. The code examples below show how to encode a typical color with alpha into 8 and 16 bits, they are common cases but you can make as many variations as needed depending on the bitrate and the data you want to store.

Note how we add 0.5 to the result after multiplying by 255. This operation followed by casting is equivalent to rounding but avoids the round instruction since the add gets factored into the multiply-add. Some of these operations are so common that many closed platforms have intrinsics or special instructions to encode and decode bits. Recently, HLSL added some special packing instructions to Shader Model 6.6 so we can also write the RGBA8 packing as follows.

We’ll stop here for a minute to analyze the RDNA bytecode generated from these instructions. I have grouped them to make logical sense as the compiler is free to reorder these. These tests were performed on the Radeon Graphics Analyzer using the 1103 RDNA3 ASIC in offline mode. We need to be careful as older RGA versions produce worse than the baseline, whereas the latest one I used here shows an improvement. As always, measure and make sure! The command line I used, should you wish to replicate the results, is .\rga.exe -s dx12 -c gfx1103 –offline –cs example.hlsl –cs-entry CSMain –cs-model cs_6_6 –dxc-opt –isa example_hlsl_v1.txt

As you can see the compiler is able to improve our hand-written logic and squeeze a couple extra instructions for our packing using v_perm_b32, an instruction that swizzles values into a single one. We don’t have the high-level instructions to perform the same operations manually which is unfortunate. There are other normalized formats commonly used in videogames that don’t have the same bit width for all components, for example R5G6B5, R5G5B5A1 or R10G10B10A2 formats. We can see how to encode and decode one of them below.

Continue reading