A Macro View of Nanite

After showing an impressive demo last year and unleashing recently with the UE5 preview, Nanite is all the rage these days. I just had to go in and have some fun trying to figure it out and explain how I think it operates and the technical decisions behind it using a renderdoc capture. Props to Epic for being open with their tech which makes it easier to learn and pick apart; the editor has markers and debug information that are going to be super helpful.

This is the frame we’re going to be looking at, from the epic showdown in the Valley of the Ancient demo project. It shows the interaction between Nanite and non-Nanite geometry and it’s just plain badass.

Nanite::CullRasterize

The first stage in this process is Nanite::CullRasterize, and it looks like this. In a nutshell, this entire pass is responsible for culling instances and triangles and rasterizing them. We’ll refer to it as we go through the capture.

Instance Culling

Instance culling is one of the first things that happens here. It looks to be a GPU form of frustum and occlusion culling. There is instance data and primitive data bound here, I guess it means it culls at the instance level first, and if the instance survives it starts culling at a finer-grained level. The Nanite.Views buffer provides camera info for frustum culling, and hierarchical depth buffer (HZB) is used for occlusion culling. The HZB is sourced from the previous frame and forward projected to this one. I’m not sure how it deals with dynamic objects, it may be that it uses such a large mip (small resolution) that it is conservative enough. EDIT: According to the Nanite paper, the HZB is generated this frame with the previous frame’s visible objects. The HZB is tested with the previous objects as well as anything new and visibility updated for the next frame.

Both visible and non-visible instances are written into buffers. For the latter I’m thinking this is the way of doing what occlusion queries used to do in the standard mesh pipeline: inform the CPU that a certain entity is occluded and it should stop processing until it becomes visible. The visible instances are also written out into a list of candidates.

Persistent Culling

Persistent culling seems to be related to streaming. It is a fixed number of compute threads, suggesting it is unrelated to the complexity of the scene and instead maybe checks some spatial structure for occlusion. This is one complicated shader, but based on the inputs and outputs we can see it writes out how many triangle clusters are visible of each type (compute and traditional raster) into a buffer called MainRasterizeArgsSWHW (SW:compute, HW:raster).

Clustering and LODding

It’s worth mentioning LODs at this point as it is probably around here where those decisions are made. Some people speculated geometry images as a way to do continuous LODding but I see no indication of this. Triangles are grouped into patches called clusters, and some amount of culling is done at the cluster level. The clustering technique has been described before in papers by Ubisoft and Frostbite. For LODs, clusters start appearing and disappearing as the level of detail descends within instances. Some very clever magical incantations are employed here that ensure all the combinations of clusters stitch into each other seamlessly.

Continue reading