Temporal AA and the quest for the Holy Trail

Long gone are the times when Temporal AA was a novel technique, and slowly more articles appear covering motivations, implementations and solutions. I will throw my programming hat into the ring to walk through it, almost like a tutorial, for the future me and for anyone interested. I am using Matt Pettineo’s MSAAFilter demo to show the different stages. The contents come mostly from the invaluable work of many talented developers, and a little from my own experience. I will introduce a couple of tricks I have come across that I haven’t seen in papers or presentations.

Sources of aliasing

The origin of aliasing in CG images varies wildly. Geometric (edge) aliasing, alpha testing, specular highlights, high frequency normals, parallax mapping, low resolution effects (SSAO, SSR), dithering and noise all conspire to destroy our visuals. Some solutions, like hardware MSAA and screen space edge detection techniques, work for a subset of cases but fail in different ways. Temporal techniques attempt to achieve supersampling by distributing the computations across multiple frames, while addressing all forms of aliasing. This stabilizes the image but also creates some challenging artifacts.

Jitter

The main principle of TAA is to compute multiple sub-pixel samples across frames, then combine those together into a single final pixel. The simplest scheme generates random samples within the pixel, but there are better ways of producing fixed sequences of samples. A short overview of quasi-random sequences can be found here. It is important to select a good sequence to avoid clumping, and a discrete number of samples within the sequence: typically between 4 and 8 work well. In practice this is more important for a static image than a dynamic one. Below is a pixel with 4 samples.
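As one possible way of building such a fixed sequence, here is a minimal sketch using the Halton (2, 3) sequence, written in Python for readability rather than as engine code. The article does not prescribe a particular sequence, so Halton is my assumption here, and the names `halton` and `jitter_sequence` are hypothetical.

```python
def halton(index, base):
    """Radical inverse of `index` in `base`; returns a value in [0, 1)."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def jitter_sequence(count):
    """Sub-pixel offsets in pixel units, relative to the pixel center."""
    # Start at index 1: index 0 would land on the pixel corner (-0.5, -0.5).
    return [(halton(i, 2) - 0.5, halton(i, 3) - 0.5)
            for i in range(1, count + 1)]


# Eight samples, cycled every frame: frame N uses SAMPLES[N % len(SAMPLES)].
SAMPLES = jitter_sequence(8)
```

Using coprime bases for x and y keeps the two axes from repeating in lockstep, which is what helps avoid clumping with so few samples.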

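To produce random sub-samples within a pixel we translate the projection matrix by a fraction of a pixel along the frustum plane. The valid range for the jitter offset (relative to the pixel center) is half a pixel in each direction. In UV units that is half the inverse of the screen dimension in pixels, so \left[-\dfrac{1}{2w},\dfrac{1}{2w}\right] and \left[-\dfrac{1}{2h},\dfrac{1}{2h}\right], where w and h are the screen width and height in pixels (not the view-plane extents that appear in the projection matrix below). Because normalized device coordinates span [-1, 1], the same half pixel becomes \dfrac{1}{w} and \dfrac{1}{h} when the offset is applied after projection, as it is here. We multiply the projection matrix by the offset matrix (just a normal translation matrix) to get the modified projection, as shown below.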

 

\begin{pmatrix} \dfrac{2n}{w} & 0 & 0 & 0\\ 0 & \dfrac{2n}{h} & 0 & 0\\ 0 & 0 & \dfrac{f}{f-n} & 1\\ 0 & 0 & \dfrac{-f·n}{f-n} & 0\\ \end{pmatrix} · \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ j_x & j_y & 0 & 1\\ \end{pmatrix}= \begin{pmatrix} \dfrac{2n}{w} & 0 & 0 & 0\\ 0 & \dfrac{2n}{h} & 0 & 0\\ j_x & j_y & \dfrac{f}{f-n} & 1\\ 0 & 0 & \dfrac{-f·n}{f-n} & 0\\ \end{pmatrix}
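The product above can be sketched in a few lines of Python. This assumes the row-vector layout of the matrices shown (v · M), a jitter given in pixels, and NDC spanning [-1, 1] so that one pixel is 2 / width NDC units wide. The function names are hypothetical.

```python
def perspective(near, far, plane_w, plane_h):
    """Row-vector (v * M) perspective matrix laid out like the one above."""
    return [
        [2.0 * near / plane_w, 0.0, 0.0, 0.0],
        [0.0, 2.0 * near / plane_h, 0.0, 0.0],
        [0.0, 0.0, far / (far - near), 1.0],
        [0.0, 0.0, -far * near / (far - near), 0.0],
    ]


def jitter_to_ndc(jitter_px, width_px, height_px):
    """Convert a sub-pixel offset in pixels to NDC (the screen spans 2 units)."""
    return (2.0 * jitter_px[0] / width_px, 2.0 * jitter_px[1] / height_px)


def jittered_projection(proj, jitter_ndc):
    """Return proj * T, where T is the translation matrix holding the jitter."""
    jx, jy = jitter_ndc
    out = [row[:] for row in proj]
    for i in range(4):
        # Column 3 feeds clip-space w, so the offset is scaled by w and
        # becomes a constant NDC shift after the perspective divide.
        out[i][0] += proj[i][3] * jx
        out[i][1] += proj[i][3] * jy
    return out


def project(point, m):
    """Transform a view-space point (x, y, z) and return its NDC x, y."""
    v = (point[0], point[1], point[2], 1.0)
    clip = [sum(v[r] * m[r][c] for r in range(4)) for c in range(4)]
    return (clip[0] / clip[3], clip[1] / clip[3])
```

Projecting the same point with and without the jitter should differ by exactly the NDC offset, regardless of depth, which is a handy sanity check when wiring this up.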

 

Once we have a set of samples, we use this matrix to rasterize geometry as normal to produce the image that corresponds to the sample. If it all works well and every frame you get a new jitter, the image should look wobbly like this.

Resolve

The next stage in our TAA journey is the resolve pass. We’ll collect the samples and merge them together. Resolve passes can take two forms, either using an accumulation buffer or several past buffers, as Guerrilla does. For this article we’ll stick to the first, as it’s more common and stable. The accumulation buffer stores the result of multiple frames, and gets updated every frame by blending in a small percentage (e.g. 10%) of the current, jittered, frame. This should be enough for a static camera. The image still shows some specular aliasing we’ll address later, but it’s stable (it’s an animated webp).
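The accumulation update is an exponential moving average. A minimal per-pixel sketch in Python follows; the 10% blend is the example value from the text and the function name is hypothetical.

```python
def accumulate(history, current, blend=0.1):
    """Keep (1 - blend) of the history and add `blend` of the current frame."""
    return tuple(h + (c - h) * blend for h, c in zip(history, current))


# A pixel whose jittered samples alternate between black and white settles
# around mid grey instead of flickering between the two.
pixel = (0.0, 0.0, 0.0)
for frame in range(200):
    sample = (1.0, 1.0, 1.0) if frame % 2 else (0.0, 0.0, 0.0)
    pixel = accumulate(pixel, sample)
```

A smaller blend factor gives a more stable result but takes longer to converge, which matters later for disocclusion and camera cuts.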

So far so good. What happens if we jiggle the camera about a little?

Ghosting

The reason we get trails is that we’re sampling the previous frame at the same position as the current frame. The result is a superposition of images that fade away as we accumulate new frames. Since the issue is introduced by moving the camera, let’s tackle that first. Camera motion is relatively simple to fix for opaque objects because we know their world space positions can be reconstructed using the depth buffer and the inverse of the camera projection. For more detail read here and here, or consult the demo BackgroundVelocity.hlsl. This process is called reprojection, and involves the following steps:

  1. Read depth from current depth buffer produced by current camera C
  2. Backproject using the inverse of the view-projection matrix, to transform our screen space position into world space
  3. Use previous view-projection matrix to project onto previous camera P’s screen space
  4. Transform screen space position to UV and sample the accumulation texture

The devil, as always, is in the details and there are many things to take into account such as the position being outside the previous camera, viewport changes if you have dynamic resolution, etc.
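The four steps can be sketched as follows. This is Python rather than the demo’s HLSL and is not taken from BackgroundVelocity.hlsl. It assumes row-vector matrices (v · M), UVs with the origin at the top-left, and NDC y pointing up. All function names are hypothetical.

```python
def transform(v, m):
    """Row vector (x, y, z, w) times a 4x4 matrix."""
    return tuple(sum(v[r] * m[r][c] for r in range(4)) for c in range(4))


def uv_to_ndc(uv):
    return (uv[0] * 2.0 - 1.0, 1.0 - uv[1] * 2.0)


def ndc_to_uv(ndc):
    return (ndc[0] * 0.5 + 0.5, 0.5 - ndc[1] * 0.5)


def reproject(uv, depth, inv_view_proj_cur, view_proj_prev):
    """Return the previous-frame UV for the surface at `uv`, or None."""
    # Steps 1 and 2: depth plus screen position back to world space.
    x, y = uv_to_ndc(uv)
    wx, wy, wz, ww = transform((x, y, depth, 1.0), inv_view_proj_cur)
    world = (wx / ww, wy / ww, wz / ww, 1.0)
    # Step 3: project with the previous camera.
    px, py, _, pw = transform(world, view_proj_prev)
    if pw <= 0.0:
        return None  # behind the previous camera
    # Step 4: back to UV, rejecting positions outside the previous frame.
    prev_uv = ndc_to_uv((px / pw, py / pw))
    if not (0.0 <= prev_uv[0] <= 1.0 and 0.0 <= prev_uv[1] <= 1.0):
        return None
    return prev_uv
```

Returning None is only one way to handle the out-of-bounds case; falling back to the current frame’s color for those pixels is a common choice.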

An alternative is to store this reprojection offset as velocity in a texture. We’ll see why this is useful later. When storing the velocity we need to make sure the jitter offset is removed, as we’ll see in the motion vector section.

The result is a little blurry but the ghosting has disappeared. Or has it? Let’s try translating the camera now.

Disocclusion

As objects move relative to one another, surfaces that weren’t previously visible may come into view; we call that disocclusion. In the image above, moving the camera sideways reveals part of the background that was previously occluded by the model. Note that it looks like it’s the hand that moves, because the background is static. Those two movements are not equivalent, as we’ll see later. The newly revealed surface will correctly reproject itself but encounter invalid information in the accumulation buffer from the model that was previously there. There are multiple ways to address this issue.

Color Clamping

Color clamping makes the assumption that colors within the neighborhood of the current sample are valid contributions to the accumulation process. A value sourced from the accumulation buffer that diverges greatly should in theory be discarded. However, rather than throwing the value away and resetting the accumulation process, we adjust it to fit it in the neighborhood and let it through. There are different techniques, but three popular ones are clamp, clip and variance clipping. Shown below in purple is an example of a 3×3 neighborhood. Implementations for different techniques can be found courtesy of Playdead here and their presentation Temporal Reprojection Antialiasing in INSIDE, as well as UE4’s High Quality Temporal Supersampling.

To more visually represent this algorithm in action I created a little program in Unity that takes a few positions (the value of the position is the color), creates colored spheres (the neighborhood), derives a box from it, takes a history sample and clamps it to that box. It’s easier to see it in 2D. You can appreciate how vastly different colors get approximated to something resembling the original colors.

Any variation of this is a must in a TAA implementation. If the neighborhood has a lot of color variance in it, the bounding box becomes huge and trailing can become apparent again. For that we’ll need extra information. Here’s what clamp looks like.
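As a rough illustration of the plain clamp variant (not clip or variance clipping), here is a minimal Python sketch. It assumes RGB tuples and that the caller has already gathered the 3×3 neighborhood of the current frame; the names are hypothetical.

```python
def neighborhood_aabb(colors):
    """Axis-aligned bounding box of the neighborhood colors in RGB space."""
    lo = tuple(min(c[i] for c in colors) for i in range(3))
    hi = tuple(max(c[i] for c in colors) for i in range(3))
    return lo, hi


def clamp_history(history, neighborhood):
    """Pull the history color into the neighborhood's bounding box."""
    lo, hi = neighborhood_aabb(neighborhood)
    return tuple(min(max(h, l), u) for h, l, u in zip(history, lo, hi))


# A saturated red history sample against a grey neighborhood ends up grey-ish.
greys = [(0.4, 0.4, 0.4), (0.5, 0.5, 0.5), (0.6, 0.6, 0.6)]
clamped = clamp_history((1.0, 0.0, 0.0), greys)
```

The box grows with the color variance of the neighborhood, which is exactly why a noisy neighborhood lets stale history through.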

Depth Rejection

The idea behind depth rejection is that we can assume that pixels with very different depth values belong to different surfaces. For this we need to store the previous frame’s depth buffer. This can work well for first person shooters, where the gun and the environment are very far apart. However, it isn’t a universal heuristic, and can go wrong in multiple scenarios, for example foliage or noisy geometry with a lot of depth complexity. For use cases, see:

Stencil Rejection

Stencil rejection is a bespoke solution that can work well for a limited set of content. The idea is to tag “special” objects with a stencil value that is different to the background. This could be the main character, a car, etc. For this we need to store the previous frame’s stencil buffer. When doing the resolve, we discard any surfaces with different stencil values. Special care needs to be taken to avoid hard edges. For use cases, see:

Update: a similar scheme, mentioned by a kind reader on Twitter, can be implemented using an ID buffer.

Velocity Rejection

Rejecting surfaces based on velocity is in my opinion more robust, as by definition disocclusion arises from the difference in relative motion with respect to the camera between two objects. If two surfaces have very different velocities across two frames then either the acceleration was big or the objects were traveling at different speeds and one suddenly became visible. For this we need to store the previous frame’s velocity buffer. The process is:

  1. Read current velocity
  2. Use velocity to determine previous position
  3. Turn position into UV
  4. Read previous velocity
  5. Use velocity similarity metric to determine whether they belong to different surfaces

A discussion on Twitter mentions two approaches: the dot product of the two velocities and the differences in velocity magnitude. Both have problems.

  • Dot product has a discontinuity when either vector is 0 and treats opposing vectors as very different even if their magnitudes are small
  • Magnitude difference considers opposing vectors of the same magnitude as identical

The approach I propose is to use the length of the difference between the two vectors, which incidentally is the per-frame acceleration, as the similarity metric. Big accelerations mean disocclusion, and we can create a smooth ramp to take us from no disocclusion to full disocclusion. Here’s a couple of diagrams showing what I mean.

Once we have a similarity metric we can react to it. In this case we are going to lerp towards a slightly blurred version of the screen to avoid having jarring differences between the converged parts and the new ones. An alternative is to modify the convergence factor.
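A minimal sketch of that metric and the lerp in Python. The ramp thresholds are placeholder values I made up for illustration (in UV units per frame) and would need tuning per title; the function names are hypothetical.

```python
import math


def velocity_disocclusion(v_cur, v_prev, ramp_start=0.001, ramp_end=0.01):
    """0 means same surface, 1 means fully disoccluded."""
    # Length of the velocity difference, i.e. the per-frame acceleration.
    accel = math.hypot(v_cur[0] - v_prev[0], v_cur[1] - v_prev[1])
    t = (accel - ramp_start) / (ramp_end - ramp_start)
    return min(max(t, 0.0), 1.0)


def resolve_with_rejection(history, blurred_current, disocclusion):
    """Lerp from the history towards a slightly blurred current frame."""
    return tuple(h + (b - h) * disocclusion
                 for h, b in zip(history, blurred_current))
```

Unlike the dot product, tiny opposing vectors stay below the ramp and count as similar; unlike the magnitude difference, large opposing vectors of equal length are flagged as different.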

Alternative Hacks

Another simple way to use velocity is to weigh the contributions based on how fast an object is moving, as fast-moving objects are typically harder to see or are actually affected by e.g. motion blur. Good examples are chase levels or racing games.

Motion Vectors

So far we’ve improved a static scene when the camera moves, but what about when objects themselves move? We’ll compare without and with color clamping (left to right respectively).

There’s smearing like before, but now even the inner pixels are affected. Color clamping (right) does its best to fix up the colors but it’s still a jittery mess. Interestingly this can be a common effect in shipped videogames. The image below was captured in a UE4 game, where foliage lacks motion vectors.

This happens because deriving motion only from the camera is not enough; we need to take the object motion into account as well. The typical way to accomplish it is for the vertex shader to compute the position twice, once for the current and once for the previous frame. It passes those to the pixel shader, which computes the difference and outputs that to the velocity texture. For static geometry that doesn’t move or deform, we can keep using only the camera.
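The pixel shader side of this can be sketched as below, in Python for readability. It assumes both clip-space positions were produced with jittered projections, that the jitter is known in NDC units for each frame, and that velocity is stored in UV units with UV y pointing down. The names are hypothetical.

```python
def ndc_from_clip(clip):
    """Perspective divide of a clip-space (x, y, z, w) position."""
    return (clip[0] / clip[3], clip[1] / clip[3])


def pixel_velocity(clip_cur, clip_prev, jitter_cur_ndc, jitter_prev_ndc):
    """Screen-space motion in UV units, with each frame's jitter removed."""
    cx, cy = ndc_from_clip(clip_cur)
    px, py = ndc_from_clip(clip_prev)
    # Undo the jitter so a static surface yields exactly zero velocity.
    cx, cy = cx - jitter_cur_ndc[0], cy - jitter_cur_ndc[1]
    px, py = px - jitter_prev_ndc[0], py - jitter_prev_ndc[1]
    # NDC spans 2 units and has y up; UV spans 1 unit and has y down.
    return ((cx - px) * 0.5, (cy - py) * -0.5)
```

With this convention the previous position is the current UV minus the velocity. If the jitter were left in, a completely static scene would report a small non-zero velocity every frame.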

The resulting image works properly now. Left has no color clamping, right does.

Velocity is normally a 2-channel 16-bit floating point texture but it can vary. There are alternatives to computing the position twice, such as keeping a per-vertex buffer that holds the previous position. This takes up a lot more memory, 32 bits per vertex in the simplest case, so it would only be recommended if the position computations are very expensive.

Flicker

A consequence of adding color clamping is that it may introduce flickering in static images. As a result of aliasing, high intensity subpixels can appear and disappear in alternating frames. The color neighborhood then either clamps or lets them through. Essentially the accumulation process is continuously reset and this appears as flickering. A typical way to fix this is to tonemap the image in an attempt to give less importance to the bright outliers such that the image becomes more stable. There are a few different techniques that I’ve seen.

Blend Factor Attenuation

This modifies the blend factor under certain circumstances. UE4 mentions they detect when a clamping event is going to happen and reduce the blend factor. This however reintroduces the jitter and has to be done with care.

Intensity/Color Weighing

Since the reason for flickering is high variance in consecutive neighborhoods, intensity weighing tries to attenuate pixels whose intensity is high. This stabilizes the image at the cost of specular highlights (they become dimmer, so for something like flickering sand you can boost the intensity or add it after TAA). The demo comes with luminance weighing and I’ve used log weighing in the past, but they are similar. Log weighing converts colors into log space (careful with NaNs!) before doing any linear operations, which biases towards low intensity values. Here’s a short comparison and pseudocode.

No Color Clamping

With Color Clamping

Luminance Weighing

Log Weighing
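Both weighing schemes can be sketched roughly as follows. This is my reading of the descriptions above, written in Python, and not the demo’s or the author’s exact code. The 1 / (1 + luminance) weight, the Rec. 709 luminance coefficients and the epsilon used to keep the logarithm finite are all assumptions.

```python
import math


def luminance(c):
    """Rec. 709 luminance of a linear RGB color."""
    return 0.2126 * c[0] + 0.7152 * c[1] + 0.0722 * c[2]


def blend_luminance_weighted(history, current, blend=0.1):
    """Blend where bright samples receive a smaller weight."""
    wh = (1.0 - blend) / (1.0 + luminance(history))
    wc = blend / (1.0 + luminance(current))
    return tuple((h * wh + c * wc) / (wh + wc)
                 for h, c in zip(history, current))


def blend_log(history, current, blend=0.1, eps=1e-6):
    """Blend in log space; eps avoids log(0), which would give -inf or NaN."""
    def to_log(x):
        return math.log(max(x, eps))
    return tuple(math.exp(to_log(h) + (to_log(c) - to_log(h)) * blend)
                 for h, c in zip(history, current))
```

For a history of 0.5 and a bright outlier of 100, a plain 10% lerp jumps to about 10.45, while both weighted blends stay below 1, which is what keeps the outlier from resetting the accumulation.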

Blurring

A common criticism of Temporal AA is that it looks blurry. This is an issue that I never understood properly when I first started learning about the topic. We can get a crisp result on a static image, but it will blur in movement due to reconstruction errors. To see why, let’s consider the following reprojection image.

A current pixel is reprojected to the previous frame where it will most likely not land at a pixel center, instead landing somewhere between 4 samples. There is no exact value that corresponds to our position in the previous frame, so which do we take? This is a reconstruction problem. Taking any one sample produces line-like snapping artifacts. Another option is to bilinearly filter the nearest 4 samples, which is effectively a form of blurring. As there’s an accumulation buffer the error from the reconstruction adds up, causing further blurring. Another option is to take higher-order filtering. Although there are a few, the most popular is the Catmull-Rom filter, computed below as a generalized bicubic when B = 0 and C = 0.5.
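For reference, the generalized bicubic (Mitchell-Netravali) kernel with parameters B and C can be sketched in one dimension as follows. This is plain Python with the taps fetched individually, so it corresponds to the unoptimized version rather than the reduced-sample variants. The function names are hypothetical.

```python
import math


def bicubic_kernel(x, b=0.0, c=0.5):
    """Mitchell-Netravali kernel; B = 0, C = 0.5 gives Catmull-Rom."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x ** 3
                + (-18 + 12 * b + 6 * c) * x ** 2
                + (6 - 2 * b)) / 6.0
    if x < 2.0:
        return ((-b - 6 * c) * x ** 3
                + (6 * b + 30 * c) * x ** 2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6.0
    return 0.0


def tap_weights(frac, b=0.0, c=0.5):
    """Weights of the four taps at offsets -1, 0, +1, +2 from the base texel."""
    return [bicubic_kernel(frac - offset, b, c) for offset in (-1, 0, 1, 2)]


def sample_1d(signal, position, b=0.0, c=0.5):
    """Reconstruct `signal` at a fractional position, clamping at the borders."""
    base = math.floor(position)
    frac = position - base
    total = 0.0
    for offset, weight in zip((-1, 0, 1, 2), tap_weights(frac, b, c)):
        index = min(max(base + offset, 0), len(signal) - 1)
        total += signal[index] * weight
    return total
```

The 2D filter is separable, so the 16 reads are the 4 × 4 products of these weights along x and y.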

This bicubic filter has negative lobes (i.e. introduces a high-pass component) that produce sharper images. Varying C alters the “sharpness”. The standard Catmull-Rom is 16 texture reads that can be optimized to 9 samples by exploiting bilinear filtering. This is used by UE4. Jorge Jiménez further optimized it by discarding the corner samples down to 5 reads for Call of Duty. Here’s a comparison between bilinear and Catmull-Rom when the arm moves towards the camera.

  • One extra possibility to further increase the apparent crispness of the image is to apply a sharpening pass after TAA. Some algorithms like AMD’s FidelityFX already do this during the upscaling pass
  • An interesting but perhaps more complicated approach is presented in Hybrid Reconstruction Anti Aliasing. It estimates the error introduced by the reconstruction and tries to compensate for it

Texture Blurring

Texture blurring is another of TAA’s criticisms. Textures have already been blurred during the mipmapping process and the runtime is tuned to select the appropriate mipmap that will minimize aliasing while keeping details crisp. The jitter in screen space causes further blur in texture space. As far as I know, there are two ways to combat this directly:

  1. Introduce a negative mip bias. This will force the GPU to sample more detailed mips. Care needs to be taken to not reintroduce the aliasing we worked so hard to remove, and measure the performance impact of sampling at a higher mip now, but it can bring back detail nicely
  2. Unjitter the texture UVs. The purpose is to keep the UVs the same as when there’s no screen space jitter. I owe this knowledge to Martin Sobek who introduced me to this cool (and inexpensive!) trick. In practical terms we express the pixel jitter (increment in screen space) in terms of texture coordinates (increment in UV space) via the derivatives:

\Delta u = \Delta x · \dfrac{\partial u}{\partial x} + \Delta y · \dfrac{\partial u}{\partial y}

\Delta v = \Delta x · \dfrac{\partial v}{\partial x} + \Delta y · \dfrac{\partial v}{\partial y}
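In code the correction is a couple of multiply-adds. The sketch below is Python standing in for shader code: `duv_dx` and `duv_dy` play the role of the screen-space UV derivatives (what `ddx(uv)` and `ddy(uv)` return in HLSL), and the jitter is assumed to be the offset, in pixels, by which the sample position moved this frame. The function name is hypothetical.

```python
def unjitter_uv(uv, jitter_px, duv_dx, duv_dy):
    """Remove the UV shift caused by a screen-space jitter of `jitter_px`."""
    # delta_u = dx * du/dx + dy * du/dy, and likewise for v.
    delta_u = jitter_px[0] * duv_dx[0] + jitter_px[1] * duv_dy[0]
    delta_v = jitter_px[0] * duv_dx[1] + jitter_px[1] * duv_dy[1]
    return (uv[0] - delta_u, uv[1] - delta_v)
```

The result is exact when the UVs vary linearly across the pixel and a first-order approximation otherwise. Whether the delta is subtracted or added depends on the sign convention of the jitter, so it is worth checking against a static textured quad.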

Edges

When reprojecting the current pixel, we need to realize that the velocity texture, unlike the history color buffer, is aliased. If we’re not careful we could be reintroducing edge aliasing indirectly. To better account for the edges, a typical solution is to dilate the aliased information. We’ll use velocity as an example but you can do this with depth and stencil. There are a couple of ways I know of doing it:

  1. Depth Dilation: take the velocity that corresponds to the pixel with the nearest depth in a neighborhood
  2. Magnitude Dilation: take the velocity with the largest magnitude in a neighborhood

Both techniques will essentially “inflate” the edges in order to reduce edge aliasing.
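Both dilation variants can be sketched over a 3×3 neighborhood as below. This is Python operating on nested lists indexed as `[y][x]`; it assumes smaller depth values are closer to the camera (flip the comparison for reversed depth) and clamps at the image borders. The names are hypothetical.

```python
def neighbors(x, y, width, height):
    """Coordinates of the 3x3 neighborhood, clamped to the image."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yield (min(max(x + dx, 0), width - 1),
                   min(max(y + dy, 0), height - 1))


def dilate_nearest_depth(depth, velocity, x, y):
    """Velocity of the neighbor that is closest to the camera."""
    height, width = len(depth), len(depth[0])
    best = (x, y)
    for nx, ny in neighbors(x, y, width, height):
        if depth[ny][nx] < depth[best[1]][best[0]]:
            best = (nx, ny)
    return velocity[best[1]][best[0]]


def dilate_largest_magnitude(velocity, x, y):
    """Velocity with the largest magnitude in the neighborhood."""
    height, width = len(velocity), len(velocity[0])
    best = velocity[y][x]
    for nx, ny in neighbors(x, y, width, height):
        v = velocity[ny][nx]
        if v[0] * v[0] + v[1] * v[1] > best[0] * best[0] + best[1] * best[1]:
            best = v
    return best
```

A background pixel next to a moving foreground edge picks up the foreground’s velocity in both cases, which is the “inflation” described above.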

Transparency

Transparency is a tricky problem to solve, as transparent objects don’t generally render to depth. Low resolution effects such as smoke and dust are typically unaffected by Temporal AA and color clamping does its job without too many issues. However, more general transparency like glass, holograms, etc. can be affected and look poor if not properly reprojected. There are a couple of solutions to this, and your mileage may vary because it’s content-dependent:

  1. Write blended motion vectors to the velocity buffer. This is content-dependent but it can work. In fact even writing motion vectors as if they were solid can work if the opacity of the object is sufficiently high
  2. Introduce a per-pixel accumulation factor: This is what UE4 calls “responsive AA”. Essentially it will trade off ghosting for pixel jitter. Useful for very detailed VFX as shown here
  3. Render transparency after TAA. This is not recommended unless maybe you render it into an offscreen buffer, antialias that with an edge-detection solution such as FXAA or SMAA, and composite it back. It can jitter at the edges because it’s compared against a jittered depth buffer

Camera Cuts

Camera cuts present challenges when using TAA. A camera cut forces us to invalidate the history buffer, as its contents are no longer representative of the currently rendered frame. We therefore cannot rely on the history to produce a nice antialiased image. There are a few ways to address this, which I will enumerate here.

  1. Bias the convergence to accelerate the process. After the camera cut we need to accumulate content as fast as possible
  2. Use fade outs and fade ins. The TAA will accumulate during the black parts and be converged by the time it fades in
  3. Apply another form of AA or blur for the first frames, so the convergence isn’t as jarring. This is simpler if you already have the technique available

These are all hacks at the end of the day, but the problem needs addressing. The other thing to keep in mind is that the higher your framerate is, the less this is a problem.

Epilogue

I hope you enjoyed this large exposition on TAA. I’m sure I’ve left out many things, but hopefully this is a good place to start. If you have any questions or suggestions let me know, and I hope you learned something today.

Additional Bibliography

Most links are located where relevant, but here are a few extras. They are either broad or historically significant.


4 Comments

  1. A quality article once again! Thanks!
    I don’t see a big difference between Luminance and Log weighing, is one actually cheaper?

    • Good question. All in all they’re both pretty adequate and serve the purpose. The reason I posted both is I came up with log independently and then learned about luminance.

      Log weighing uses transcendental functions (log) which are not full rate but there are fewer of them, whereas luminance weighing has more full rate math and divisions (which aren’t full rate either). You’d have to measure. In my experience a TAA resolve pass should generally be bandwidth bound so you might not notice a difference.

      The other thing I would do is test on real world content. For my example it may not make a difference, but luminance weighing converges to a constant value when you plot it, whereas log keeps growing slowly. Depending on your content/use case one or the other could be more what you want. Check this to see what I mean https://www.desmos.com/calculator/ysha5ojn5g

  2. There is also this recent research, which uses visibility buffers and material IDs for higher quality rejection decisions:

    http://filmicworlds.com/blog/visibility-taa-and-upsampling-with-subsample-history/

    • Hi Richard,

      That is indeed a good article, thanks for posting. There are lots of avenues to pursue further for TAA, especially things like upsampling which many engines do already.

      I’m not a big fan of MSAA. I’ve measured around a 10% performance hit on geometry passes, and it makes the rendering quite a lot more complicated than it is otherwise. It also to some extent prevents alternative rasterization techniques.
