Long gone are the times when Temporal AA was a novel technique, and more articles are slowly appearing that cover motivations, implementations and solutions. I will throw my programming hat into the ring and walk through it, almost like a tutorial, for the future me and for anyone interested. I am using Matt Pettineo’s MSAAFilter demo to show the different stages. The contents come mostly from the invaluable work of many talented developers, and a little from my own experience. I will also introduce a couple of tricks I have come across that I haven’t seen in papers or presentations.
Sources of aliasing
The origin of aliasing in CG images varies wildly. Geometric (edge) aliasing, alpha testing, specular highlights, high frequency normals, parallax mapping, low resolution effects (SSAO, SSR), dithering and noise all conspire to destroy our visuals. Some solutions, like hardware MSAA and screen space edge detection techniques, work for a subset of cases but fail in different ways. Temporal techniques attempt to achieve supersampling by distributing the computations across multiple frames, while addressing all forms of aliasing. This stabilizes the image but also creates some challenging artifacts.
Jitter
The main principle of TAA is to compute multiple sub-pixel samples across frames, then combine them into a single final pixel. The simplest scheme generates random samples within the pixel, but there are better ways of producing fixed sequences of samples. A short overview of quasi-random sequences can be found here. It is important to select a good sequence to avoid clumping, and to use a fixed number of samples from that sequence: typically 4 to 8 samples work well. In practice this matters more for a static image than for a dynamic one. Below is a pixel with 4 samples.
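As an illustration of a fixed, well-distributed sequence, here is a minimal sketch that generates jitter offsets from a Halton (2, 3) sequence. Halton is only one common choice and not necessarily the sequence used in the demo; the function names `radical_inverse` and `halton_jitter` are my own, and the offsets are expressed in pixel units relative to the pixel center.

```python
def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse of a positive index in the given base.

    Mirrors the digits of `index` around the radix point, giving a value in [0, 1).
    """
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_jitter(sample_count: int) -> list:
    """Return `sample_count` (x, y) offsets in [-0.5, 0.5) pixel units.

    Assumption: base 2 for x and base 3 for y, starting at index 1 so the
    first sample is not the degenerate (0, 0) point of the sequence.
    """
    return [
        (radical_inverse(i, 2) - 0.5, radical_inverse(i, 3) - 0.5)
        for i in range(1, sample_count + 1)
    ]


if __name__ == "__main__":
    samples = halton_jitter(4)
    for frame in range(8):
        # Cycle through the fixed set of samples, one per frame.
        jitter_x, jitter_y = samples[frame % len(samples)]
        print(frame, round(jitter_x, 4), round(jitter_y, 4))
```

Cycling with `frame % len(samples)` keeps the number of samples fixed, so the pattern repeats every 4 frames in this example.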
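To produce jittered sub-samples within a pixel we translate the projection by a fraction of a pixel along the frustum plane. The valid range for the jitter offset (relative to the pixel center) is half the inverse of the screen dimension in pixels, so \left[-\dfrac{1}{2w},\dfrac{1}{2w}\right] horizontally and \left[-\dfrac{1}{2h},\dfrac{1}{2h}\right] vertically, where w and h here are the screen width and height in pixels. These bounds assume the offset is expressed as a fraction of the screen; if it is applied directly in normalized device coordinates, which span [-1, 1], a pixel is 2/w wide and the bounds become ±1/w and ±1/h. We multiply the projection matrix by the offset matrix (just a normal translation matrix) to get the modified projection, as shown below.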
\begin{pmatrix} \dfrac{2n}{w} & 0 & 0 & 0\\ 0 & \dfrac{2n}{h} & 0 & 0\\ 0 & 0 & \dfrac{f}{f-n} & 1\\ 0 & 0 & \dfrac{-f \cdot n}{f-n} & 0\\ \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ j_x & j_y & 0 & 1\\ \end{pmatrix}= \begin{pmatrix} \dfrac{2n}{w} & 0 & 0 & 0\\ 0 & \dfrac{2n}{h} & 0 & 0\\ j_x & j_y & \dfrac{f}{f-n} & 1\\ 0 & 0 & \dfrac{-f \cdot n}{f-n} & 0\\ \end{pmatrix}
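The product above can be checked with a small sketch. It assumes the row-vector convention the matrices are written in (a point is multiplied on the left, as in D3D-style math), and it takes the jitter offsets in the same post-projection units that the matrix outputs. The helper names `perspective`, `translation`, `jittered_projection` and `project` are hypothetical and not taken from the demo.

```python
def perspective(near_w, near_h, n, f):
    """Row-vector perspective matrix; near_w and near_h are the view volume
    dimensions at the near plane, n and f the near and far distances."""
    return [
        [2.0 * n / near_w, 0.0, 0.0, 0.0],
        [0.0, 2.0 * n / near_h, 0.0, 0.0],
        [0.0, 0.0, f / (f - n), 1.0],
        [0.0, 0.0, -f * n / (f - n), 0.0],
    ]


def translation(jx, jy):
    """Translation matrix with the offset stored in the last row."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [jx, jy, 0.0, 1.0],
    ]


def matmul(a, b):
    """Plain 4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def jittered_projection(proj, jx, jy):
    """Projection followed by the jitter translation (projection * offset)."""
    return matmul(proj, translation(jx, jy))


def project(point, matrix):
    """Transform a view-space point as a row vector, then divide by w."""
    v = (point[0], point[1], point[2], 1.0)
    clip = [sum(v[k] * matrix[k][j] for k in range(4)) for j in range(4)]
    return clip[0] / clip[3], clip[1] / clip[3]


if __name__ == "__main__":
    proj = perspective(2.0, 1.0, 0.1, 100.0)
    jittered = jittered_projection(proj, 0.001, -0.002)
    print(jittered[2])  # third row now carries j_x and j_y
    print(project((0.3, -0.2, 5.0), proj))
    print(project((0.3, -0.2, 5.0), jittered))
```

Because the offset ends up in the third row, it is scaled by the view-space depth before the perspective divide, so every projected point shifts by the same amount regardless of its distance.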
Once we have a set of samples, each frame we take the next one, build this matrix from it, and rasterize the geometry as normal to produce the image that corresponds to that sample. If it all works well and every frame gets a new jitter, the image should look wobbly like this.