Temporal Antialiasing Starter Pack


Credit: Jean-Philippe Grenier


Update

In many ways, this post is now a subset of a more recent and wonderfully visual tutorial on TAA by Emilio López, which you can find here: https://www.elopezr.com/temporal-aa-and-the-quest-for-the-holy-trail/ If you are looking for a walkthrough of TAA as a whole and not just implementation reference, I highly recommend starting there and using this page as an additional resource afterwards.


Overview

Recently I found myself catching up on TAA, tangentially related to my other post about trying to avoid main-pass temporal anti-aliasing. In browsing recent(ish) presentations on TAA, I'm under the impression that there aren't many approachable posts about implementing it (at least not that I was able to find) for someone who is learning it for the first time, maybe finds it intimidating, or is confused about how to put the presentations and papers into practice. The reality is that a decent TAA implementation is within reach of anyone who is interested in learning it, without a significant time investment.

I won't be going deep into TAA theory and background myself. Instead, I'll link the works of people far better informed on that than I am before moving on to what I believe to be the essential parts of a serviceable general-purpose TAA implementation. Nothing I'm doing here is in any way new or original, just an accumulation of various online resources in one place.


Background

Here I'll provide what I consider to be essential background reading/presentations on the type of TAA we'll be implementing below.

These are worth reading before going any further in this post:
-High Quality Temporal Supersampling by Brian Karis (2014)
-Temporal Reprojection Anti-Aliasing in INSIDE by Lasse Jon Fuglsang Pedersen (2016)
-An Excursion in Temporal Supersampling by Marco Salvi (2016)
-A Survey of Temporal Antialiasing Techniques by Lei Yang, Shiqiu Liu, and Marco Salvi (2020)

I recommend these if you're curious to go back to earlier days of TAA in the game industry and see the ideas at the time:
-TSSAA (Temporal Super-Sampling AA) by Timothy Lottes (2011). Thank you to P. A. Minerva for finding the archive.
-Anti-Aliasing Methods in CryEngine 3 by Tiago Sousa (2011). There were lots of other great AA presentations at SIGGRAPH that year, some of them also covered temporal AA.
-Graphics Gems from CryEngine 3 by Tiago Sousa (2013).

And these last two detail a lot of practical issues you run into when working with TAA in a real production environment, which I consider to be as valuable as the implementation itself.
-Temporal Supersampling and Antialiasing by Bart Wronski (2014).
-Temporal Antialiasing in Uncharted 4 by Ke Xu (2016).

See my post here for details on other AA methodologies, including links to alternative temporal antialiasing implementations like Activision's Filmic SMAA.


Velocity Buffer

First things first, we need to fill a motion vector target, sometimes called a velocity buffer. For starting out, you'll want to use RG16F as the format so that you have enough floating point precision and aren't fussing with encoding to some more optimized format, and clear it to 0 every frame. If you're doing a full z-prepass you can fill it there, otherwise you can do it during your main pass (forward pass, gbuffer pass, etc). To fill the target, we're going to need some additional constants - wherever you're passing in your camera's view + projection matrices, make sure you also store off the matrices from the previous frame and then supply those to your shader as well. Likewise, for every object you're rendering, pass in the previous world matrix along with the current world matrix. In the long run that last step is unnecessary for static objects (I'll talk about this later on), but let's keep things simple and just do this for everything. For skinned objects, you'll either need to pass in the previous frame's bone transformation matrices, or provide a buffer of previous-frame-transformed vertex positions.
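For reference, here's a minimal sketch of those extra constants (the names match the vertex shader below; the exact buffer layout is up to you):

cbuffer CameraConstants
{
    float4x4 viewProjectionMatrix;         // this frame
    float4x4 previousViewProjectionMatrix; // last frame
};

cbuffer ObjectConstants
{
    float4x4 worldMatrix;         // this frame
    float4x4 previousWorldMatrix; // last frame
};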

From here the logic in the vertex shader is simple:

o.currentPos = mul(float4(i.position, 1), worldMatrix);
o.currentPos = mul(o.currentPos, viewProjectionMatrix);
o.previousPos = mul(float4(i.position, 1), previousWorldMatrix);
o.previousPos = mul(o.previousPos, previousViewProjectionMatrix);

For skinned objects you'll need to take your skinning path here as well with the previous bone transforms, or alternatively read in your previously transformed positions from some storage buffer if you already have that.
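As a rough illustration, here's a hedged sketch of the bone-matrix path, assuming a simple 4-bone linear blend with object-space bone transforms (BoneMatrices and PreviousBoneMatrices are hypothetical names, not something from earlier in this post):

StructuredBuffer<float4x4> BoneMatrices;         // this frame's bone transforms
StructuredBuffer<float4x4> PreviousBoneMatrices; // last frame's bone transforms

// In the vertex shader: blend this frame's and last frame's bone matrices
// with the same weights, then proceed exactly as in the rigid path above.
float4x4 boneTransform =
    i.boneWeights.x * BoneMatrices[i.boneIndices.x] +
    i.boneWeights.y * BoneMatrices[i.boneIndices.y] +
    i.boneWeights.z * BoneMatrices[i.boneIndices.z] +
    i.boneWeights.w * BoneMatrices[i.boneIndices.w];
float4x4 previousBoneTransform =
    i.boneWeights.x * PreviousBoneMatrices[i.boneIndices.x] +
    i.boneWeights.y * PreviousBoneMatrices[i.boneIndices.y] +
    i.boneWeights.z * PreviousBoneMatrices[i.boneIndices.z] +
    i.boneWeights.w * PreviousBoneMatrices[i.boneIndices.w];

o.currentPos = mul(mul(mul(float4(i.position, 1), boneTransform), worldMatrix), viewProjectionMatrix);
o.previousPos = mul(mul(mul(float4(i.position, 1), previousBoneTransform), previousWorldMatrix), previousViewProjectionMatrix);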

The pixel shader work is also simple, but it's a common place for small mistakes that can ruin your resolve later, as we'll soon see.

float3 currentPosNDC = i.currentPos.xyz / i.currentPos.w;
float3 previousPosNDC = i.previousPos.xyz / i.previousPos.w;
float2 velocity = currentPosNDC.xy - previousPosNDC.xy;

I don't know many people who actually output this as NDC. I prefer it for this post because I find it easier to not mess up later steps. People familiar with TAA might be wondering where the jitter is at this point - I'm purposefully leaving it for the end. We'll first make sure that the reprojection and sampling are correct before introducing the additional complication of jittering.


Resolve

Now for the fun part! The TAA resolve is where we're going to put together all the goods we learned from the links above (and more). After your lighting pass, but before any post processing, we're going to insert the TAA resolve pass. This means we'll be operating on the HDR targets pre-tonemapping, as described in some of the links above. The inputs we need for the resolve shader are: the source color (the current frame's image so far), the history color (last frame's TAA result), the motion vector target we just populated, and the depth buffer. Since the first frame has no accumulation, a decent default for frame 0 is to simply copy the source into the history, skip the resolve for a frame, or otherwise ignore the history in the resolve for the first frame. That accounted for, let's start the resolve. For simplicity's sake we'll do it with a pixel shader using a full-screen triangle.

The first piece is the neighborhood sampling loop, which is going to accomplish a number of things for us, and we'll go through each of those step by step.

float3 sourceSampleTotal = float3(0, 0, 0);
float sourceSampleWeight = 0.0;
float3 neighborhoodMin = 10000;
float3 neighborhoodMax = -10000;
float3 m1 = float3(0, 0, 0);
float3 m2 = float3(0, 0, 0);
float closestDepth = 0.0;
int2 closestDepthPixelPosition = int2(0, 0);
 
for (int x = -1; x <= 1; x++)
{
    for (int y = -1; y <= 1; y++)
    {
        int2 pixelPosition = i.position.xy + int2(x, y);
        pixelPosition = clamp(pixelPosition, 0, sourceDimensions.xy - 1);  
 
        float3 neighbor = max(0, SourceColor[pixelPosition].rgb);
        float subSampleDistance = length(float2(x, y));
        float subSampleWeight = Mitchell(subSampleDistance);
 
        sourceSampleTotal += neighbor * subSampleWeight;
        sourceSampleWeight += subSampleWeight;
 
        neighborhoodMin = min(neighborhoodMin, neighbor);
        neighborhoodMax = max(neighborhoodMax, neighbor);
 
        m1 += neighbor;
        m2 += neighbor * neighbor;
 
        float currentDepth = DepthBuffer[pixelPosition].r;
        if (currentDepth > closestDepth)
        {
            closestDepth = currentDepth;
            closestDepthPixelPosition = pixelPosition;
        }
    }
}

The first two lines set up the sampling position, making sure we don't exceed the texture extents. Here I'm passing in the dimensions as a constant; you can do that or use GetDimensions() on the source color input. Next, we actually sample the source color from this frame. The max(0, ...) here is to help make sure we don't propagate any garbage values from earlier in the frame - if you've ever seen TAA "bleed" some bad pixel until it consumes the entire image, well, this is why we do this! The purpose of the next two lines has to do with a method described in Karis's presentation, but was something I didn't quite understand from the slides until I found this tweet from Tomasz Stachowiak where he spelled it out (thank you Tomasz). Here's what he said in this and the preceding tweet:

"Btw, one thing that helps with noise/jitter a bit is un-jittering the image inside the TAA resolve shader. Instead of point-sampling / fetching the new frame at current pixel location, do a small filter over a local neighborhood. Brian Karis details that in his TAA talk; he uses Blackman-Harris, but I found that Mitchell-Netravali yields a bit more sharpness. You effectively reconstruct the image at pixel center, treating the new frame as a set of sub-samples. This negates jitter, and stabilizes the image."

And it does indeed do exactly as he (and Karis) describe. In my personal experience I would agree here with Tomasz's preference of a Mitchell filter. If you haven't implemented filters like this before, I highly recommend checking out Matt Pettineo's github filtering project, which features a number of techniques from his The Order: 1886 SIGGRAPH talk, which is worth a read.
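If you just want something to get going with, here's a minimal sketch of the Mitchell-Netravali kernel behind the Mitchell() call in the loop above, using the common B = C = 1/3 parameters (other B/C values trade sharpness against ringing):

float Mitchell(float x)
{
    const float B = 1.0 / 3.0;
    const float C = 1.0 / 3.0;

    x = abs(x);
    float x2 = x * x;
    float x3 = x2 * x;

    if (x < 1.0)
    {
        return ((12.0 - 9.0 * B - 6.0 * C) * x3 + (-18.0 + 12.0 * B + 6.0 * C) * x2 + (6.0 - 2.0 * B)) / 6.0;
    }
    else if (x < 2.0)
    {
        return ((-B - 6.0 * C) * x3 + (6.0 * B + 30.0 * C) * x2 + (-12.0 * B - 48.0 * C) * x + (8.0 * B + 24.0 * C)) / 6.0;
    }
    return 0.0;
}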

So now we're accumulating filtered sample information for the current frame, but we're not done! Next we need to grab the information for a neighborhood clamp and variance clip. For the former we collect neighborhoodMin and neighborhoodMax, and for the latter we collect the first and second color moments exactly as described in Salvi's presentation. We'll use these later on. Lastly, we find the sample position of the closest depth in the neighborhood, which we will use for sampling the velocity buffer. Notice the greater-than sign here: I'm using a reversed depth buffer (and you should too); if you are not, you will need to flip the sign and the default value of closestDepth. There are other choices people make about where best to sample the velocity buffer for a given pixel - some use the highest velocity, for example - but I prefer the velocity at the closest depth (the links at the beginning cover this to some extent as well). Moving on!

float2 motionVector = VelocityBuffer[closestDepthPixelPosition].xy * float2(0.5, -0.5);
float2 historyTexCoord = i.texCoord.xy - motionVector;
float3 sourceSample = sourceSampleTotal / sourceSampleWeight;
 
if(any(historyTexCoord != saturate(historyTexCoord)))
{
    return float4(sourceSample, 1);
}
 
float3 historySample = SampleTextureCatmullRom(HistoryColor, LinearSampler, historyTexCoord, float2(historyDimensions.xy)).rgb;

First we get our motion vector, which we do by sampling our velocity buffer and turning that NDC vector into a screen-space texture coordinate offset. Then we take the current texture coordinate that we got from the full-screen triangle vertex shader, and subtract that motion vector offset to arrive at the texture coordinate for the history sample. Note that I am subtracting here because of the order of the subtraction done when filling the velocity buffer. I personally find this easier when thinking about what's being done - subtracting the motion to arrive at the texture coordinate for the history. If you don't like that, you can add the motion vector here and swap the subtraction in that earlier step.

Next, we calculate our new (filtered) source sample and run a simple check of the history texture coordinate to see if it's outside the bounds (0-1). If it is, we stop right here and return the filtered source sample. That's an imperfect solution, but it's a decent starting point for accounting for cases where there is no history to pull from at all.

And now finally we sample the history color from last frame, using an optimized Catmull-Rom filter courtesy (again, thanks Matt!) of Matt Pettineo, feeding it our history color texture, a linear clamp sampler, the history sample location that we calculated, and the size of the history color texture either passed in through a constant or via GetDimensions().
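For reference, here's a sketch of that optimized Catmull-Rom technique, paraphrased from Matt's code (grab the original from his github for the authoritative version). The trick is leaning on bilinear filtering so the 4x4 Catmull-Rom footprint only costs 9 fetches:

float4 SampleTextureCatmullRom(Texture2D tex, SamplerState linearSampler, float2 uv, float2 texSize)
{
    // Find the texel at position [1, 1] of the 4x4 footprint around uv.
    float2 samplePos = uv * texSize;
    float2 texPos1 = floor(samplePos - 0.5) + 0.5;

    // Catmull-Rom weights from the fractional offset to that texel.
    float2 f = samplePos - texPos1;
    float2 w0 = f * (-0.5 + f * (1.0 - 0.5 * f));
    float2 w1 = 1.0 + f * f * (-2.5 + 1.5 * f);
    float2 w2 = f * (0.5 + f * (2.0 - 1.5 * f));
    float2 w3 = f * f * (-0.5 + 0.5 * f);

    // Fold the middle two taps per axis into a single bilinear fetch.
    float2 w12 = w1 + w2;
    float2 offset12 = w2 / w12;

    float2 texPos0 = (texPos1 - 1.0) / texSize;
    float2 texPos3 = (texPos1 + 2.0) / texSize;
    float2 texPos12 = (texPos1 + offset12) / texSize;

    float4 result = 0.0;
    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos0.y), 0) * w0.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos0.y), 0) * w12.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos0.y), 0) * w3.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos12.y), 0) * w0.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos12.y), 0) * w12.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos12.y), 0) * w3.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos3.y), 0) * w0.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos3.y), 0) * w12.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos3.y), 0) * w3.x * w3.y;
    return result;
}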

float oneDividedBySampleCount = 1.0 / 9.0;
float gamma = 1.0;
float3 mu = m1 * oneDividedBySampleCount;
float3 sigma = sqrt(abs((m2 * oneDividedBySampleCount) - (mu * mu)));
float3 minc = mu - gamma * sigma;
float3 maxc = mu + gamma * sigma;
 
historySample = clip_aabb(minc, maxc, clamp(historySample, neighborhoodMin, neighborhoodMax));

We've arrived at the variance clipping calculations, lifted straight from Salvi's presentation. We additionally clamp the history against the neighborhood min/max as described in the paper, and pass the variance bounds off to the clip function provided by Playdead from the linked INSIDE presentation.
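If you don't want to pull that function in from Playdead's repository just yet, here's a minimal sketch of the idea matching the three-argument call above (a paraphrase - their original is the reference, and also supports clipping toward a separate point): instead of clamping the history to the box, clip it along the line toward the box center, which avoids samples piling up in the box corners.

float3 clip_aabb(float3 aabbMin, float3 aabbMax, float3 historySample)
{
    float3 center = 0.5 * (aabbMax + aabbMin);
    float3 extents = 0.5 * (aabbMax - aabbMin) + 0.0000001;

    // How far outside the box the sample is, in units of box extents.
    float3 offset = historySample - center;
    float3 ts = abs(offset / extents);
    float t = max(ts.x, max(ts.y, ts.z));

    // If outside, pull the sample back to the box surface along the line
    // to the center; otherwise leave it alone.
    return (t > 1.0) ? (center + offset / t) : historySample;
}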
 
 
float sourceWeight = 0.05;
float historyWeight = 1.0 - sourceWeight;
float3 compressedSource = sourceSample * rcp(max(max(sourceSample.r, sourceSample.g), sourceSample.b) + 1.0);
float3 compressedHistory = historySample * rcp(max(max(historySample.r, historySample.g), historySample.b) + 1.0);
float luminanceSource = Luminance(compressedSource);
float luminanceHistory = Luminance(compressedHistory); 
 
sourceWeight *= 1.0 / (1.0 + luminanceSource);
historyWeight *= 1.0 / (1.0 + luminanceHistory);
 
float3 result = (sourceSample * sourceWeight + historySample * historyWeight) / max(sourceWeight + historyWeight, 0.00001);
 
return float4(result, 1);

We've reached the end! We'll use a decent default for how much of the source sample to blend (0.05), and apply a little bit of what's known as "anti-flicker" (described in the links) to reduce the possibility of encountering high frequency details that flicker, especially due to jitter (which we'll be adding soon). The Luminance function here is just a simple dot product with the Rec. 709 weights float3(0.2126, 0.7152, 0.0722). This won't eliminate flicker, and in fact to do this better, you likely want to be applying luminance filtering to your source and history sampling. Even then, you will still encounter flickering, and that gets into applying additional mitigations to other passes as well - specular AA, prefiltering, making things like bloom temporally aware, etc. This step at least provides an example of one such mitigation for the purpose of illustration. Luminance filtering itself has plenty of imperfections, including that it does not (in this form) account for differences in perceptual lightness of different colors, but in practice it is better than doing nothing.
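For completeness, that helper is just:

float Luminance(float3 color)
{
    return dot(color, float3(0.2126, 0.7152, 0.0722));
}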

Alright, we've got our TAA resolve! When your camera is in motion, you should hopefully have something that is nicely anti-aliased and low on ghosting, and it should also still appear fairly sharp especially compared to simpler TAA implementations (those filtering choices are very important). But then when you let the camera sit still, the image looks super aliased! What gives? Without any motion, we're just sampling the same locations over and over and there's nothing to blend! Enter jittering.


Jitter

We fix the low/no camera motion aliasing by applying sub-pixel jittering through the projection matrix, the idea being that with enough samples you will converge and stabilize on an antialiased image. In a game this will not truly converge because of the limitations of being real-time, but it will do enough to give you something that looks good. Before adding jittering to what we've done above, I'd like to present this spicy food for thought from someone whose perspective I very much appreciate on the subject:

"I sometimes like to get snarky about that and say that jittering is only useful for making glamour shots when the camera and world is completely still. I mean you can jitter with whatever fancy pattern you want and apply a reconstruction filter based on those offsets, but once the camera moves it's all out the window. It's not like everything in the image is perfectly translating in pixel-sized increments along X/Y, in reality it will end up that your shading points are going to "slide" all over all the geometry. This is of course why the choice of filter for sampling the previous frame texture after reprojection is so important, since it's not like the exact shading point for the current frame is going to sit nicely at a pixel center in previous frame. It will always be in-between, and so a sharper reconstruction will keep things from getting too blurry and smeary. Depending on the game there are of course times where the effective movement of parts of the screen is basically 0 even if the camera is moving, but then again if something isn't moving maybe you should just leave the sample point alone to get a stable image instead of potentially introducing flickering from your jitter pattern."

Good thoughts to keep in mind. For now, we will continue with a standard addition of jittering to what we've already implemented. First up is to generate our jittering offsets, the current popular option being a Halton sequence of (2, 3) for (x, y). Here's a quick implementation from the pseudocode on the Wikipedia page.

float Halton(uint32_t i, uint32_t b)
{
    float f = 1.0f;
    float r = 0.0f;
 
    while (i > 0)
    {
        f /= static_cast<float>(b);
        r = r + f * static_cast<float>(i % b);
        i = static_cast<uint32_t>(floorf(static_cast<float>(i) / static_cast<float>(b)));
    }
 
    return r;
}

And now we'll make use of it to generate the jitter for each frame.

float haltonX = 2.0f * Halton(jitterIndex + 1, 2) - 1.0f;
float haltonY = 2.0f * Halton(jitterIndex + 1, 3) - 1.0f;
float jitterX = (haltonX / dimensions.x);
float jitterY = (haltonY / dimensions.y);

Here the variable jitterIndex increases every frame up to the desired sample count, and dimensions is your render target dimensions. A good default sample count to start from is 8 (so jitterIndex++; jitterIndex = jitterIndex % 8;), but it's worth playing around with it to see what works for your application. Note the "+ 1" input to the Halton function, which avoids the first index returning 0. You'll also want to store this jitter in a constant, and just like we did with the view and projection matrices, we'll track the previous frame's jitter and pass it as a constant as well - we need these shortly, when we modify the velocity buffer generation. To apply the jitter, you can either add jitterX/Y to the projection matrix's [2][0] and [2][1] or [0][2] and [1][2] (depending on row vs column major), or more clearly you can construct a translation matrix with the jitter and then multiply it with your projection matrix, as sketched below.
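For example, with the row-vector mul() convention used in the vertex shader earlier (transpose accordingly if your math library differs), the translation matrix approach looks something like this sketch:

// Translate clip-space x/y by the jitter after projection; for a standard
// perspective matrix this is equivalent to adding the jitter to [2][0]/[2][1].
float4x4 jitterTranslation = float4x4(
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    jitterX, jitterY, 0, 1);
float4x4 jitteredProjectionMatrix = mul(projectionMatrix, jitterTranslation);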

float2 velocity = (currentPosNDC.xy - jitter) - (previousPosNDC.xy - previousJitter);

This step is deceptively important when working with a static world/camera, because if we don't remove the jitter, we'll be sampling outside of our intended reconstruction area, which will create a blurry result that we don't want. Put more simply, we want the motion vectors to be zero when there is no motion! That way, the jittered projection will be working as intended. Less obviously important, but likely still beneficial, is to also incorporate the current frame's jitter into subSampleDistance during the source color sample filtering, as sketched below.
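Inside the neighborhood loop, that might look something like this (a hedged sketch - the exact signs depend on your jitter and NDC conventions, so verify against a static camera):

// Convert the current frame's NDC-space jitter to pixel units (the same
// 0.5/-0.5 flip used for the motion vector) and offset the filter distance
// by it, so the Mitchell weights are measured from the unjittered pixel center.
float2 jitterPixels = jitter * float2(0.5, -0.5) * sourceDimensions.xy;
float subSampleDistance = length(float2(x, y) + jitterPixels);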

There are lots of other options that people use for jittering. One that Scott Lembcke recommended is Martin Roberts's R2 sequence, sketched below.
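A quick hedged sketch of it, in HLSL terms:

// Roberts's R2 low-discrepancy sequence; the constants come from the
// "plastic number", the 2D generalization of the golden ratio. As with
// Halton, remap the result from [0, 1) to [-1, 1) before scaling by the
// render target dimensions, and keep the index small and repeating to
// stay float-precision friendly.
float2 R2(uint index)
{
    const float g = 1.32471795724474602596;
    return frac(0.5 + index * float2(1.0 / g, 1.0 / (g * g)));
}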

With this taken care of, you should have a perfectly serviceable TAA implementation for general purpose usage.


Sweeteners

Beyond TAA components like the ones we implemented here, there are plenty of other tricks people use to improve their TAA implementations. Most of those I tend to file under the category of "sweeteners" - features that are more than likely geared to the engine you're working with, the type of content in your game, or a combination of both. I think it's important to distinguish these from things like reprojection/clipping/clamping/etc, because oftentimes these other features don't translate well from one context to another, and it can be confusing when people try them out and they don't work as expected.

When you start to get into stuff like depth-based and velocity-based sample acceptance/rejection, or stencil masking your TAA, you're likely getting into more game-specific territory. I think using YCoCg for clipping probably falls under this category as well. At least the way it's traditionally implemented, from what I've seen it can work a little better in some situations but not so well in others. My limited exploration with perceptual lightness has made me somewhat skeptical of dropping in YCoCg and calling it a day. Likely a lot more tweaking would be needed in practice, and even then I get the feeling there are other color spaces you could be using. I do absolutely agree with the thought process that led to YCoCg in TAA, though - I think there is a lot of potential to expand upon it. I plan on doing that too, and I'll share my findings whether it's successful or not. Don't take my word for it though, try it out for yourself and see what you think.

From Philip Hammer: "Handling disocclusions as in Uncharted 4 (masking objects and compare curr/prev masks) really helps with ghosting for 3rd person characters or 1st person weapons."

From Kyle Hayward: "This [the above comment] and frame counting history greatly reduces ghosting."

From Alan Wolfe: "Jorge Jimenez's under appreciated "interleaved gradient noise" is a great choice for per pixel random numbers when rendering under TAA. http://www.iryoku.com/next-generation-post-processing-in-call-of-duty-advanced-warfare" See the tweet for more details, as well as Alan's own resource (thanks Alan!): https://blog.demofox.org/2017/10/31/animating-noise-for-integration-over-time/


Optimization

My implementation above is unoptimized. Given the sampling involved, there is a lot of potential to improve performance by converting it to a compute shader and making use of groupshared memory. Another important optimization as your implementation evolves is to stop exporting velocity/motion for static objects in the pass that fills these out. For static objects, the motion vector (outside of camera motion) will be 0, so an obvious improvement is to not write them out for static objects and instead run a compute shader after that pass that applies the camera motion by reprojecting from the values in the depth buffer, writing the result out to the velocity buffer.
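Here's a hedged sketch of what that camera-motion pass might look like, assuming a reversed-Z depth buffer and the row-vector matrix convention from earlier (all resource and constant names here are illustrative):

Texture2D<float> DepthBuffer;
RWTexture2D<float2> VelocityBuffer;

cbuffer ReprojectionConstants
{
    float4x4 inverseViewProjectionMatrix;
    float4x4 previousViewProjectionMatrix;
    float2 dimensions;
};

[numthreads(8, 8, 1)]
void ApplyCameraMotion(uint3 id : SV_DispatchThreadID)
{
    if (any(id.xy >= uint2(dimensions)))
        return;

    // Skip pixels that dynamic objects already wrote, e.g. via a stencil
    // mask or a sentinel clear value (elided here for brevity).

    // Reconstruct the world-space position from depth, then reproject it
    // with the previous frame's view-projection matrix.
    float2 texCoord = (id.xy + 0.5) / dimensions;
    float2 currentNDC = float2(texCoord.x * 2.0 - 1.0, 1.0 - texCoord.y * 2.0);
    float depth = DepthBuffer[id.xy];

    float4 worldPos = mul(float4(currentNDC, depth, 1.0), inverseViewProjectionMatrix);
    worldPos /= worldPos.w;
    float4 previousPos = mul(worldPos, previousViewProjectionMatrix);
    float2 previousNDC = previousPos.xy / previousPos.w;

    // Same NDC-space delta as the per-object path; jitter removal from the
    // Jitter section applies here as well.
    VelocityBuffer[id.xy] = currentNDC - previousNDC;
}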


Challenges

The initial difficulty of working with TAA is getting it up and running properly the first time. Tiny mistakes can easily ruin your results; a missed multiplication or an incorrect conversion can mean the difference between tons of ghosting and a relatively clean image. Worse, tiny mistakes can lead to mostly subtle issues that go unnoticed in most cases, and these can be difficult to track down. This is why my approach above focuses on the basics, taking them piece by piece. Start simple, then slowly add features, test, verify, test again.

Once you're past the initial implementation, the rest of your time spent with TAA is likely to be dealing with not-so-edge-cases like transparency, FX like particles, or anything with UV scrolling, for example. Check the two examples I linked towards the top about challenges that you would run into in a production environment. No TAA implementation is immune to these issues, every implementation requires care and feeding throughout the life of a project. This is maybe why we get so attached to particular implementations, because much like a pet we spend years nurturing them :-)

"Fun" example from Don Williamson: "I had one client that placed objects in a scenegraph hierarchy, parenting to a root. They'd then unparent to give to gfx but used the initial matrix and the unparented one at two parts of the gfx pipe, resulting in a subtle reproj drift. Took a while to track that!"


A Plea For Accessibility

I'll keep this short(ish), but this is something I harp on to the point of exhaustion: please give some thought to the accessibility factor of TAA if you're shipping a game with it. I fully realize that nowadays most renderers rely on TAA to achieve good enough performance and quality for a number of modern techniques, but on the flip side, this reliance on TAA being baked into the render pipeline leaves behind people who are sensitive to its downsides, like motion sickness from ghosting and jittering. If there is a game that forces TAA, I guarantee you that the gaming community has made a mod that can disable it, because for some non-zero percentage of players it means the difference between being able to play or not. We can at least make the minimal effort of offering a checkbox to disable TAA, jittering, or both in a menu with a warning about image quality, or go a small step further by also offering a slider for controlling the strength of the clipping, to allow players to opt into more flickering and aliasing in exchange for less ghosting. I would like to see more robust solutions than that personally, but I recognize that there is only so much that can be done on a production timeline. Something will always be better than nothing though, so let's do more than nothing.


Other TAA Implementations

Here's a bunch of links to TAA implementations I've seen scattered across github. Thanks to Mikkel Gjoel for having a list of links in one place, which I know a number of people have found useful. I'll list those here as well as others; if you have a TAA implementation that you'd like me to add to the list, send me a message!

https://github.com/NVIDIAGameWorks/Falcor/blob/master/Source/RenderPasses/Antialiasing/TAA/TAA.ps.slang
https://github.com/playdeadgames/temporal
https://github.com/h3r2tic/rtoy-samples/blob/master/assets/shaders/taa.glsl
https://github.com/Unity-Technologies/Graphics/blob/master/com.unity.render-pipelines.high-definition/Runtime/PostProcessing/Shaders/TemporalAntialiasing.hlsl
https://github.com/TheRealMJP/MSAAFilter/blob/master/MSAAFilter/Resolve.hlsl
https://github.com/turanszkij/WickedEngine/blob/master/WickedEngine/shaders/temporalaaCS.hlsl
https://gist.github.com/Erkaman/f24ef6bd7499be363e6c99d116d8734d
https://github.com/GameTechDev/TAA/blob/main/MiniEngine/Core/Shaders/TAAResolve.hlsl
https://github.com/PanosK92/SpartanEngine/blob/master/Data/shaders/temporal_antialiasing.hlsl
https://github.com/NVIDIA/Q2RTX/blob/master/src/refresh/vkpt/shader/asvgf_taau.comp
https://ziyadbarakat.wordpress.com/2020/07/28/temporal-anti-aliasing-step-by-step/



Contact