How to Compact Acceleration Structures in D3D12





Overview

One of the most important features of acceleration structure management in D3D12 is compaction. Compaction uses information gathered by the driver during the acceleration structure build to shrink the data significantly, often to 45-50% of the original size! This post details how to do it. I will focus on Bottom Level Acceleration Structures (BLAS), since that is where the feature is most useful, but everything here applies to TLASes as well. I will also assume you already know how to create acceleration structures with D3D12.

Documentation, DirectX Raytracing (DXR) spec, and samples can be found here, if you're looking for a good place to start: https://github.com/microsoft/DirectX-Graphics-Samples/tree/master/Samples/Desktop/D3D12Raytracing



Modify Acceleration Structure Creation

Wherever you currently build the acceleration structure you want to compact, you will need to add the D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_ALLOW_COMPACTION flag. Doing this can increase the size of the pre-compacted acceleration structure, presumably so the driver can store the additional information it needs, but that's okay because we're going to compact it anyway.

D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS blasInputs = {};
blasInputs.Type = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL;
blasInputs.DescsLayout = D3D12_ELEMENTS_LAYOUT_ARRAY;
blasInputs.pGeometryDescs = geometryDescs;
blasInputs.NumDescs = numGeometryDescs;
blasInputs.Flags = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_TRACE |
                   D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_ALLOW_COMPACTION; //new flag

D3D12_RAYTRACING_ACCELERATION_STRUCTURE_PREBUILD_INFO prebuildInfo = {};
mDevice->GetRaytracingAccelerationStructurePrebuildInfo(&blasInputs, &prebuildInfo);
...
D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC buildDesc = {};
buildDesc.Inputs = blasInputs;
buildDesc.ScratchAccelerationStructureData = scratchBufferAddress;
buildDesc.DestAccelerationStructureData = blasAddress;

mCommandList->BuildRaytracingAccelerationStructure(&buildDesc, 0, nullptr);



Query the Compacted Size

Now that we're building with _ALLOW_COMPACTION, we can query for the compacted size. Unlike the prebuild info query, this calculation happens on the GPU and writes its result to a GPU virtual address. There are two ways to perform the query (both covered below), and each requires filling out a D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_DESC, whose DestBuffer member is a D3D12_GPU_VIRTUAL_ADDRESS.

So we need a buffer to read that data back, and while there are a few ways to go about that, a resource allocated from a heap of type D3D12_HEAP_TYPE_READBACK seems most fitting, as it is optimized for exactly this purpose. If you follow this path though, you will be met with an error from the validation layer, even though the operation may still work on your specific device:

D3D12 ERROR: Invalid heap type for resource. Cannot use readback resources as a acceleration structure post build info.

It turns out that D3D12_HEAP_TYPE_READBACK was made with very specific purposes in mind, and this was not one of them. This isn't a dead end though; we just need to create a custom heap with the same properties as D3D12_HEAP_TYPE_READBACK and create the resource out of that heap. The matching heap properties can be retrieved with the ID3D12Device function GetCustomHeapProperties:

D3D12_HEAP_PROPERTIES heapProperties = mDevice->GetCustomHeapProperties(0, D3D12_HEAP_TYPE_READBACK);

You can then pass this to CreateHeap, CreateCommittedResource, or however you allocate your resources. If you use D3D12MemAlloc, you can create a custom pool using these heap properties by calling CreatePool on your D3D12MA::Allocator, and then use the CustomPool member of the D3D12MA::ALLOCATION_DESC when you create the resource.
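As a rough illustration of the D3D12MA route, the pool creation could look something like the sketch below. The exact POOL_DESC fields depend on your D3D12MA version, and mAllocator and readbackPool are just assumed names here, not anything prescribed by the library:

// Hypothetical sketch: a D3D12MA pool using readback-equivalent custom heap properties.
D3D12MA::POOL_DESC poolDesc = {};
poolDesc.HeapProperties = mDevice->GetCustomHeapProperties(0, D3D12_HEAP_TYPE_READBACK);
poolDesc.HeapFlags = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

D3D12MA::Pool* readbackPool = nullptr;
mAllocator->CreatePool(&poolDesc, &readbackPool);

// Later, when creating the postbuild info buffer, route the allocation through that pool.
D3D12MA::ALLOCATION_DESC allocDesc = {};
allocDesc.CustomPool = readbackPool;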

When allocating the resource, you will want it to be large enough to store a 64-bit value, or more specifically a D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_COMPACTED_SIZE_DESC. You will also need to map the resource via ID3D12Resource::Map and save that mapped address for reading later. I choose to persistently map the resource and reuse this buffer.
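Putting that together, here is a sketch of one way the postbuild info buffer could be created with a plain committed resource (the variable names are assumptions for illustration). Per my reading of the DXR spec, the destination buffer is written as UAV-style output, hence the unordered access flag and initial state:

// Hypothetical sketch: a small, persistently mapped buffer for the compacted size query.
D3D12_RESOURCE_DESC postBuildBufferDesc = {};
postBuildBufferDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
postBuildBufferDesc.Width = sizeof(D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_COMPACTED_SIZE_DESC);
postBuildBufferDesc.Height = 1;
postBuildBufferDesc.DepthOrArraySize = 1;
postBuildBufferDesc.MipLevels = 1;
postBuildBufferDesc.SampleDesc.Count = 1;
postBuildBufferDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
postBuildBufferDesc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS; // postbuild info is written as UAV-style output

ID3D12Resource* postBuildBuffer = nullptr;
mDevice->CreateCommittedResource(&heapProperties, D3D12_HEAP_FLAG_NONE, &postBuildBufferDesc,
                                 D3D12_RESOURCE_STATE_UNORDERED_ACCESS, nullptr,
                                 IID_PPV_ARGS(&postBuildBuffer));

// Persistently map it and remember both addresses for later.
void* mappedResourceAddress = nullptr;
postBuildBuffer->Map(0, nullptr, &mappedResourceAddress);
D3D12_GPU_VIRTUAL_ADDRESS readbackBufferVirtualAddress = postBuildBuffer->GetGPUVirtualAddress();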

With the proper destination buffer for the readback operation, we can now generate the compaction size query description using the virtual address of that buffer:

D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_DESC postBuildInfo = {};
postBuildInfo.InfoType = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_COMPACTED_SIZE;
postBuildInfo.DestBuffer = readbackBufferVirtualAddress;

Now we can queue up that calculation on the command list. As previously mentioned, there are two ways of doing this. The first is to pass this postBuildInfo as additional arguments to the BLAS build itself:

mCommandList->BuildRaytracingAccelerationStructure(&buildDesc, 1, &postBuildInfo);

The second is to call EmitRaytracingAccelerationStructurePostbuildInfo after the BLAS is built. Note that this requires a UAV barrier in between, to ensure the BLAS build has finished before its compacted size is calculated (a sketch of that barrier follows the call below). This approach may also be more convenient if you plan to build a number of BLASes per frame and then query their compacted sizes all at once with a single call to this function. Both methods should be just fine.

mCommandList->EmitRaytracingAccelerationStructurePostbuildInfo(&postBuildInfo, 1, &blasAddress);
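For reference, the UAV barrier between the build and the Emit call could look something like this, where blasBuffer is assumed to be the ID3D12Resource backing the BLAS:

D3D12_RESOURCE_BARRIER uavBarrier = {};
uavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
uavBarrier.UAV.pResource = blasBuffer; // ensure the BLAS build finishes before the postbuild query reads it
mCommandList->ResourceBarrier(1, &uavBarrier);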



Read Back the Compacted Size

Regardless of how the compacted size is queried, we now need to wait for that work to complete via a fence. I simply wait out the frame buffering to guarantee its completion (e.g. 2 buffered frames means I wait until frame N + 2), but that's just one way to do it. Once the appropriate wait is complete, we can safely read the compacted size back:

D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_COMPACTED_SIZE_DESC compactedSizeDesc = {};
memcpy_s(&compactedSizeDesc, sizeof(D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_COMPACTED_SIZE_DESC), 
         mappedResourceAddress, sizeof(D3D12_RAYTRACING_ACCELERATION_STRUCTURE_POSTBUILD_INFO_COMPACTED_SIZE_DESC));

And now we finally have it: the value stored in compactedSizeDesc.CompactedSizeInBytes is the buffer size we need for the compacted acceleration structure.
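As a side note, the frame-buffered wait mentioned above is entirely engine specific. A rough sketch under the assumption that you track a fence value per BLAS build submission (mCommandQueue, mFence, and blasBuildFenceValue are made-up names, not part of DXR):

// After submitting the command list that builds the BLAS and emits the postbuild info:
mCommandQueue->Signal(mFence, blasBuildFenceValue);
...
// Some frames later, before touching the mapped postbuild buffer:
if (mFence->GetCompletedValue() >= blasBuildFenceValue)
{
    // The compacted size query has completed on the GPU; safe to memcpy as shown above.
}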



Create the Compacted BLAS

With the compacted size in hand, we simply need to create a new buffer for the acceleration structure like any other, just sized with the compacted value. The last step is conveniently straightforward: call CopyRaytracingAccelerationStructure with the addresses of our original BLAS and our compacted BLAS:

mCommandList->CopyRaytracingAccelerationStructure(compactedBlasAddress, sourceBlasAddress, D3D12_RAYTRACING_ACCELERATION_STRUCTURE_COPY_MODE_COMPACT);
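For completeness, the buffer behind compactedBlasAddress above could be created along these lines if you are not pooling acceleration structure allocations yet (more on that in the Additional Considerations section below). A plain committed resource is assumed here purely for illustration; acceleration structure buffers need the UAV flag and live in the D3D12_RESOURCE_STATE_RAYTRACING_ACCELERATION_STRUCTURE state:

D3D12_HEAP_PROPERTIES defaultHeapProps = {};
defaultHeapProps.Type = D3D12_HEAP_TYPE_DEFAULT;

D3D12_RESOURCE_DESC compactedBlasDesc = {};
compactedBlasDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
compactedBlasDesc.Width = compactedSizeDesc.CompactedSizeInBytes; // the size we just read back
compactedBlasDesc.Height = 1;
compactedBlasDesc.DepthOrArraySize = 1;
compactedBlasDesc.MipLevels = 1;
compactedBlasDesc.SampleDesc.Count = 1;
compactedBlasDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
compactedBlasDesc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;

ID3D12Resource* compactedBlas = nullptr;
mDevice->CreateCommittedResource(&defaultHeapProps, D3D12_HEAP_FLAG_NONE, &compactedBlasDesc,
                                 D3D12_RESOURCE_STATE_RAYTRACING_ACCELERATION_STRUCTURE, nullptr,
                                 IID_PPV_ARGS(&compactedBlas));
D3D12_GPU_VIRTUAL_ADDRESS compactedBlasAddress = compactedBlas->GetGPUVirtualAddress();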

The original BLAS instance can now be replaced in the TLAS with the newly compacted BLAS, with a UAV barrier added between the call to CopyRaytracingAccelerationStructure and building the TLAS. The original BLAS can then be deleted, as long as you have confirmed it is no longer in flight on the GPU (accounting for frame buffering).



Additional Considerations

If you dig into the Nvidia resource on compaction and acceleration structure memory management (see the bottom of this post), the most important additional note related to compaction is how resources are allocated. Acceleration structures only require D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BYTE_ALIGNMENT (256 bytes), but buffer resources must be allocated with D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT, which is 64KB. That means naive allocation can waste a significant amount of memory, and in the compaction case with small or medium sized meshes, the compaction step might not save much (or any) memory at all due to alignment. For example, a BLAS that compacts from 60KB down to 30KB still occupies a full 64KB either way if it lives in its own placed or committed buffer.

The strategy to implement here is out of scope for this post, but here is what I recommend:

1) When allocating memory for large pools, start with pages of GPU memory where each page is an ID3D12Heap. 64MB per page is a decent place to start, since it isn't too difficult to run into fragmentation issues on a GPU with limited VRAM.

2) For each heap, create a single placed ID3D12Resource (a buffer) that spans the entire heap (a sketch of steps 1 and 2 follows this list): https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-createplacedresource

3) Use an allocator to suballocate acceleration structures out of that ID3D12Resource (i.e. hand out offsets into the resource). An allocator derived from TLSF is my personal preference: http://www.gii.upv.es/tlsf/index.html
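Here is a rough sketch of steps 1 and 2; the 64MB size and the pageHeap/pageBuffer names are just illustrative assumptions, and step 3 with D3D12MA is sketched a bit further down:

// One 64MB page: an ID3D12Heap plus a single buffer spanning the whole heap.
D3D12_HEAP_DESC pageHeapDesc = {};
pageHeapDesc.SizeInBytes = 64ull * 1024 * 1024;
pageHeapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
pageHeapDesc.Flags = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

ID3D12Heap* pageHeap = nullptr;
mDevice->CreateHeap(&pageHeapDesc, IID_PPV_ARGS(&pageHeap));

D3D12_RESOURCE_DESC pageBufferDesc = {};
pageBufferDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
pageBufferDesc.Width = pageHeapDesc.SizeInBytes;
pageBufferDesc.Height = 1;
pageBufferDesc.DepthOrArraySize = 1;
pageBufferDesc.MipLevels = 1;
pageBufferDesc.SampleDesc.Count = 1;
pageBufferDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
pageBufferDesc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;

// Acceleration structures suballocated from this page are addressed as
// pageBuffer->GetGPUVirtualAddress() + offset, where offsets come from step 3.
ID3D12Resource* pageBuffer = nullptr;
mDevice->CreatePlacedResource(pageHeap, 0, &pageBufferDesc,
                              D3D12_RESOURCE_STATE_RAYTRACING_ACCELERATION_STRUCTURE, nullptr,
                              IID_PPV_ARGS(&pageBuffer));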

Doing this means each acceleration structure only pays the 256 byte alignment within the pooled buffer, rather than a 64KB placement alignment of its own, which lets you take full advantage of compaction. If you use D3D12MA, it turns out that it also generously provides a virtual TLSF allocator via CreateVirtualBlock()!
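The suballocation side with D3D12MA's virtual allocator might look roughly like this (again, names and sizes are assumptions, and the exact API depends on your D3D12MA version; the offset it returns is what you add to the page buffer's GPU virtual address):

// Hypothetical sketch: one virtual block per 64MB page buffer.
D3D12MA::VIRTUAL_BLOCK_DESC blockDesc = {};
blockDesc.Size = 64ull * 1024 * 1024;

D3D12MA::VirtualBlock* pageBlock = nullptr;
D3D12MA::CreateVirtualBlock(&blockDesc, &pageBlock);

// Reserve space for one compacted BLAS inside the page.
D3D12MA::VIRTUAL_ALLOCATION_DESC vAllocDesc = {};
vAllocDesc.Size = compactedSizeDesc.CompactedSizeInBytes;
vAllocDesc.Alignment = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BYTE_ALIGNMENT; // 256, not 64KB

D3D12MA::VirtualAllocation vAlloc = {};
UINT64 offsetInPage = 0;
if (SUCCEEDED(pageBlock->Allocate(&vAllocDesc, &vAlloc, &offsetInPage)))
{
    D3D12_GPU_VIRTUAL_ADDRESS compactedBlasAddress = pageBuffer->GetGPUVirtualAddress() + offsetInPage;
    // ... copy into compactedBlasAddress, and call pageBlock->FreeAllocation(vAlloc) when the BLAS is destroyed.
}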

Another note worth considering is that acceleration structures that use refits (for example with skinning) can also be compacted and cloned to save memory, which could be useful in cases where you know something will stop needing rebuilds.

This is the helpful Nvidia article about Acceleration Structure Compaction: https://developer.nvidia.com/blog/tips-acceleration-structure-compaction/

Here is the D3D12MA GitHub, if you are looking for the leading open source D3D12 memory allocator (Thank you, Adam Sawicki and AMD!) https://github.com/GPUOpen-LibrariesAndSDKs/D3D12MemoryAllocator


Contact