Alex Tardif: Graphics Engineer

Using D3D12 in 2022

Introduction

This guide is meant to jump-start practical usage of DirectX 12. Modern graphics APIs like DirectX 12 can be intimidating to learn at first, and there are few resources that make use of relevant evolutions from the last few years. Although this is not a deep-dive tutorial on D3D12 API boilerplate (there are plenty of those already), my goal is to make the API more approachable by exposing you to the D3D12 ecosystem and showing you by example how you can use the API effectively. If you are looking for thorough API details, good places to start are the DX12 programming guide https://docs.microsoft.com/en-us/windows/win32/direct3d12/directx-12-programming-guide, the specification https://microsoft.github.io/DirectX-Specs/, and the DirectX Graphics Samples on GitHub: https://github.com/microsoft/DirectX-Graphics-Samples.

If you found D3D12 boilerplate intimidating, or if you bounced off D3D12 because you weren't sure what to do with it after your Hello Triangle, or if you want to see an example of recent improvements to the API, this post may be for you. The only way I could keep this post manageable enough to share was to create a helper API wrapper, which I designed for tinkerers to pick apart at their own pace, and I include explanations of that decision-making throughout this post.

Getting Started

Download and extract the D3D12Lite zip file. Before we get into project setup, let's look at what is included:

SimpleMath: a wrapper around DirectXMath that has the basics we'll need. https://github.com/Microsoft/DirectXTK/wiki/SimpleMath
DirectX Shader Compiler (DXC): the shader compiler we'll be using. https://github.com/microsoft/DirectXShaderCompiler
DirectXTex: a library we can use for reading DDS format texture files. https://github.com/microsoft/DirectXTex
D3D12 Memory Allocator: an easy to use memory allocator for D3D12. https://github.com/GPUOpen-LibrariesAndSDKs/D3D12MemoryAllocator
Dear ImGui: the industry's best open source UI library. https://github.com/ocornut/imgui
D3D12Lite: a pair of files written by me that wrap a subset of D3D12, providing a simpler interface that still takes advantage of key D3D12 features and demonstrates some helpful practices for using the D3D12 API.

Also download and extract the Tutorial zip file. Included are shaders and a texture used for the tutorial, as well as a reference main.cpp in case anything is unclear.

Project Requirements and Setup

First, check whether your machine is DirectX 12 Ultimate and Agility SDK compatible. For DirectX 12 Ultimate, check the Xbox Game Bar settings to see if your environment supports it.

For the Agility SDK, check out the "Setting up your machine" section of this link: https://devblogs.microsoft.com/directx/gettingstarted-dx12agility/
If you support both of these and have a somewhat recent graphics driver, you should be good to go. If not, this post may unfortunately not be for you yet, though there might be plenty of things you can take from D3D12Lite. I want this tutorial to more closely reflect how the API is used today rather than a few years ago or more.

Next, make sure you have Visual Studio 2019 (or newer, 2022 should be fine) installed, and that you have a recent version of the Windows SDK installed: https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/

With that, we're ready to move on to project configuration.

-Open Visual Studio 2019 and select "Create a new project" and choose "Empty Project". Name the project "DX12Tutorial" or whatever you prefer, then hit "Create".

-Copy the extracted D3D12Lite files to the project folder.

-Click and drag the folders and files you just copied into the project in Visual Studio.

-At the top, set the solution platform to x64 rather than x86, if it isn't already.

-Install the DirectX Agility SDK package. To do this, right click the project, select "Manage NuGet Packages", click "Browse" and search for "DirectX 12 Agility SDK", and install it.

-In D3D12Lite.cpp, change the external D3D12SDKVersion to the version that matches the Agility SDK you just installed. If you don't do this, you will get an error message of something like "D3D12SDKVersion from D3D12Core != requested D3D12SDKVersion".

-In the project configuration properties (right click the project and click "Properties"), set "C++ Language Standard" to at least C++17. Don't worry too much if you want to target something else; you can easily replace any standard library stuff I have in here later.

-Continuing in the project configuration properties, go to Linker -> All Options. Add d3d12.lib, dxgi.lib, and dxcompiler.lib to "Additional Dependencies". Next, add the path to where you pasted dxc\lib\x64 (part of the files you got from the zip file) to "Additional Library Directories".

-Lastly, in the Solution Explorer window, with the project highlighted, click the "Show All Files" button at the top to change the view (you can go back after). Find dxil.dll and dxcompiler.dll in this view (dxc/bin/x64) and for both, right-click and choose "Properties". Select "Included in project" and set it to "True". This will enable a different properties menu. For each of the two files, right click, and again select "Properties", which will open the separate menu. Set the "Item Type" setting to "Copy File". This will auto-copy dxil.dll and dxcompiler.dll to the output directory, where they are needed. In the event that this step doesn't age well, just manually copy those two dll files next to your executable once you have something building, or set up some xcopy commands to copy the files on build events, or automate it however you'd like.

Create a main.cpp file, and let's write some code!

Creating a Window

We'll start with a bit of code to set up a window and message loop. There are tons of resources out there on this so I won't linger on it. Here's a link to the docs: https://docs.microsoft.com/en-us/windows/win32/learnwin32/creating-a-window

#include "D3D12Lite.h"
 
using namespace D3D12Lite;
 
LRESULT CALLBACK WndProc(HWND hwnd, UINT umessage, WPARAM wparam, LPARAM lparam)
{
    switch (umessage)
    {
    case WM_KEYDOWN:
        if (wparam == VK_ESCAPE)
        {
            PostQuitMessage(0);
            return 0;
        }
        else
        {
            return DefWindowProc(hwnd, umessage, wparam, lparam);
        }
 
    case WM_DESTROY:
        [[fallthrough]];
    case WM_CLOSE:
        PostQuitMessage(0);
        return 0;
 
    default:
        return DefWindowProc(hwnd, umessage, wparam, lparam);
    }
}
 
int main()
{
    std::wstring applicationName = L"D3D12 Tutorial";
    Uint2 windowSize = { 1920, 1080 };
    HINSTANCE moduleHandle = GetModuleHandle(nullptr);
 
    WNDCLASSEX wc = { 0 };
    wc.style = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
    wc.lpfnWndProc = WndProc;
    wc.hInstance = moduleHandle;
    wc.hIcon = LoadIcon(nullptr, IDI_WINLOGO);
    wc.hIconSm = wc.hIcon;
    wc.hCursor = LoadCursor(nullptr, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)GetStockObject(BLACK_BRUSH);
    wc.lpszMenuName = nullptr;
    wc.lpszClassName = applicationName.c_str();
    wc.cbSize = sizeof(WNDCLASSEX);
    RegisterClassEx(&wc);
 
    HWND windowHandle = CreateWindowEx(WS_EX_APPWINDOW, applicationName.c_str(), applicationName.c_str(),
        WS_CLIPSIBLINGS | WS_CLIPCHILDREN | WS_OVERLAPPED | WS_SIZEBOX,
        (GetSystemMetrics(SM_CXSCREEN) - windowSize.x) / 2, (GetSystemMetrics(SM_CYSCREEN) - windowSize.y) / 2, windowSize.x, windowSize.y,
        nullptr, nullptr, moduleHandle, nullptr);
 
    ShowWindow(windowHandle, SW_SHOW);
    SetForegroundWindow(windowHandle);
    SetFocus(windowHandle);
    ShowCursor(true);
 
    bool shouldExit = false;
    while (!shouldExit)
    {
        MSG msg{ 0 };
        if (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
 
        if (msg.message == WM_QUIT)
        {
            shouldExit = true;
        }
    }
 
    DestroyWindow(windowHandle);
    windowHandle = nullptr;
 
    UnregisterClass(applicationName.c_str(), moduleHandle);
    moduleHandle = nullptr;
 
    return 0;
}

Compile and run, and you should see a window pop up that closes when you press escape.

Rendering a Colored Background

Next let's create our D3D12 device and fill this window with some color. Start by creating a class above our main() called "Renderer" with the following two members:

std::unique_ptr<Device> mDevice;
std::unique_ptr<GraphicsContext> mGraphicsContext;

And a constructor:

Renderer(HWND windowHandle, Uint2 screenSize)
{
    mDevice = std::make_unique<Device>(windowHandle, screenSize);
    mGraphicsContext = mDevice->CreateGraphicsContext();
}

The Device class is the API through which we will create and destroy graphics resources, and submit work to the GPU. To submit graphics work to D3D12, we use command lists (ID3D12CommandList), which are what they sound like: lists of commands that you build up in an internal buffer and then send to the device to execute. There are some helpful ways to wrap these command lists to simplify workflow, which is exactly what GraphicsContext does here. In D3D12, you submit command lists to different types of device queues (ID3D12CommandQueue) to consume: graphics, compute, and copy. Generally speaking, you submit main-path graphics and compute work to the graphics queue, async-compute work to the compute queue, and upload work to the copy queue. Don't worry too much if you don't understand this yet; D3D12Lite manages this by providing different "Context" classes that are submitted to the right queues under the hood for you.
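If the record-then-submit idea is new to you, here is a minimal toy sketch of it in plain C++. Nothing here is D3D12 or D3D12Lite code: ToyCommandList, ToyQueue, and ToyDevice are hypothetical stand-ins I made up purely to show that recording commands does nothing by itself, and that a submission hands the whole list to the queue that matches its type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins only. These are NOT D3D12 or D3D12Lite types.
enum class QueueType { Graphics, Compute, Copy };

struct ToyCommandList
{
    QueueType targetQueue;
    std::vector<std::string> commands; // recorded, but not executed yet

    void Record(std::string command) { commands.push_back(std::move(command)); }
};

struct ToyQueue
{
    std::vector<std::string> executed;

    // A submission hands over the whole recorded list in one go.
    void Execute(const ToyCommandList& list)
    {
        executed.insert(executed.end(), list.commands.begin(), list.commands.end());
    }
};

struct ToyDevice
{
    ToyQueue graphics, compute, copy;

    // Route the list to the queue that matches its type.
    void Submit(const ToyCommandList& list)
    {
        switch (list.targetQueue)
        {
        case QueueType::Graphics: graphics.Execute(list); break;
        case QueueType::Compute:  compute.Execute(list);  break;
        case QueueType::Copy:     copy.Execute(list);     break;
        }
    }
};

// Records two commands, checks nothing ran yet, then submits once.
inline std::size_t RunQueueDemo()
{
    ToyDevice device;
    ToyCommandList frame{ QueueType::Graphics, {} };
    frame.Record("clear");
    frame.Record("draw");
    assert(device.graphics.executed.empty()); // recording alone runs nothing
    device.Submit(frame);
    return device.graphics.executed.size();
}
```

In this toy, the graphics queue only sees the two commands after Submit is called, which is the same shape as building up a context and then handing it to the device.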

Back to the action, let's create two member functions for Renderer, "Render" and "RenderClearColorTutorial":

void RenderClearColorTutorial()
{
    mDevice->BeginFrame();
 
    TextureResource& backBuffer = mDevice->GetCurrentBackBuffer();
 
    mGraphicsContext->Reset();
 
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_RENDER_TARGET);
    mGraphicsContext->FlushBarriers();
 
    mGraphicsContext->ClearRenderTarget(backBuffer, Color(0.3f, 0.3f, 0.8f));
 
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_PRESENT);
    mGraphicsContext->FlushBarriers();
 
    mDevice->SubmitContextWork(*mGraphicsContext);
 
    mDevice->EndFrame();
    mDevice->Present();
}
 
void Render()
{
    RenderClearColorTutorial();
}

To effectively make use of frame-buffering, which means safely generating and submitting commands to the GPU for the current frame while it is still processing the previous frame, D3D12Lite does some under-the-hood bookkeeping via BeginFrame() and EndFrame(). This isn't required, but it is good practice as this is often critical to achieving good performance in more complex scenarios. By default D3D12Lite uses 2 buffered frames.
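To make the bookkeeping idea a bit more concrete, here is a rough sketch of frame pacing, assuming the usual approach of comparing a CPU frame counter against a GPU-completed counter (in real D3D12 the latter would be a fence value). ToyFramePacer and its members are hypothetical names of mine, and I am not claiming this is how D3D12Lite implements BeginFrame() and EndFrame() internally.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBufferedFrames = 2; // same count as the default mentioned above

// Toy model: the "GPU" only makes progress when the CPU is forced to wait for it.
struct ToyFramePacer
{
    std::uint64_t cpuFrame = 0;     // frames the CPU has finished recording and submitted
    std::uint64_t gpuCompleted = 0; // frames the GPU has finished (a fence value in real D3D12)
    std::uint64_t stalls = 0;       // how often the CPU had to wait

    // Before recording a new frame, make sure fewer than kBufferedFrames are still in flight.
    void BeginFrame()
    {
        while (cpuFrame - gpuCompleted >= kBufferedFrames)
        {
            ++stalls;
            ++gpuCompleted; // stand-in for blocking until the GPU signals the next fence value
        }
    }

    void EndFrame() { ++cpuFrame; }

    // Which set of per-frame resources the current frame should use.
    std::uint64_t SlotIndex() const { return cpuFrame % kBufferedFrames; }
};

// Runs five frames against a GPU that never advances unless waited on.
inline std::uint64_t RunPacerDemo()
{
    ToyFramePacer pacer;
    for (int i = 0; i < 5; ++i)
    {
        pacer.BeginFrame();
        assert(pacer.cpuFrame - pacer.gpuCompleted < kBufferedFrames);
        pacer.EndFrame();
    }
    return pacer.stalls;
}
```

With two buffered frames, the first two frames in this toy are recorded without waiting, and every frame after that has to wait for the oldest one to finish first.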

To clear the frame, we begin by getting the current frame's backbuffer (the virtual surface we are rendering to) from the device. We then Reset the graphics context, which sets up the internal command list for one submission worth of work per frame. Next, we issue a barrier to transition the backbuffer from its current state to the "render target" state. Resource states are D3D12's way of keeping track of how a resource is being used, and barriers are our way of telling the device that we are changing the way we are using it - in this case as a rendering target. Barriers are more performant if we collect them together whenever possible before submitting. GraphicsContext gathers them for us until we call FlushBarriers to submit them into the command list. In this case, we only have one.
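Here is a small sketch of the gather-then-flush pattern described above, written as a standalone toy. ToyContext, ToyResource, and ToyState are hypothetical, and skipping a transition to the state a resource is already in is an assumption of this sketch rather than something I'm documenting about GraphicsContext. The one real API fact it leans on is that ID3D12GraphicsCommandList::ResourceBarrier accepts an array of barriers, which is what makes batching possible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins only. These are NOT D3D12 or D3D12Lite types.
enum class ToyState { Present, RenderTarget, ShaderResource };

struct ToyResource { ToyState state = ToyState::Present; };

struct ToyBarrier { ToyResource* resource; ToyState before; ToyState after; };

struct ToyContext
{
    std::vector<ToyBarrier> pending;
    std::size_t batchesSubmitted = 0;

    void AddBarrier(ToyResource& resource, ToyState newState)
    {
        if (resource.state == newState) return; // assumption: redundant transitions are dropped
        pending.push_back({ &resource, resource.state, newState });
        resource.state = newState; // track the state the resource will have after the flush
    }

    void FlushBarriers()
    {
        if (pending.empty()) return;
        ++batchesSubmitted; // one batched call, like passing an array to ResourceBarrier
        pending.clear();
    }
};

// Three AddBarrier calls end up as two barriers in a single batch.
inline std::size_t RunBarrierDemo()
{
    ToyResource backBuffer, shadowMap;
    ToyContext context;
    context.AddBarrier(backBuffer, ToyState::RenderTarget);
    context.AddBarrier(shadowMap, ToyState::ShaderResource);
    context.AddBarrier(backBuffer, ToyState::RenderTarget); // no-op, already tracked
    assert(context.pending.size() == 2);
    context.FlushBarriers();
    return context.batchesSubmitted;
}
```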

Next we tell the context to clear the backbuffer with our arbitrarily chosen purple color, our goal for this section! Then we transition the backbuffer to the "present" state, so that these results can be presented to the screen. It's important to note here that all we've done so far is build up an in-order list of commands for the GPU to consume, but we haven't actually told the GPU to do so. We do this by calling SubmitContextWork on the Device. Submitting command lists to D3D12 is not a cheap operation, so we want to batch as much work as we can into as few submissions as we can. At this point mGraphicsContext becomes unusable until the next frame, as it only has an internal allotment of one command allocator (ID3D12CommandAllocator, the memory that the command list writes into) per buffered frame. If we wanted to submit multiple times in a single frame, we would create multiple GraphicsContexts. It doesn't have to be this way, but it reinforces a healthy pattern of command management: use the same command allocators for the same workloads every frame, as much as possible. This optimizes your command memory usage by keeping your worst cases within the same command allocators, rather than spreading the worst case memory usage to all of them.
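The allocator-per-buffered-frame idea can be sketched with a toy like the one below. I'm using a std::vector as a stand-in for the allocator's memory, because clear() keeps the capacity in much the same way that resetting a command allocator lets its memory be reused. ToyAllocator and ToyGraphicsContext are hypothetical names, and this is only a model of the pattern, not D3D12Lite's actual implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBufferedFrames = 2;

// Stand-in for ID3D12CommandAllocator: memory grows to the worst case and is kept on reset.
struct ToyAllocator
{
    std::vector<int> memory;
    void Reset() { memory.clear(); } // clear() leaves the capacity untouched
};

struct ToyGraphicsContext
{
    std::array<ToyAllocator, kBufferedFrames> allocators;
    ToyAllocator* current = nullptr;
    bool submitted = false;

    // Pick this frame's allocator and make the context recordable again.
    void Reset(std::size_t frameIndex)
    {
        current = &allocators[frameIndex % kBufferedFrames];
        current->Reset();
        submitted = false;
    }

    void Record(int command)
    {
        assert(current != nullptr && !submitted); // unusable after submission until the next Reset
        current->memory.push_back(command);
    }

    void MarkSubmitted() { submitted = true; }
};

// A heavy frame grows slot 0; a later frame in the same slot reuses that memory.
inline bool RunAllocatorDemo()
{
    ToyGraphicsContext context;
    context.Reset(0);
    for (int i = 0; i < 1000; ++i) context.Record(i); // heavy frame
    context.MarkSubmitted();
    const std::size_t worstCase = context.allocators[0].memory.capacity();

    context.Reset(2); // frame 2 maps back to slot 0
    context.Record(42);

    // Slot 0 keeps its worst-case allocation; slot 1 never had to grow.
    return context.allocators[0].memory.capacity() == worstCase
        && context.allocators[1].memory.capacity() < worstCase;
}
```

The point of the toy is the last check: the worst case stays confined to the allocator that always handles that workload instead of eventually being paid by every allocator.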

Finally, once we have called EndFrame, we call Present() to queue up the presentation of this frame of work after our context work is processed. To see this all in action, we'll make some slight modifications to the main function from earlier.

[...]
std::unique_ptr<Renderer> renderer = std::make_unique<Renderer>(windowHandle, windowSize);
 
bool shouldExit = false;
while (!shouldExit)
{
    MSG msg{ 0 };
    if (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
 
    if (msg.message == WM_QUIT)
    {
        shouldExit = true;
    }
 
    renderer->Render();
}
 
renderer = nullptr;
[...]

Compile and run, and you should now see the window filled with the color we chose. You are now a graphics programmer!

We have just one last thing to clean up. If you close the app, you will likely see an error message from the D3D12 debug validation layer. That's because we allowed the GraphicsContext to be destroyed while its commands were still being executed by the device - not good! Let's fix this with a few lines of code in Renderer's destructor. Here, WaitForIdle ensures that all work on the GPU is complete before continuing.

mDevice->WaitForIdle();
mDevice->DestroyContext(std::move(mGraphicsContext));
mDevice = nullptr;

Rendering a Triangle

We have no intention of stopping with a cleared screen! It's time for our first triangle. We're going to need some shaders for this, so let's create a new folder next to the other folders we copied to the project and name this "Shaders", with a subfolder named "Compiled".

These are the paths that D3D12Lite will read shader source from and write compiled DXIL output to. You can change these to whatever you like by editing D3D12Lite. In the "Shaders" folder, paste the shader files from the provided tutorial zip file. Have a look through the code and comments in Common.hlsl, Triangle.hlsl, and Shared.h. Of particular note are two things:

1) D3D12Lite does not make use of shader reflection. Instead we use Shared.h for anything we want shared between HLSL and D3D12. Reflection is outside of the scope of this tutorial, but of course can be added if you want that.

2) D3D12Lite has a very opinionated binding model, and equates an HLSL "space" with a D3D12 descriptor table. The binding class for any non-bindless resource is PipelineResourceSpace, which is a structure that acts as both a part of a root signature and binder in one. Bindless resources are accessed via HLSL's ResourceDescriptorHeap.

Don't worry too much about this yet, but the above will be a useful reference soon. Make sure to now include Shared.h in the Visual Studio project, as we'll need it for a few things. Now, let's add some new member variables to Renderer for this section:

std::unique_ptr<BufferResource> mTriangleVertexBuffer;
std::unique_ptr<BufferResource> mTriangleConstantBuffer;
std::unique_ptr<Shader> mTriangleVertexShader;
std::unique_ptr<Shader> mTrianglePixelShader;
std::unique_ptr<PipelineStateObject> mTrianglePSO;
PipelineResourceSpace mTrianglePerObjectSpace;

Here we have a vertex buffer for our triangle vertices, and a constant buffer for the constants we need in order to draw it. Below that are our vertex and pixel shaders for drawing the triangle, and the pipeline state object that they'll be built into. Finally, we have the PipelineResourceSpace, which as mentioned earlier, will help us generate the root signature (contained by the PSO) as well as actually binding our non-bindless shader resources, in this case simply the constant buffer. Now let's initialize all of these in a new function that we'll call from the end of Renderer's constructor:

void InitializeTriangleResources()
{
    std::array<TriangleVertex, 3> vertices;
    vertices[0].position = { -0.5f, -0.5f };
    vertices[0].color = { 1.0f, 0.0f, 0.0f };
    vertices[1].position = { 0.0f, 0.5f };
    vertices[1].color = { 0.0f, 1.0f, 0.0f };
    vertices[2].position = { 0.5f, -0.5f };
    vertices[2].color = { 0.0f, 0.0f, 1.0f };
 
    BufferCreationDesc triangleBufferDesc{};
    triangleBufferDesc.mSize = sizeof(vertices);
    triangleBufferDesc.mAccessFlags = BufferAccessFlags::hostWritable;
    triangleBufferDesc.mViewFlags = BufferViewFlags::srv;
    triangleBufferDesc.mStride = sizeof(TriangleVertex);
    triangleBufferDesc.mIsRawAccess = true;
 
    mTriangleVertexBuffer = mDevice->CreateBuffer(triangleBufferDesc);
    mTriangleVertexBuffer->SetMappedData(&vertices, sizeof(vertices));
 
    BufferCreationDesc triangleConstantDesc{};
    triangleConstantDesc.mSize = sizeof(TriangleConstants);
    triangleConstantDesc.mAccessFlags = BufferAccessFlags::hostWritable;
    triangleConstantDesc.mViewFlags = BufferViewFlags::cbv;
 
    TriangleConstants triangleConstants;
    triangleConstants.vertexBufferIndex = mTriangleVertexBuffer->mDescriptorHeapIndex;
 
    mTriangleConstantBuffer = mDevice->CreateBuffer(triangleConstantDesc);
    mTriangleConstantBuffer->SetMappedData(&triangleConstants, sizeof(TriangleConstants));
 
    ShaderCreationDesc triangleShaderVSDesc;
    triangleShaderVSDesc.mShaderName = L"Triangle.hlsl";
    triangleShaderVSDesc.mEntryPoint = L"VertexShader";
    triangleShaderVSDesc.mType = ShaderType::vertex;
 
    ShaderCreationDesc triangleShaderPSDesc;
    triangleShaderPSDesc.mShaderName = L"Triangle.hlsl";
    triangleShaderPSDesc.mEntryPoint = L"PixelShader";
    triangleShaderPSDesc.mType = ShaderType::pixel;
 
    mTriangleVertexShader = mDevice->CreateShader(triangleShaderVSDesc);
    mTrianglePixelShader = mDevice->CreateShader(triangleShaderPSDesc);
 
    mTrianglePerObjectSpace.SetCBV(mTriangleConstantBuffer.get());
    mTrianglePerObjectSpace.Lock();
 
    PipelineResourceLayout resourceLayout;
    resourceLayout.mSpaces[PER_OBJECT_SPACE] = &mTrianglePerObjectSpace;
 
    GraphicsPipelineDesc trianglePipelineDesc = GetDefaultGraphicsPipelineDesc();
    trianglePipelineDesc.mVertexShader = mTriangleVertexShader.get();
    trianglePipelineDesc.mPixelShader = mTrianglePixelShader.get();
    trianglePipelineDesc.mRenderTargetDesc.mNumRenderTargets = 1;
    trianglePipelineDesc.mRenderTargetDesc.mRenderTargetFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
 
    mTrianglePSO = mDevice->CreateGraphicsPipeline(trianglePipelineDesc, resourceLayout);
}

We start by setting up the three vertices of our triangle, sourcing the vertex structure from Shared.h.

Next, we create a buffer description for the triangle vertex buffer. We make this "hostWritable" to create this buffer in CPU-writable/GPU-readable memory. This is ultimately not how we want to create vertex buffers because having it reside in CPU memory makes it slower for the GPU to read, but this keeps things simpler, and we'll see how to upload a vertex buffer to VRAM in the next section. We also add an "srv" flag, telling D3D12Lite that we want a shader resource view for this buffer, which it will automatically add to the descriptor heap so that it can be indexed in shaders via ResourceDescriptorHeap[]. We make this a "raw access" buffer (ByteAddressBuffer) for how we've chosen to read it in the shader. With that, we have our vertex buffer, which we then copy our triangle vertices into with SetMappedData.
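To illustrate what "raw access" means on the reading side, here is a CPU-only sketch that mimics loading a vertex from a byte-addressed buffer at offset index * stride. ToyVertex is a hypothetical layout that I'm assuming for the example (the real TriangleVertex lives in Shared.h), and LoadRaw is a made-up helper, not an HLSL or D3D12Lite function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout for illustration; the real TriangleVertex is defined in Shared.h.
struct ToyVertex { float position[2]; float color[3]; };

// Mimics a raw buffer load: read sizeof(T) bytes starting at a byte offset.
template <typename T>
T LoadRaw(const std::vector<std::uint8_t>& bytes, std::size_t byteOffset)
{
    assert(byteOffset + sizeof(T) <= bytes.size());
    T value;
    std::memcpy(&value, bytes.data() + byteOffset, sizeof(T));
    return value;
}

// Copies three vertices into a byte buffer, then reads the last one back by index.
inline float RunRawLoadDemo()
{
    const ToyVertex vertices[3] = {
        { { -0.5f, -0.5f }, { 1.0f, 0.0f, 0.0f } },
        { {  0.0f,  0.5f }, { 0.0f, 1.0f, 0.0f } },
        { {  0.5f, -0.5f }, { 0.0f, 0.0f, 1.0f } },
    };

    std::vector<std::uint8_t> buffer(sizeof(vertices));
    std::memcpy(buffer.data(), vertices, sizeof(vertices)); // plays the role of SetMappedData

    const std::size_t vertexId = 2; // the value a vertex shader would get as its vertex index
    const ToyVertex v = LoadRaw<ToyVertex>(buffer, vertexId * sizeof(ToyVertex));
    return v.color[2]; // blue channel of the third vertex
}
```

This is why the buffer description carries a stride: with raw access, the reader is responsible for turning a vertex index into a byte offset.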

We need to know the index of that vertex buffer in the descriptor heap, so in the next lines of code we create a constant buffer and pass it the descriptor heap index of the vertex buffer.
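As a rough mental model of that index hand-off, the sketch below treats the descriptor heap as a plain array of views and the constant buffer as a struct that carries an index into it. ToyDescriptorHeap, ToyView, ToyConstants, and ShaderSideLookup are all hypothetical names for illustration; the real lookup happens in HLSL through ResourceDescriptorHeap[].

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins only. These are NOT D3D12 or D3D12Lite types.
struct ToyView { std::string resourceName; };

struct ToyDescriptorHeap
{
    std::vector<ToyView> views;

    // Creating a view hands back the slot it landed in.
    std::uint32_t Add(ToyView view)
    {
        views.push_back(std::move(view));
        return static_cast<std::uint32_t>(views.size() - 1);
    }
};

// What the CPU writes into the constant buffer.
struct ToyConstants { std::uint32_t vertexBufferIndex; };

// What the shader conceptually does: index the heap with a value read from constants.
inline const ToyView& ShaderSideLookup(const ToyDescriptorHeap& heap, const ToyConstants& constants)
{
    assert(constants.vertexBufferIndex < heap.views.size());
    return heap.views[constants.vertexBufferIndex];
}

inline std::string RunBindlessDemo()
{
    ToyDescriptorHeap heap;
    heap.Add({ "some other texture" });
    ToyConstants constants{ heap.Add({ "triangle vertices" }) };
    return ShaderSideLookup(heap, constants).resourceName;
}
```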

The next lines are straightforward: we create our triangle vertex and pixel shaders and store them.

Next we get into the binding structure referenced earlier. What we're doing here is defining the storage of all resources used for a descriptor table (recall that this equates to an HLSL space in D3D12Lite). In this case, we just need a constant buffer. We then "Lock" the resource space, which just means that if we try to bind something to any slot of mTrianglePerObjectSpace that wasn't previously defined (when we built the PSO), it will assert. This ensures consistency between the D3D12 side and the HLSL side without needing reflection or other metadata. D3D12Lite only allows one CBV per space at register b0 (see Triangle.hlsl) to enforce healthy usage of constant buffer bindings. Several of these spaces may be used, but since we only need this per-object space here, we assign that to the PipelineResourceLayout and keep moving.
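The locking behavior can be pictured with a toy like this one. ToyResourceSpace is hypothetical, its SetSRV method and slot count are assumptions made for the example, and it returns false in the places where the text above says the real class asserts, so that the behavior is easy to observe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct ToyBuffer {};

// Toy model of a lockable binding space. NOT the real PipelineResourceSpace.
struct ToyResourceSpace
{
    static constexpr std::size_t kMaxSrvSlots = 4; // arbitrary size for this sketch

    const ToyBuffer* cbv = nullptr;                      // single constant buffer slot
    std::array<const ToyBuffer*, kMaxSrvSlots> srvs{};   // hypothetical extra slots
    std::array<bool, kMaxSrvSlots> declared{};
    bool cbvDeclared = false;
    bool locked = false;

    // Returns false where the real wrapper is described as asserting.
    bool SetCBV(const ToyBuffer* buffer)
    {
        if (locked && !cbvDeclared) return false;
        cbv = buffer;
        cbvDeclared = true;
        return true;
    }

    bool SetSRV(std::size_t slot, const ToyBuffer* buffer)
    {
        if (slot >= kMaxSrvSlots) return false;
        if (locked && !declared[slot]) return false;
        srvs[slot] = buffer;
        declared[slot] = true;
        return true;
    }

    void Lock() { locked = true; }
};

inline bool RunLockDemo()
{
    ToyBuffer constantsA, constantsB, extra;
    ToyResourceSpace space;
    space.SetCBV(&constantsA);
    space.Lock();                                    // the layout is now frozen
    const bool rebindOk = space.SetCBV(&constantsB); // same slot, new resource: allowed
    const bool newSlotOk = space.SetSRV(0, &extra);  // slot never declared: rejected
    return rebindOk && !newSlotOk;
}
```

The design choice this models is that the set of slots is fixed once the layout is locked, while the resources sitting in those slots can still change from draw to draw.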

Finally, we create a graphics pipeline description (using the D3D12Lite default description function) and set both shaders, as well as the format of the render target. In this case, that's the backbuffer, which D3D12Lite creates render target views for in the DXGI_FORMAT_R8G8B8A8_UNORM_SRGB format. CreateGraphicsPipeline creates the pipeline and root signature in one call using the description and resource layouts we just made.

That's it for the setup! Now let's make a function to render this, which you'll see is just a few extra lines of code when compared with RenderClearColorTutorial.

void RenderTriangleTutorial()
{
    mDevice->BeginFrame();
 
    TextureResource& backBuffer = mDevice->GetCurrentBackBuffer();
 
    mGraphicsContext->Reset();
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_RENDER_TARGET);
    mGraphicsContext->FlushBarriers();
 
    mGraphicsContext->ClearRenderTarget(backBuffer, Color(0.3f, 0.3f, 0.8f));
 
    PipelineInfo pipeline;
    pipeline.mPipeline = mTrianglePSO.get();
    pipeline.mRenderTargets.push_back(&backBuffer);
 
    mGraphicsContext->SetPipeline(pipeline);
    mGraphicsContext->SetPipelineResources(PER_OBJECT_SPACE, mTrianglePerObjectSpace);
    mGraphicsContext->SetDefaultViewPortAndScissor(mDevice->GetScreenSize());
    mGraphicsContext->SetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    mGraphicsContext->Draw(3);
 
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_PRESENT);
    mGraphicsContext->FlushBarriers();
 
    mDevice->SubmitContextWork(*mGraphicsContext);
 
    mDevice->EndFrame();
    mDevice->Present();
}

We give the backbuffer the same treatment we did before, but sandwiched in between the clear and the barrier, we draw our triangle. We start by binding the pipeline state object and backbuffer target, then bind the resource space we made at the PER_OBJECT_SPACE (#define perObjectSpace space0, from Common.hlsl), then set the viewport and triangle list topology states, and lastly draw our three triangle vertices. The rest ends the same as before. Replace the call to RenderClearColorTutorial with RenderTriangleTutorial in the Render() function, build and run, and you should see the triangle drawn on top of the clear color.

You'll see asserts and D3D12 validation warnings if you exit - we haven't cleaned up any of these new resources! After the call to WaitForIdle in the destructor, we'll destroy the resources.

mDevice->DestroyPipelineStateObject(std::move(mTrianglePSO));
mDevice->DestroyShader(std::move(mTriangleVertexShader));
mDevice->DestroyShader(std::move(mTrianglePixelShader));
mDevice->DestroyBuffer(std::move(mTriangleVertexBuffer));
mDevice->DestroyBuffer(std::move(mTriangleConstantBuffer));

Something to keep in mind as you move forward - D3D12Lite internally handles frame-buffered resource destruction. In this case we're destroying resources after idling the GPU so it doesn't matter, but consider a situation where you destroy a buffer while the GPU is running. If we did this naively and destroyed the resource immediately, the GPU might still be using that resource, in which case you would run into undefined behavior and likely crash the device. The most straightforward way to solve this is frame-buffered resource destruction, which means simply waiting N frames before actually freeing the resource, where N is the number of buffered frames (in this case 2). This guarantees the GPU is no longer using the resource when we free it, and D3D12Lite does this automatically.
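Here is a minimal sketch of frame-buffered destruction, assuming a simple queue of (release frame, resource) pairs that gets drained once per frame. ToyDeferredDeleter and ToyResource are hypothetical, and D3D12Lite's internal version may well be organized differently.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>

constexpr std::uint64_t kBufferedFrames = 2;

// Decrements a counter on destruction so we can observe when it is really freed.
struct ToyResource
{
    int* liveCounter;
    ~ToyResource() { --*liveCounter; }
};

class ToyDeferredDeleter
{
public:
    // Takes ownership now, but only frees kBufferedFrames frames later.
    void Destroy(std::unique_ptr<ToyResource> resource)
    {
        mPending.emplace_back(mFrame + kBufferedFrames, std::move(resource));
    }

    // Called once per frame: frees everything whose wait has elapsed.
    void EndFrame()
    {
        ++mFrame;
        while (!mPending.empty() && mPending.front().first <= mFrame)
        {
            mPending.pop_front(); // unique_ptr destructor frees the resource here
        }
    }

private:
    std::uint64_t mFrame = 0;
    std::deque<std::pair<std::uint64_t, std::unique_ptr<ToyResource>>> mPending;
};

inline bool RunDeferredDeleteDemo()
{
    int live = 1;
    ToyDeferredDeleter deleter;
    deleter.Destroy(std::unique_ptr<ToyResource>(new ToyResource{ &live }));

    deleter.EndFrame();
    const bool aliveAfterOneFrame = (live == 1);

    deleter.EndFrame();
    const bool freedAfterTwoFrames = (live == 0);

    return aliveAfterOneFrame && freedAfterTwoFrames;
}
```

Because the frame counter only ever increases, entries are already sorted by release frame, so checking the front of the queue is enough.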

Rendering a Mesh

This section will build on what we learned in the previous section, but removes a couple of shortcuts we took for the triangle, leaving us with a more honest practical use case. We need one extra resource from the tutorial zip, the Wood.dds texture, which you can simply drop in the same folder as your main.cpp. Start by taking a look at the code and comments in Mesh.hlsl. This will look mostly the same as the triangle, but with additional bindings, sampling a texture, and applying some simple lighting. Once you've had a look, we'll add some additional member variables to Renderer:

std::unique_ptr<TextureResource> mDepthBuffer;
std::unique_ptr<TextureResource> mWoodTexture;
std::unique_ptr<BufferResource> mMeshVertexBuffer;
std::array<std::unique_ptr<BufferResource>, NUM_FRAMES_IN_FLIGHT> mMeshConstantBuffers;
std::unique_ptr<BufferResource> mMeshPassConstantBuffer;
PipelineResourceSpace mMeshPerObjectResourceSpace;
PipelineResourceSpace mMeshPerPassResourceSpace;
std::unique_ptr<Shader> mMeshVertexShader;
std::unique_ptr<Shader> mMeshPixelShader;
std::unique_ptr<PipelineStateObject> mMeshPSO;

Most of this will look familiar, but let's go over some additions. We're introducing a depth buffer, which we'll use to write and test the depth of the cube mesh we're going to draw. We also have a new texture resource for the wood texture we're going to apply to the cube. You'll notice we have frame-buffered mesh constant buffers here, which we'll address soon. And the last important difference is we're going to have separate resource spaces for the per-object and per-pass resources (in this case, a constant buffer each, as seen in Mesh.hlsl).

As before, we'll create a new initialization function that we'll call from Renderer's constructor:

void InitializeMeshResources()
{
    MeshVertex meshVertices[36] = {
        {{1.0f, -1.0f, 1.0f}, {1.0f, 1.0f}, {0.0f, -1.0f, 0.0f}},
        {{1.0f, -1.0f, -1.0f}, {1.0f, 0.0f}, {0.0f, -1.0f, 0.0f}},
        {{-1.0f, -1.0f, 1.0f}, {0.0f, 1.0f}, {0.0f, -1.0f, 0.0f}},
        {{-1.0f, -1.0f, 1.0f}, {0.0f, 1.0f}, {0.0f, -1.0f, 0.0f}},
        {{1.0f, -1.0f, -1.0f}, {1.0f, 0.0f}, {0.0f, -1.0f, 0.0f}},
        {{-1.0f, -1.0f, -1.0f}, {0.0f, 0.0f}, {0.0f, -1.0f, 0.0f}},
        {{1.0f, 1.0f, -1.0f}, {1.0f, 1.0f}, {0.0f, 1.0f, 0.0f}},
        {{1.0f, 1.0f, 1.0f}, {1.0f, 0.0f}, {0.0f, 1.0f, 0.0f}},
        {{-1.0f, 1.0f, -1.0f}, {0.0f, 1.0f}, {0.0f, 1.0f, 0.0f}},
        {{-1.0f, 1.0f, -1.0f}, {0.0f, 1.0f}, {0.0f, 1.0f, 0.0f}},
        {{1.0f, 1.0f, 1.0f}, {1.0f, 0.0f}, {0.0f, 1.0f, 0.0f}},
        {{-1.0f, 1.0f, 1.0f}, {0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}},
        {{-1.0f, -1.0f, -1.0f}, {1.0f, 1.0f}, {-1.0f, 0.0f, 0.0f}},
        {{-1.0f, 1.0f, -1.0f}, {1.0f, 0.0f}, {-1.0f, 0.0f, 0.0f}},
        {{-1.0f, -1.0f, 1.0f}, {0.0f, 1.0f}, {-1.0f, 0.0f, 0.0f}},
        {{-1.0f, -1.0f, 1.0f}, {0.0f, 1.0f}, {-1.0f, 0.0f, 0.0f}},
        {{-1.0f, 1.0f, -1.0f}, {1.0f, 0.0f}, {-1.0f, 0.0f, 0.0f}},
        {{-1.0f, 1.0f, 1.0f}, {0.0f, 0.0f}, {-1.0f, 0.0f, 0.0f}},
        {{-1.0f, -1.0f, 1.0f}, {1.0f, 1.0f}, {0.0f, 0.0f, 1.0f}},
        {{-1.0f, 1.0f, 1.0f}, {1.0f, 0.0f}, {0.0f, 0.0f, 1.0f}},
        {{1.0f, -1.0f, 1.0f}, {0.0f, 1.0f}, {0.0f, 0.0f, 1.0f}},
        {{1.0f, -1.0f, 1.0f}, {0.0f, 1.0f}, {0.0f, 0.0f, 1.0f}},
        {{-1.0f, 1.0f, 1.0f}, {1.0f, 0.0f}, {0.0f, 0.0f, 1.0f}},
        {{1.0f, 1.0f, 1.0f}, {0.0f, 0.0f}, {0.0f, 0.0f, 1.0f}},
        {{1.0f, -1.0f, 1.0f}, {1.0f, 1.0f}, {1.0f, 0.0f, 0.0f}},
        {{1.0f, 1.0f, 1.0f}, {1.0f, 0.0f}, {1.0f, 0.0f, 0.0f}},
        {{1.0f, -1.0f, -1.0f}, {0.0f, 1.0f}, {1.0f, 0.0f, 0.0f}},
        {{1.0f, -1.0f, -1.0f}, {0.0f, 1.0f}, {1.0f, 0.0f, 0.0f}},
        {{1.0f, 1.0f, 1.0f}, {1.0f, 0.0f}, {1.0f, 0.0f, 0.0f}},
        {{1.0f, 1.0f, -1.0f}, {0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}},
        {{1.0f, -1.0f, -1.0f}, {1.0f, 1.0f}, {0.0f, 0.0f, -1.0f}},
        {{1.0f, 1.0f, -1.0f}, {1.0f, 0.0f}, {0.0f, 0.0f, -1.0f}},
        {{-1.0f, -1.0f, -1.0f}, {0.0f, 1.0f}, {0.0f, 0.0f, -1.0f}},
        {{-1.0f, -1.0f, -1.0f}, {0.0f, 1.0f}, {0.0f, 0.0f, -1.0f}},
        {{1.0f, 1.0f, -1.0f}, {1.0f, 0.0f}, {0.0f, 0.0f, -1.0f}},
        {{-1.0f, 1.0f, -1.0f}, {0.0f, 0.0f}, {0.0f, 0.0f, -1.0f}},
    };
 
    BufferCreationDesc meshVertexBufferDesc{};
    meshVertexBufferDesc.mSize = sizeof(meshVertices);
    meshVertexBufferDesc.mAccessFlags = BufferAccessFlags::gpuOnly;
    meshVertexBufferDesc.mViewFlags = BufferViewFlags::srv;
    meshVertexBufferDesc.mStride = sizeof(MeshVertex);
    meshVertexBufferDesc.mIsRawAccess = true;
 
    mMeshVertexBuffer = mDevice->CreateBuffer(meshVertexBufferDesc);
 
    auto bufferUpload = std::make_unique<BufferUpload>();
    bufferUpload->mBuffer = mMeshVertexBuffer.get();
    bufferUpload->mBufferData = std::make_unique<uint8_t[]>(sizeof(meshVertices));
    bufferUpload->mBufferDataSize = sizeof(meshVertices);
 
    memcpy_s(bufferUpload->mBufferData.get(), sizeof(meshVertices), meshVertices, sizeof(meshVertices));
 
    mDevice->GetUploadContextForCurrentFrame().AddBufferUpload(std::move(bufferUpload));
 
    mWoodTexture = mDevice->CreateTextureFromFile("Wood.dds");
 
    MeshConstants meshConstants;
    meshConstants.vertexBufferIndex = mMeshVertexBuffer->mDescriptorHeapIndex;
    meshConstants.textureIndex = mWoodTexture->mDescriptorHeapIndex;
 
    BufferCreationDesc meshConstantDesc{};
    meshConstantDesc.mSize = sizeof(MeshConstants);
    meshConstantDesc.mAccessFlags = BufferAccessFlags::hostWritable;
    meshConstantDesc.mViewFlags = BufferViewFlags::cbv;
 
    for (uint32_t frameIndex = 0; frameIndex < NUM_FRAMES_IN_FLIGHT; frameIndex++)
    {
        mMeshConstantBuffers[frameIndex] = mDevice->CreateBuffer(meshConstantDesc);
        mMeshConstantBuffers[frameIndex]->SetMappedData(&meshConstants, sizeof(MeshConstants));
    }
 
    BufferCreationDesc meshPassConstantDesc{};
    meshPassConstantDesc.mSize = sizeof(MeshPassConstants);
    meshPassConstantDesc.mAccessFlags = BufferAccessFlags::hostWritable;
    meshPassConstantDesc.mViewFlags = BufferViewFlags::cbv;
 
    Uint2 screenSize = mDevice->GetScreenSize();
 
    float fieldOfView = 3.14159f / 4.0f;
    float aspectRatio = (float)screenSize.x / (float)screenSize.y;
    Vector3 cameraPosition = Vector3(-3.0f, 3.0f, -8.0f);
 
    MeshPassConstants passConstants;
    passConstants.viewMatrix = Matrix::CreateLookAt(cameraPosition, Vector3(0, 0, 0), Vector3(0, 1, 0));
    passConstants.projectionMatrix = Matrix::CreatePerspectiveFieldOfView(fieldOfView, aspectRatio, 0.001f, 1000.0f);
    passConstants.cameraPosition = cameraPosition;
 
    mMeshPassConstantBuffer = mDevice->CreateBuffer(meshPassConstantDesc);
    mMeshPassConstantBuffer->SetMappedData(&passConstants, sizeof(MeshPassConstants));
 
    TextureCreationDesc depthBufferDesc;
    depthBufferDesc.mResourceDesc.Format = DXGI_FORMAT_D32_FLOAT;
    depthBufferDesc.mResourceDesc.Width = screenSize.x;
    depthBufferDesc.mResourceDesc.Height = screenSize.y;
    depthBufferDesc.mViewFlags = TextureViewFlags::srv | TextureViewFlags::dsv;
 
    mDepthBuffer = mDevice->CreateTexture(depthBufferDesc);
 
    ShaderCreationDesc meshShaderVSDesc;
    meshShaderVSDesc.mShaderName = L"Mesh.hlsl";
    meshShaderVSDesc.mEntryPoint = L"VertexShader";
    meshShaderVSDesc.mType = ShaderType::vertex;
 
    ShaderCreationDesc meshShaderPSDesc;
    meshShaderPSDesc.mShaderName = L"Mesh.hlsl";
    meshShaderPSDesc.mEntryPoint = L"PixelShader";
    meshShaderPSDesc.mType = ShaderType::pixel;
 
    mMeshVertexShader = mDevice->CreateShader(meshShaderVSDesc);
    mMeshPixelShader = mDevice->CreateShader(meshShaderPSDesc);
 
    GraphicsPipelineDesc meshPipelineDesc = GetDefaultGraphicsPipelineDesc();
    meshPipelineDesc.mVertexShader = mMeshVertexShader.get();
    meshPipelineDesc.mPixelShader = mMeshPixelShader.get();
    meshPipelineDesc.mRenderTargetDesc.mNumRenderTargets = 1;
    meshPipelineDesc.mRenderTargetDesc.mRenderTargetFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
    meshPipelineDesc.mDepthStencilDesc.DepthEnable = true;
    meshPipelineDesc.mRenderTargetDesc.mDepthStencilFormat = depthBufferDesc.mResourceDesc.Format;
    meshPipelineDesc.mDepthStencilDesc.DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ALL;
 
    mMeshPerObjectResourceSpace.SetCBV(mMeshConstantBuffers[0].get());
    mMeshPerObjectResourceSpace.Lock();
 
    mMeshPerPassResourceSpace.SetCBV(mMeshPassConstantBuffer.get());
    mMeshPerPassResourceSpace.Lock();
 
    PipelineResourceLayout meshResourceLayout;
    meshResourceLayout.mSpaces[PER_OBJECT_SPACE] = &mMeshPerObjectResourceSpace;
    meshResourceLayout.mSpaces[PER_PASS_SPACE] = &mMeshPerPassResourceSpace;
 
    mMeshPSO = mDevice->CreateGraphicsPipeline(meshPipelineDesc, meshResourceLayout);
}

Again, much of this function is very similar to the triangle example with a few important exceptions, so let's cover those.

First is meshVertexBufferDesc. Unlike the triangle vertex buffer, we create this buffer in VRAM, where it is accessible only to the GPU (BufferAccessFlags::gpuOnly). To get the vertices into this GPU-only buffer, we need to copy them from a host-writable buffer. The best way to do this is on the copy queue, via D3D12Lite's UploadContext. These contexts are unlike the others in that D3D12Lite creates them internally and manages them itself. Work on the copy queue is asynchronous to work on the graphics queue (which in turn is asynchronous to work on the async compute queue), and we'll see how we resolve this in the rendering function later on. To do this copy queue work, we create a BufferUpload, to which we assign the target buffer, the intermediate vertex data, and the size of the data. We then push the buffer upload to the upload context for the current frame (as these are frame-buffered), and that's it for now. D3D12Lite will handle scheduling the copies, issuing them, and waiting on the work; all we need to do is make sure this work is complete before using the resources (more on this later).

Next we create the wood texture from the file we added. There is a lot more boilerplate involved in uploading a texture, so CreateTextureFromFile exists as a utility for reading the texture file, creating a TextureUpload, and submitting the work to the UploadContext.

The next important difference comes with mMeshConstantBuffers, which as noted before are frame-buffered. Since we intend to make the cube mesh rotate over time by updating a matrix in the constant buffer, we need one constant buffer for each frame we may have in flight. If we don't do this, we'll be overwriting the constants used for rendering the cube while they're potentially being read by the GPU, leading to undefined behavior (likely manifesting as stuttering rotation).
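
To make the idea concrete, here is a minimal, CPU-only sketch of frame buffering. Constants, FrameBufferedConstants, and kFramesInFlight are hypothetical stand-ins for illustration, not D3D12Lite types; a real implementation would write into mapped GPU memory rather than a plain array.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for illustration only; not part of D3D12Lite.
constexpr uint32_t kFramesInFlight = 2;

struct Constants
{
    float rotation;
};

// One CPU-writable slot per frame in flight. While the CPU fills the slot for
// the current frame, the GPU may still be reading the slot written for the
// previous frame, so the two never touch the same memory at the same time.
class FrameBufferedConstants
{
public:
    // Copies new data into the slot owned by the given frame.
    void Write(uint32_t frameId, const Constants& data)
    {
        assert(frameId < kFramesInFlight);
        std::memcpy(&mSlots[frameId], &data, sizeof(Constants));
    }

    // Returns the data last written for the given frame.
    const Constants& Read(uint32_t frameId) const
    {
        assert(frameId < kFramesInFlight);
        return mSlots[frameId];
    }

private:
    std::array<Constants, kFramesInFlight> mSlots{};
};
```

Writing frame 1's constants leaves frame 0's slot untouched, which is the property that keeps an in-flight draw from seeing half-updated data.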

We skip this frame buffering for mMeshPassConstantBuffer, as we only set the camera position, view matrix, and projection matrix once. If you had a dynamic camera, then you would need to frame buffer this like mMeshConstantBuffers.

Next we create the depth buffer at the same size as the back buffer, using the texture creation description and creation function. The rest is fairly self-explanatory as it matches the triangle setup, with the exception that we are enabling depth testing and depth writing in the pipeline state object. With that done, we'll replace the previous function call in Render() with a call to a new rendering function:

void RenderMeshTutorial()
{
    mDevice->BeginFrame();
 
    TextureResource& backBuffer = mDevice->GetCurrentBackBuffer();
 
    mGraphicsContext->Reset();
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_RENDER_TARGET);
    mGraphicsContext->AddBarrier(*mDepthBuffer, D3D12_RESOURCE_STATE_DEPTH_WRITE);
    mGraphicsContext->FlushBarriers();
 
    mGraphicsContext->ClearRenderTarget(backBuffer, Color(0.3f, 0.3f, 0.8f));
    mGraphicsContext->ClearDepthStencilTarget(*mDepthBuffer, 1.0f, 0);
 
    static float rotation = 0.0f;
    rotation += 0.0001f;
 
    if (mMeshVertexBuffer->mIsReady && mWoodTexture->mIsReady)
    {
        MeshConstants meshConstants;
        meshConstants.vertexBufferIndex = mMeshVertexBuffer->mDescriptorHeapIndex;
        meshConstants.textureIndex = mWoodTexture->mDescriptorHeapIndex;
        meshConstants.worldMatrix = Matrix::CreateRotationY(rotation);
 
        mMeshConstantBuffers[mDevice->GetFrameId()]->SetMappedData(&meshConstants, sizeof(MeshConstants));
 
        mMeshPerObjectResourceSpace.SetCBV(mMeshConstantBuffers[mDevice->GetFrameId()].get());
 
        PipelineInfo pipeline;
        pipeline.mPipeline = mMeshPSO.get();
        pipeline.mRenderTargets.push_back(&backBuffer);
        pipeline.mDepthStencilTarget = mDepthBuffer.get();
 
        mGraphicsContext->SetPipeline(pipeline);
        mGraphicsContext->SetPipelineResources(PER_OBJECT_SPACE, mMeshPerObjectResourceSpace);
        mGraphicsContext->SetPipelineResources(PER_PASS_SPACE, mMeshPerPassResourceSpace);
        mGraphicsContext->SetDefaultViewPortAndScissor(mDevice->GetScreenSize());
        mGraphicsContext->SetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
        mGraphicsContext->Draw(36);
    }
 
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_PRESENT);
    mGraphicsContext->FlushBarriers();
 
    mDevice->SubmitContextWork(*mGraphicsContext);
 
    mDevice->EndFrame();
    mDevice->Present();
}

In addition to our back buffer barrier and clear, we now also barrier and clear the depth buffer. Note the grouping of the barriers before flushing, as this is better for performance. We keep a static variable for the rotation and increment it each frame. This is just for demonstration purposes; you would want to base this on the frame time so that the cube doesn't spin faster or slower depending on your framerate.
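
As a rough sketch of what basing the rotation on frame time could look like, the snippet below scales an angular speed by the elapsed seconds. AdvanceRotation and FrameTimer are hypothetical helpers written for this example; they are not part of D3D12Lite.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical helper: advances an angle (in radians) by an angular speed
// scaled by the elapsed time, and wraps the result so it stays below 2*pi.
inline float AdvanceRotation(float rotation, float deltaSeconds, float radiansPerSecond)
{
    assert(deltaSeconds >= 0.0f);
    const float twoPi = 6.28318530718f;
    rotation += radiansPerSecond * deltaSeconds;
    return std::fmod(rotation, twoPi);
}

// Hypothetical helper: reports the seconds elapsed since the previous Tick()
// (or since construction on the first call), using a monotonic clock.
class FrameTimer
{
public:
    float Tick()
    {
        const auto now = std::chrono::steady_clock::now();
        const std::chrono::duration<float> delta = now - mLast;
        mLast = now;
        return delta.count();
    }

private:
    std::chrono::steady_clock::time_point mLast = std::chrono::steady_clock::now();
};
```

In the render function, the static increment could then become something like rotation = AdvanceRotation(rotation, timer.Tick(), 1.0f), with the timer kept as a member so it persists across frames.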

Next we reach the end of our resource upload pipeline, which is checking mIsReady on our vertex buffer and texture. D3D12Lite will (inside of Device::BeginFrame) set mIsReady to true for any resource that has completed an upload and is now safe to use. UploadContexts contain a fixed amount of upload memory that they consume each frame, so they may not complete all pending uploads on a given frame, and their work is not guaranteed to finish executing before a draw on the same frame. If we ignored mIsReady, we might end up using a resource without any uploaded data in it, leading to undefined behavior. Here, we simply skip the cube drawing if these resources aren't yet uploaded.
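
The following CPU-only sketch illustrates why a resource may need more than one frame before it is ready. It is a simplified model under stated assumptions, not D3D12Lite's actual implementation: Resource, PendingUpload, and BudgetedUploader are hypothetical names, and the model assumes a copy submitted on one frame has completed by the next BeginFrame, where real code would check a fence.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-ins for illustration only; not D3D12Lite types.
struct Resource
{
    bool mIsReady = false;
};

struct PendingUpload
{
    Resource* target;
    std::size_t sizeInBytes;
};

// Models an uploader with a fixed per-frame byte budget.
class BudgetedUploader
{
public:
    explicit BudgetedUploader(std::size_t bytesPerFrame) : mBytesPerFrame(bytesPerFrame) {}

    void Push(Resource* target, std::size_t sizeInBytes)
    {
        // Assumption: a single upload never exceeds the per-frame budget.
        assert(sizeInBytes <= mBytesPerFrame);
        mPending.push_back({target, sizeInBytes});
    }

    // Called once per frame. First marks last frame's copies as ready, then
    // submits as many pending uploads as fit in this frame's budget.
    void BeginFrame()
    {
        for (Resource* resource : mInFlight)
        {
            resource->mIsReady = true;
        }
        mInFlight.clear();

        std::size_t remaining = mBytesPerFrame;
        while (!mPending.empty() && mPending.front().sizeInBytes <= remaining)
        {
            remaining -= mPending.front().sizeInBytes;
            mInFlight.push_back(mPending.front().target);
            mPending.pop_front();
        }
    }

private:
    std::size_t mBytesPerFrame;
    std::deque<PendingUpload> mPending;
    std::vector<Resource*> mInFlight;
};
```

With a 100-byte budget and two 80-byte uploads, the first resource becomes ready on the second BeginFrame and the second on the third, which is why the draw is guarded by a readiness check rather than assuming the data is already there.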

The last new addition is that we need to apply the constants to our frame-buffered mesh constant buffers. We get the current frame ID from the device to set the data for this frame's constant buffer, and apply the latest rotation to its world matrix. We then assign this current constant buffer to the resource space we're about to bind. This combination will safely ping-pong our constant buffers each frame, ensuring we never write to data that the GPU is actively using.

Finally we bind our resources, including our new depth buffer, and draw the cube! If you run the application, you should see a lit spinning cube. As always, don't forget to free these new resources at the end!

If you've made it this far, congratulations! You now have the tools to do all sorts of graphics work. Be sure to dig into D3D12Lite to see how things are working behind the scenes, as this will help you achieve a firm grasp of the D3D12 API.

Adding ImGui

No personal renderer is complete without Dear ImGui! D3D12Lite is set up to make the integration with the stock Windows/D3D12 ImGui files easy. Start by #including "imgui/imgui.h", "imgui/imgui_impl_win32.h", and "imgui/imgui_impl_dx12.h", and then let's look at the setup function, which we'll call from Renderer's constructor:

void InitializeImGui(HWND windowHandle)
{
    IMGUI_CHECKVERSION();
    ImGui::CreateContext();
    ImGuiIO& io = ImGui::GetIO();
    ImGui::StyleColorsDark();
 
    Descriptor descriptor = mDevice->GetImguiDescriptor(0);
    Descriptor descriptor2 = mDevice->GetImguiDescriptor(1);
 
    ImGui_ImplWin32_Init(windowHandle);
    ImGui_ImplDX12_Init(mDevice->GetDevice(), NUM_FRAMES_IN_FLIGHT,
        DXGI_FORMAT_R8G8B8A8_UNORM_SRGB, nullptr,
        descriptor.mCPUHandle, descriptor.mGPUHandle, descriptor2.mCPUHandle, descriptor2.mGPUHandle);
}

Dear ImGui's stock implementation requires a descriptor from each frame's descriptor heap for the font texture, and D3D12Lite has these available for you via GetImguiDescriptor. Next, above the WndProc, add this:

extern IMGUI_IMPL_API LRESULT ImGui_ImplWin32_WndProcHandler(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

And right at the top of the WndProc function, add this, which will allow ImGui to process input:

if (ImGui_ImplWin32_WndProcHandler(hwnd, umessage, wparam, lparam))
{
    return true;
}

That's all we need for the setup, so let's replace the function in Render() with a call to this new function:

void RenderImGui()
{
    mDevice->BeginFrame();
 
    ImGui_ImplDX12_NewFrame();
    ImGui_ImplWin32_NewFrame();
    ImGui::NewFrame();
 
    ImGui::ShowDemoWindow();
    ImGui::Render();
 
    TextureResource& backBuffer = mDevice->GetCurrentBackBuffer();
 
    mGraphicsContext->Reset();
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_RENDER_TARGET);
    mGraphicsContext->FlushBarriers();
 
    mGraphicsContext->ClearRenderTarget(backBuffer, Color(0.3f, 0.3f, 0.8f));
 
    PipelineInfo pipeline;
    pipeline.mPipeline = nullptr;
    pipeline.mRenderTargets.push_back(&backBuffer);
 
    mGraphicsContext->SetPipeline(pipeline);
    ImGui_ImplDX12_RenderDrawData(ImGui::GetDrawData(), mGraphicsContext->GetCommandList());
 
    mGraphicsContext->AddBarrier(backBuffer, D3D12_RESOURCE_STATE_PRESENT);
    mGraphicsContext->FlushBarriers();
 
    mDevice->SubmitContextWork(*mGraphicsContext);
 
    mDevice->EndFrame();
    mDevice->Present();
}

Once again, most things are exactly the same as before. We add a few NewFrame functions towards the top to let ImGui do its bookkeeping, and then we add our ImGui code for the frame, in this case ShowDemoWindow, which showcases just how powerful and versatile ImGui can be. We then call ImGui::Render to have it build draw commands for the ImGui content we're going to draw. When it comes time to bind the pipeline, we bind a null PSO, as ImGui is going to do this internally on its own. We call ImGui_ImplDX12_RenderDrawData, and that's it!

When you run, you'll notice the UI does not appear as dark as demoed on their GitHub. This is because we're rendering to an sRGB back buffer, which stock ImGui does not handle. In the destructor, make sure to call ImGui_ImplDX12_Shutdown() and ImGui_ImplWin32_Shutdown() so that ImGui can clean up its internal resources.

Take it for a spin! Dear ImGui will no doubt be one of the best tools in your toolbox for as long as you do graphics work. If you have the ability to support them financially in some way, I highly encourage doing so!
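
On the sRGB point: one workaround people sometimes use is to convert the UI colors from sRGB to linear before handing them to ImGui, so that the hardware's linear-to-sRGB encode on write lands back on the intended values. The sketch below shows only the conversion math, using the standard sRGB transfer function. Color4 is a hypothetical stand-in for a four-float color such as ImVec4, and the helper names are mine rather than ImGui's.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a four-component float color (e.g. ImVec4).
struct Color4
{
    float r, g, b, a;
};

// Standard sRGB decode for a single channel in the [0, 1] range.
inline float SrgbToLinear(float channel)
{
    assert(channel >= 0.0f && channel <= 1.0f);
    return (channel <= 0.04045f)
        ? channel / 12.92f
        : std::pow((channel + 0.055f) / 1.055f, 2.4f);
}

// Converts the color channels and leaves alpha untouched, since alpha is
// coverage rather than a color value.
inline Color4 SrgbToLinearColor(const Color4& color)
{
    return { SrgbToLinear(color.r), SrgbToLinear(color.g), SrgbToLinear(color.b), color.a };
}
```

One place this could be applied is to each entry of ImGui::GetStyle().Colors after calling ImGui::StyleColorsDark(). Treat that as a partial fix: colors passed directly to widgets or draw lists would need the same treatment, and alpha blending will still differ slightly from a non-sRGB target.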

Wrapping Up

D3D12Lite is not a complete implementation of D3D12 features, and there may be bugs in the less tested paths. If you find it useful, by all means run with it and expand it to fit your usage. In any case, setting breakpoints and stepping through a frame should prove helpful in learning how you can use D3D12. The wrapper itself is only around 2000 lines of code, and so is hopefully easy to digest! Best of luck in your journey forward as a graphics programmer.

Special thank you to François Guthmann who volunteered to take this for a test drive ahead of time.

Lastly, a sincere thank you to the developers of D3D12 and the associated libraries I have referenced here. I have genuinely enjoyed my time working with D3D12, both personally and professionally.

Alex Tardif
Graphics & Game Programmer