Microsoft today officially released a DirectX 12 feature that has been available in preview for some time. It introduces new forms of GPU autonomy intended to eliminate CPU bottlenecks.
In a lengthy blog post, Amar Patel, Engineer (Direct3D), and Tex Riddell, Engineer (DirectX Compiler), explain Work Graphs, a system for GPU autonomy in D3D12 that aims to address limitations in general compute workloads on GPUs and unlock latent GPU capabilities. In simpler terms, the new system aims to move toward more efficient GPU-driven rendering, reducing how often the CPU has to be involved across different workloads.
In many GPU workloads, an initial calculation on the GPU determines what subsequent work the GPU needs to do. This can be accomplished with a round trip back to the CPU to issue the new work. But it is typically better for the GPU to be able to feed itself directly. ExecuteIndirect in D3D12 is a form of this, where the app uses the GPU to record a very constrained command buffer that needs to be serially processed on the GPU to issue new work.
Consider a new option. Suppose shader threads running on the GPU (producers) can request other work to run (consumers). Consumers can be producers as well. The system can schedule the requested work as soon as the GPU has capacity to run it. The app can also let the system manage memory for the data flowing between tasks.
This is Work Graphs. A graph of nodes where shader code at each node can request invocations of other nodes, without waiting for them to launch. Work graphs capture the user’s algorithmic intent and overall structure, without burdening the developer to know too much about the specific hardware it will run on. The asynchronous nature maximizes the freedom for the system to decide how best to execute the work.
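To make the producer/consumer idea concrete, here is a minimal CPU-side sketch in Python. It is only a toy model of the concept described above, not the D3D12 or HLSL Work Graphs API: the scheduler function `run_work_graph`, the node names (`classify`, `shade_small`, `shade_large`) and the record format are all hypothetical. Each node is a function that receives one record and may request invocations of other nodes without waiting for them, and a simple queue stands in for the system that schedules requested work when capacity is available.

```python
from collections import deque


def run_work_graph(nodes, entry, records):
    """Drain a toy work graph and return the leaf outputs.

    nodes   -- dict mapping node name to a generator function
    entry   -- name of the node that receives the initial records
    records -- initial input records
    """
    # The queue plays the role of system-managed memory for data
    # flowing between nodes (an assumption of this toy model).
    pending = deque((entry, record) for record in records)
    outputs = []
    while pending:
        name, record = pending.popleft()
        # A node yields (target, payload) requests and never waits on them.
        for target, payload in nodes[name](record):
            if target is None:
                outputs.append(payload)            # final result, no consumer
            else:
                pending.append((target, payload))  # request more work
    return outputs


def classify(n):
    # Producer: an initial calculation decides which node runs next.
    yield ("shade_large" if n >= 10 else "shade_small", n)


def shade_small(n):
    # Pure consumer: emits a final result.
    yield (None, f"small:{n}")


def shade_large(n):
    # Consumer that is also a producer: splits the item and feeds it back.
    half = n // 2
    yield ("classify", half)
    yield ("classify", n - half)


NODES = {
    "classify": classify,
    "shade_small": shade_small,
    "shade_large": shade_large,
}

results = run_work_graph(NODES, "classify", [3, 12, 25])
print(results)
```

The point of the sketch is that no outside caller decides what runs next: every follow-up task is requested by the work that came before it, which is the role the CPU round trip would otherwise play. A real GPU scheduler would run many such invocations in parallel and in whatever order it chooses, which this single-threaded queue does not attempt to model.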
The full details on the DirectX 12 Work Graphs feature can be found here.