Wave intrinsics, SM6+, and DXC in Unity

I recently wanted to mess around with wave intrinsics in a Unity context and found the documentation to be pretty sparse, so I wanted to collect my findings in a nice, central location. The info in this article mainly comes from this semi-internal Unity documentation and my own experimentation.

What are wave intrinsics?

GPUs always run your shaders in groups of threads called waves. (Sometimes referred to as wavefronts in an AMD context, warps in an NVIDIA context, thread subgroups in a Vulkan/OpenGL context, or SIMD-groups in a Metal context. Good teamwork, everyone!)

Typical modern AMD hardware uses groups of 64, while NVIDIA uses groups of 32. Not all threads are active all the time: for example, if you draw a 4-vertex mesh you can expect 4 threads in a wave to be active running your vertex shader, while the other 60 or 28 threads are inactive and essentially do nothing.

(This grouping is why you typically want the numthreads for your compute shaders to be a multiple of 64.)
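
In practice that just means something like this for your kernels:

#pragma kernel Main

// 64 threads per group = one full wave on typical AMD hardware, or two full warps on NVIDIA
[numthreads(64, 1, 1)]
void Main(uint3 id : SV_DispatchThreadID)
{
    // ... your kernel code ...
}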

Wave intrinsics provide a mechanism for very basic communication between multiple threads within a wave. They’re most useful in compute shaders, but you can use them in any shader stage you like.

You can also use them to read values from pixels in the same pixel quad without using partial derivatives (IE: ddx/ddy/etc.)
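
For a rough flavor, here's a tiny sketch (the helper function is made up, and I'm glossing over the fact that the sign of the delta depends on which column of the quad the pixel is in):

// Needs #pragma require QuadShuffle, see the table further down
float value = SomeExpensivePerPixelFunction(uv); // hypothetical value computed by this pixel
float neighbor = QuadReadAcrossX(value);         // the same value as computed by the horizontally adjacent pixel in the 2x2 quad
float approximateDdx = neighbor - value;         // roughly what ddx(value) would have given you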

Utilizing wave intrinsics is beyond the scope of the article. Here’s some further reading if you want to learn more:

Historically Unity has used FXC to compile shaders. FXC has not received any meaningful updates since 2015, and was superseded by DXC in 2017.

DXC is required to use features from Shader Model 6 and later.

As an added bonus, DXC is also significantly (3-7x) faster than FXC. In my opinion it's worth learning about using DXC in Unity purely to improve your shader iteration workflow.

Thankfully Unity has (experimental) support for using DXC instead of FXC!

Limitations of using DXC with Unity

My personal opinion

Using DXC with Unity should be fine if you’re only targeting Windows and Direct3D 12.

DXC is also probably fine for use with Vulkan/Metal as long as you’re properly testing those platforms. I’d be less inclined to assume it’ll just work though. I would definitely not use DXC for hull/domain shaders on Vulkan/Metal unless you find a big perf win to justify the risk.

(Since using DXC is opt-in on a per-shader basis you can continue using FXC for any shaders where you don’t need the new features.)

I would hesitate to use DXC when you intend to continue supporting Direct3D 11. Unity needs to resolve some rough edges for projects targeting both 11 and 12 at the same time. (Evaluate whether you really need D3D11 though. D3D12 has been supported by consumer GPUs since ~2012.)

Even if you can’t swallow the risk of using DXC in production, you should definitely consider using it during development to improve your shader iteration experience.

Using DXC with Unity

Using DXC is a matter of ensuring you’re using a compatible graphics API and then enabling it for shaders where you want to use it.

Changing your graphics API

The first thing we have to do is run the editor or our game using a modern graphics API. You can do this on a temporary basis or by changing the project-wide default.

Temporarily for testing

To temporarily change the graphics API for testing or shader iteration purposes, we can force an API using the Unity Editor command line arguments.

In Unity Hub go to the Projects tab and click the ... next to the project you’re working on and select Add command line arguments.

Enter the appropriate argument for your platform:
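
(From memory, these are -force-d3d12 for Direct3D 12 on Windows, -force-vulkan for Vulkan on Windows or Linux, and -force-metal for Metal on macOS. Double-check the Unity Editor command line arguments documentation for your version if one of these doesn't take.)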

Unity Hub command line window showing the configuration for Windows.

Save your changes and open your project as you would normally.

You should see <DX12>, <Vulkan>, or <Metal> in the title bar of the editor as seen below:

Screenshot of Unity editor titlebar stating "UnityWaveIntrinsicsTest - SampleScene - Windows, Mac, Linux - Unity 2022.1.16f1 <DX12>"

The same command line arguments work with the player provided that you didn’t disable D3D12/Vulkan/Metal in the player settings.

Permanently for builds and editing

If you’re planning on using DXC in production you’ll want to configure your project to restrict your graphics API selection to ones compatible with DXC.

You might also need to do this if you want to test player builds with these modern APIs and they were previously disabled.

To do this go to Edit > Project Settings.... Go to the Player category and find the Other Settings section (should be last). Inside that, find Rendering (should be first), uncheck Auto Graphics API for Windows/Mac/Linux, and configure them to use modern graphics APIs as shown below:

Player project settings with the above applied showing Windows configured to use "Direct3D12 (Experimental)" with "Vulkan" as a fallback. Mac is configured explicitly to use "Metal" alone. Linux is configured with "Vulkan" alone.

In newer versions of Unity it is possible to avoid having Vulkan listed for Windows, but older versions force you to include it since Direct3D 12 support used to be experimental (as in the screenshot above.)

What happens if someone forces an API not in these lists?

If a customer attempts to force an API that isn't in those lists (which I see recommended as a troubleshooting step pretty frequently), they'll get an error stating InitializeEngineGraphics failed upon launch:

Error message stating "Failed to initialize player" with details stating "InitializeEngineGraphics failed"

This is a good thing since things break in mysterious ways if older graphics APIs try to load DXC shaders. You don't want those older APIs being used unless you go out of your way to continue supporting them.

Enabling DXC in HLSL

To use wave intrinsics and other shader model 6 features, you need to explicitly request support for the feature group you’re interested in using via #pragma require:

Feature Group                Functionality
WaveBasic                    WaveIsFirstLane, WaveGetLaneCount, WaveGetLaneIndex
WaveVote                     WaveActiveAnyTrue, WaveActiveAllTrue, WaveActiveAllEqual
WaveBallot                   WaveActiveBallot, WaveReadLaneFirst, WaveReadLaneAt, WaveActiveCountBits, WavePrefixCountBits
WaveMath                     WaveActiveSum, WaveActiveProduct, WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveMin, WaveActiveMax, WavePrefixSum, WavePrefixProduct
WaveMultiPrefix (2)          WaveMatch, WaveMultiPrefixSum, WaveMultiPrefixProduct, WaveMultiPrefixCountBits, WaveMultiPrefixAnd, WaveMultiPrefixOr, WaveMultiPrefixXor
QuadShuffle                  QuadReadAcrossX, QuadReadAcrossY, QuadReadAcrossDiagonal, QuadReadLaneAt
Int64                        int64_tN, uint64_tN
Int64BufferAtomics (1)       InterlockedXYZ on int64 in RWByteAddressBuffer
Int64GroupsharedAtomics (1)  InterlockedXYZ on int64 in groupshared
Native16Bit (1)              float16_tN, int16_tN, uint16_tN
Barycentrics (1)             SV_Barycentrics, GetAttributeAtVertex (2)

(1) Note that these features do not have widespread desktop GPU support yet. None of them work on my GTX 1080 (Pascal - 2016), which is old but not that old.
If you use any of these feature groups make sure you provide a fallback shader or subshader!

(2) Direct3D only. Not supported on Vulkan or Metal!

(N) denotes a vector type. IE: int16_tN can be int16_t, int16_t1, int16_t2, int16_t3, or int16_t4.

What are those non-wave/quad features?

While wave intrinsics are the star of the show here, there are several other features that have been added in Shader Model 6.0 and later.

You can read about them in the DXC wiki.

The features currently supported by Unity are SV_Barycentrics, native 16-bit types (read this article and part 2 before using), and 64-bit integers.

Note that other SM6+ features not listed here might be usable on Direct3D 12, but they probably won’t work on Vulkan or Metal.

Requiring any of these features will cause Unity to automatically use DXC to build your shaders and target them at the appropriate Vulkan or Metal feature level.

You might notice that functionality from one category sometimes starts working as soon as you require another. This happens because some graphics APIs combine categories under the hood (the above categories are specifically based on Vulkan extensions.) You should generally not rely on this and should always #pragma require what you use.


Once you find the category of the functionality you want to use, just place the relevant #pragma require inside your HLSLPROGRAM block. You can also place it in an HLSLINCLUDE block to apply it to all passes.

For compute shaders just put it somewhere in the global scope.
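
For example, a minimal compute shader sketch (the kernel, buffer, and indexing scheme are my own) that counts the active lanes in each wave might look like this:

#pragma kernel Main
#pragma require WaveBasic // WaveIsFirstLane, WaveGetLaneCount
#pragma require WaveMath // WaveActiveSum

RWStructuredBuffer<uint> Output;

[numthreads(64, 1, 1)]
void Main(uint3 id : SV_DispatchThreadID)
{
    // Every active lane contributes 1, so the sum is the number of active lanes in this wave.
    uint activeLanes = WaveActiveSum(1u);

    // Have the first lane of each wave write the result out (roughly one slot per wave, assuming a 1D dispatch.)
    if (WaveIsFirstLane())
        Output[id.x / WaveGetLaneCount()] = activeLanes;
}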

Here’s an example shader that visualizes wave occupancy using WaveActiveCountBits and WaveGetLaneCount, which come from the WaveBallot and WaveBasic groups respectively:

Shader "Unlit/WaveIntrinsicExample"
{
    SubShader
    {
        Tags { "RenderType" = "Opaque" }

        Pass
        {
            HLSLPROGRAM
            #include "UnityCG.cginc"
            #pragma require WaveBallot // WaveActiveCountBits
            #pragma require WaveBasic // WaveGetLaneCount

            #pragma vertex VertexMain
            float4 VertexMain(float3 position : POSITION) : SV_Position
            {
                return UnityObjectToClipPos(position);
            }

            #pragma fragment PixelMain
            float4 PixelMain() : SV_Target
            {
                return float4((float)WaveActiveCountBits(true) / (float)WaveGetLaneCount(), 0.f, 0.f, 1.f);
            }
            ENDHLSL
        }
    }
}
The output of the above shader applied to a cube
Full red is full occupancy, dark reds are low occupancy

Enabling DXC in Shader Graph

While not officially supported, it’s possible to use DXC with Shader Graph as well. You’ll understandably need custom functions to access wave intrinsics.

All you need to do is add the relevant #pragma require from above to your custom functions. It works as expected for both file-based and string-based custom functions.

If you have multiple custom functions utilizing wave intrinsics it’s best to add the pragma to each one of them in order to ensure the node previews render correctly. (If you don’t they’ll turn into angry magenta checkerboards.)

Here’s an example shader which visualizes the index of the thread within the wave used to shade each pixel:

Wave lane index visualization shader created in Shader Graph.

WaveActiveCountBitsNode uses a custom function defined in a string (shown in the graph inspector above.)
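
(I can't embed the text from the screenshot here, but my best reconstruction of the string body is roughly the following, with Out being the node's output port:)

#pragma require WaveBallot // WaveActiveCountBits
Out = (float)WaveActiveCountBits(true);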

WaveGetLaneIndexNode uses a custom function defined in a file with the following contents:

#pragma once
#pragma require WaveBasic // WaveGetLaneIndex

void WaveGetLaneIndexNode_float(out float result)
{
    result = (float)WaveGetLaneIndex();
}

Explicitly enabling DXC without using SM6+ features

If you’re using DXC primarily for the improved compilation time, you’ll want to enable DXC without requiring any of the shader model 6 features.

You can do so using #pragma use_dxc in your shader instead of #pragma require XYZ.

You might notice that this alone allows you to use wave intrinsics and some other SM6 features on Direct3D 12. However you’ll eventually find that this isn’t enough to enable things for Vulkan or Metal.

For example, a use_dxc shader which calls WaveGetLaneCount will give you the following error on Vulkan:

Shader error in 'YourShaderName': Vulkan 1.1 is required for Wave Operation but not permitted to use at line 42 (on vulkan)

and a similar one for Metal:

Shader error in 'YourShaderName': DXC SPIRV-Cross error: threads_per_simdgroup requires Metal 2.2 in fragment shaders.

Additionally, use_dxc alone can interfere with Unity’s ability to switch to fallback shaders. (See the D3D11 section at the end of this article for details.)

As such when you use SM6+ features, I’d recommend you stick to using #pragma require to explicitly declare each feature set you’re using. use_dxc alone is just creating a ticking time bomb of tedious work when you go to port your game.

Explicitly enabling DXC in Shader Graph

Utilizing #pragma use_dxc in Shader Graph is similar, except you probably don’t have a custom function where it makes sense to place the pragma.

The easiest workaround is to create a no-op custom function node somewhere in your graph with the pragma to ensure it’s included in the generated shader:

Shader Graph shader with DXC explicitly enabled.
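
The no-op function itself can be as trivial as this (the file and function names here are my own):

// NoOpUseDxc.hlsl - a pass-through custom function whose only job is to carry the pragma
#pragma once
#pragma use_dxc

void NoOpUseDxcNode_float(in float In, out float Out)
{
    // Pass the value straight through; we only want the pragma to end up in the generated shader.
    Out = In;
}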

Ensuring DXC was actually used

If you’re using #pragma use_dxc and you want to double check that DXC was actually used, you can use the Compile and show code tool in the shader asset inspector to view the disassembly for Direct3D. (Use the dropdown to select the D3D target if necessary.)

In the generated output, search for target triple = "dxil-ms-dx". If you find that line, DXC was used to compile at least one of the shader’s subshaders.

Conversely you can find shaders built with FXC by looking for the phrase D3D Shader Disassembler.

Note that you might find both in some shaders, since each subshader and entry point is built separately and not all of them have to use the same compiler.

Other info

Reacting to DXC with conditional compilation

If you need to detect whether your code is being built with DXC or FXC, you can use the UNITY_COMPILER_DXC preprocessor macro to detect it:

#ifndef UNITY_COMPILER_DXC
#error This file must only be included when using DXC! See https://pixelalchemy.dev/posts/wave-intrinsics-in-unity/ for details.
#endif

Note that because Unity uses its own shader preprocessor you cannot use __hlsl_dx_compiler, __DXC_VERSION_MAJOR, or any other of DXC’s predefined preprocessor macros.

Debugging DXC shaders with PIX

Debugging shaders with PIX works the same as it usually does.

Despite the naming, #pragma enable_d3d11_debug_symbols works with DXC/D3D12 as well.

Info for games targeting both D3D11 and D3D12

As mentioned earlier, some workarounds are necessary if you intend to continue supporting Direct3D 11.

The main cause of this is that Unity internally uses the same shader blob for D3D11 and D3D12. This is why there isn't a built-in SHADER_API_D3D12 define and why you can't specify d3d12 as a graphics API in API-specific configuration pragmas.

Because of this you can’t easily use different shader code for each, and you can’t directly use DXC for D3D12 and FXC for D3D11.

There are a couple different workarounds for this that I’ve found, sadly none are super ideal.

What happens if you try to load a DXC shader in D3D11

Unity will not print any messages to the console.

For graphics shaders you’ll find the following error in the player log:

D3D shader create error for vertex shader [0x80070057]

Additionally if you have graphics jobs enabled (the default) the following message will be spammed continuously in the player log:

ShaderProgram is unsupported, but because jobified rendering is enabled the ShaderProgram can not be removed.

Unfortunately with compute shaders you will get no feedback whatsoever. Dispatching the kernel will simply do nothing.

Using subshaders or fallback shaders

If you use #pragma require syntax as recommended above you can use subshaders or a fallback shader to provide FXC-compatible implementations of your shader to be used on D3D11.

Unity will automatically choose the FXC version since the requirements of the DXC subshader physically cannot be met by D3D11. (IE: D3D11 does not support SM6+ features so it will never satisfy the #pragma require directives.)

Unfortunately this is not an option for compute shaders.
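
For graphics shaders it can look roughly like this (a sketch based on the occupancy example from earlier, with an FXC-compatible second subshader that just outputs a flat color):

Shader "Unlit/WaveWithFallback"
{
    SubShader
    {
        Tags { "RenderType" = "Opaque" }

        Pass
        {
            HLSLPROGRAM
            #include "UnityCG.cginc"
            // D3D11 can never satisfy these requirements, so it skips this subshader entirely.
            #pragma require WaveBallot // WaveActiveCountBits
            #pragma require WaveBasic // WaveGetLaneCount

            #pragma vertex VertexMain
            float4 VertexMain(float3 position : POSITION) : SV_Position
            {
                return UnityObjectToClipPos(position);
            }

            #pragma fragment PixelMain
            float4 PixelMain() : SV_Target
            {
                return float4((float)WaveActiveCountBits(true) / (float)WaveGetLaneCount(), 0.f, 0.f, 1.f);
            }
            ENDHLSL
        }
    }

    // FXC-compatible fallback used on D3D11 (and anything else that can't satisfy the requirements.)
    SubShader
    {
        Tags { "RenderType" = "Opaque" }

        Pass
        {
            HLSLPROGRAM
            #include "UnityCG.cginc"

            #pragma vertex VertexMain
            float4 VertexMain(float3 position : POSITION) : SV_Position
            {
                return UnityObjectToClipPos(position);
            }

            #pragma fragment PixelMain
            float4 PixelMain() : SV_Target
            {
                // No wave intrinsics available; just output a flat color.
                return float4(0.5f, 0.f, 0.f, 1.f);
            }
            ENDHLSL
        }
    }
}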

Do not attempt to rely on this if you’re using #pragma use_dxc. Unity will try and fail to use the DXC shader on D3D11 anyway. If graphics jobs are enabled the log message from earlier is spammed and Unity never moves on to the next shader. Affected objects will not render.

(If you disable graphics jobs Unity will skip to the next viable shader after failing to use the DXC version, but I would not disable graphics jobs purely for this purpose.)

Using global keywords

A more robust and universal solution is to use a global shader keyword to toggle between D3D11 and D3D12 implementations.

I was going to recommend this as the ideal solution but it unfortunately does not work with compute shaders. Despite supposedly being supported by the new shader preprocessor, pragmas cannot be controlled by conditional compilation in compute shaders specifically.

Another downside of this approach is that you’ll end up with FXC and DXC variants of your shaders for any non-D3D platforms you might support. (In theory this will end up slowing down builds and bloating your app size slightly with shader variants that are never used.)

(You might be tempted to try guarding your #pragma multi_compile with conditional compilation, but it won’t actually work. The keyword and its variants will still be built.)

That being said, if you want to use this for your graphics shaders you can use this little script to toggle a global keyword upon application/editor startup:

using UnityEngine;
using UnityEngine.Rendering;

internal static class DownlevelShaderHelper
{
#if UNITY_EDITOR
    [UnityEditor.InitializeOnLoadMethod]
#endif
    [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.SubsystemRegistration)]
    private static void Configure()
    {
        if (SystemInfo.graphicsDeviceType == GraphicsDeviceType.Direct3D11)
        {
#if !UNITY_EDITOR // Avoid spamming the console in editor
            Debug.Log("Running on Direct3D 11, enabling shader compatibility mode...");
#endif
            // Applies to compute shaders too, but compute shaders can't really use this as intended
            Shader.EnableKeyword("D3D11_FALLBACK");
        }
    }
}

Specifying RuntimeInitializeLoadType.SubsystemRegistration here is important to ensure this happens before any shaders are loaded.

Example shader

Here’s how using the keyword looks in a shader. This material will be green on D3D12 and red on D3D11:

Shader "Unlit/KeywordFallbackTest"
{
    SubShader
    {
        Tags { "RenderType" = "Opaque" }

        Pass
        {
            HLSLPROGRAM
            #include "UnityCG.cginc"
            #pragma multi_compile _ D3D11_FALLBACK

            #ifndef D3D11_FALLBACK
            #pragma use_dxc
            #endif

            #pragma vertex VertexMain
            float4 VertexMain(float3 position : POSITION) : SV_Position
            {
                return UnityObjectToClipPos(position);
            }

            #pragma fragment PixelMain
            float4 PixelMain() : SV_Target
            {
            #ifdef D3D11_FALLBACK
                return float4(1.f, 0.f, 0.f, 1.f);
            #else
                return float4(0.f, 1.f, 0.f, 1.f);
            #endif
            }
            ENDHLSL
        }
    }
}
Why doesn’t this work with compute shaders?

Unity seems to have a bug with preprocessor conditionals not affecting pragmas in compute shaders specifically.

For example, you’d expect the following shader to get compiled using FXC when D3D11_FALLBACK is enabled and DXC otherwise.

Unfortunately this shader will actually always be compiled with DXC. Unity completely ignores the #ifndef D3D11_FALLBACK conditional as far as pragma processing goes.

#pragma kernel Main
#pragma multi_compile _ D3D11_FALLBACK // Declare the keyword so the D3D11_FALLBACK variant actually exists

// THIS WILL NOT WORK AS EXPECTED
#ifndef D3D11_FALLBACK
#pragma require WaveBasic
#endif

RWByteAddressBuffer Result;

[numthreads(1,1,1)]
void Main(uint3 id : SV_DispatchThreadID)
{
#ifdef D3D11_FALLBACK
    Result.Store(0, 11);
    Result.Store(4, -1);
#else
    Result.Store(0, 12);
    Result.Store(4, WaveGetLaneCount());
#endif
}

Just compile it twice™

Since compute shaders are typically dispatched from your C# code you can pretty easily use SystemInfo.graphicsDeviceType to switch between different compute shader implementations:

ComputeShader compute = SystemInfo.graphicsDeviceType == GraphicsDeviceType.Direct3D11 ? ComputeFxc : ComputeDxc;

If you structure things just right it’s barely any more effort compared to using keywords:

// ComputeShaderDxc.compute
#pragma require WaveBasic
#include "ComputeShaderFxc.compute"
// ComputeShaderFxc.compute
#pragma kernel Main

RWByteAddressBuffer Result;

[numthreads(1,1,1)]
void Main(uint3 id : SV_DispatchThreadID)
{
#ifdef UNITY_COMPILER_DXC
    Result.Store(0, 12);
    Result.Store(4, WaveGetLaneCount());
#else
    Result.Store(0, 11);
    Result.Store(4, -1);
#endif
}

The unfortunate downside here is that it moves your #pragma require directives further away from where you actually use the intrinsics.

Closing thoughts

Hope you found all that helpful!

This is my first time really taking my personal notes and polishing them into a coherent article meant for public consumption. If you have any polite questions or feedback I am most easily reached via Twitter.

If you’re looking for your very own David to evaluate new technologies for suitability of use in your game, I’m currently looking for work! So please don’t hesitate to get in touch.