Sunday, April 4, 2010

Simple Compute Shader Example

The other big side of DirectX 11 is the Compute Shader. I thought I would write up a very simple example along the same lines as my tessellation example.

First let me say that the Compute Shader is awesome! It opens up so many possibilities. My mind is just reeling with new ideas to try out. I must also mention that SlimDX does a great job of minimizing the code necessary to use the Compute Shader.

This example shows how to create a Compute Shader and then use it to launch threads that simply output the thread ID to a texture.

Device device = new Device(DriverType.Hardware, DeviceCreationFlags.Debug, FeatureLevel.Level_11_0);
 
ComputeShader compute = Helper.LoadComputeShader(device, "SimpleCompute.hlsl", "main");
 
Texture2D uavTexture;
UnorderedAccessView computeResult = Helper.CreateUnorderedAccessView(device, 1024, 1024, Format.R8G8B8A8_UNorm, out uavTexture);
 
device.ImmediateContext.ComputeShader.Set(compute);
device.ImmediateContext.ComputeShader.SetUnorderedAccessView(computeResult, 0);
device.ImmediateContext.Dispatch(32, 32, 1);
 
Texture2D.ToFile(device.ImmediateContext, uavTexture, ImageFileFormat.Png, "uav.png");


Believe it or not, that is the entirety of my CPU code.

Here is what is going on in the code:
1) Create a feature level 11 Device, in order to use Compute Shader 5.0
2) Load/Compile the HLSL code into a ComputeShader object.
3) Create a 1024x1024 UnorderedAccessView (UAV) object which will be used to store the output.
4) Set the ComputeShader and UAV on the device.
5) Run the Compute Shader by calling Dispatch (32x32x1 thread groups are dispatched).
6) Save the output texture out to disk.

My HLSL code is even simpler:

RWTexture2D<float4> Output;

[numthreads(32, 32, 1)]
void main( uint3 threadID : SV_DispatchThreadID )
{
    Output[threadID.xy] = float4(threadID.xy / 1024.0f, 0, 1);
}


As you can see, an RWTexture2D object is used to store the output (this is the UAV). The shader is set up to run 32x32x1 threads per thread group. Since the CPU is launching 32x32x1 thread groups, there are 1024x1024x1 separate threads being run, which equates to 1 thread per pixel in the output UAV. Each thread simply sets the color of its own pixel based upon its thread ID: red increases with x and green increases with y.
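To double-check the thread arithmetic, here is a minimal CPU-side sketch in C++ (not part of the original project) that models how the dispatch covers the texture. The names kGroups, kThreads, and CountThreadsPerPixel are made up for illustration; the only values taken from the code above are the Dispatch(32, 32, 1) call and the numthreads(32, 32, 1) attribute.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU-side model of Dispatch(32, 32, 1) combined with
// [numthreads(32, 32, 1)]. Nothing here touches the GPU.
constexpr uint32_t kGroups  = 32;                  // thread groups per axis (Dispatch)
constexpr uint32_t kThreads = 32;                  // threads per group per axis (numthreads)
constexpr uint32_t kSize    = kGroups * kThreads;  // 1024 threads per axis in total

// Returns, for every pixel of a kSize x kSize texture, how many
// threads would write to it.
std::vector<uint32_t> CountThreadsPerPixel()
{
    std::vector<uint32_t> hits(kSize * kSize, 0);
    for (uint32_t gy = 0; gy < kGroups; ++gy)
    for (uint32_t gx = 0; gx < kGroups; ++gx)
    for (uint32_t ty = 0; ty < kThreads; ++ty)
    for (uint32_t tx = 0; tx < kThreads; ++tx)
    {
        // SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID
        uint32_t x = gx * kThreads + tx;
        uint32_t y = gy * kThreads + ty;
        ++hits[y * kSize + x];
    }
    return hits;
}
```

If the model is right, every entry of the returned vector is exactly 1, meaning each pixel of the 1024x1024 texture is written by one and only one thread.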

This code results in the following output image:


Quite simple, eh? But not that interesting. We could easily do something like that with a pixel shader (although we would have to rasterize a full-screen quad to do it).

We should try to do something that shows the power of the compute shader; something you couldn't do in a pixel shader before, such as writing to arbitrary pixels from a single thread. How about drawing some primitives like lines and circles?

For drawing lines, let's use the Digital Differential Analyzer algorithm. It translates to HLSL very easily.


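void Plot(int x, int y)
{
    Output[uint2(x, y)] = float4(0, 0, 1, 1);
}

// Simple DDA: steps one pixel at a time along x, so it assumes
// start.x < end.x and a slope no steeper than 1 in magnitude
// (steeper lines would come out with gaps).
void DrawLine(float2 start, float2 end)
{
    float dydx = (end.y - start.y) / (end.x - start.x);
    float y = start.y;
    for (int x = start.x; x <= end.x; x++)
    {
        Plot(x, round(y));
        y = y + dydx;
    }
}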
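
The DrawLine above always steps along x, which is fine for the two diagonals drawn in this post but divides by zero for vertical lines and leaves gaps in steep ones. As a rough sketch of how the DDA could be generalized (written in C++ so it can be checked on the CPU, and not taken from the original project), the version below steps along whichever axis has the larger extent. The name DrawLineDDA and the Pixel type are hypothetical, and the function returns the pixels instead of writing them to a UAV.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

using Pixel = std::pair<int, int>;  // (x, y), hypothetical helper type

// Generalized DDA: takes one step per pixel along the longer axis, so it
// handles steep, vertical, and right-to-left lines as well.
std::vector<Pixel> DrawLineDDA(float x0, float y0, float x1, float y1)
{
    const float dx = x1 - x0;
    const float dy = y1 - y0;
    const int steps = static_cast<int>(std::max(std::fabs(dx), std::fabs(dy)));

    std::vector<Pixel> pixels;
    if (steps == 0)
    {
        // Degenerate case: start and end fall on the same pixel.
        pixels.emplace_back(static_cast<int>(std::lround(x0)),
                            static_cast<int>(std::lround(y0)));
        return pixels;
    }

    const float xInc = dx / steps;  // at most 1 in magnitude
    const float yInc = dy / steps;  // at most 1 in magnitude
    float x = x0;
    float y = y0;
    for (int i = 0; i <= steps; ++i)
    {
        pixels.emplace_back(static_cast<int>(std::lround(x)),
                            static_cast<int>(std::lround(y)));
        x += xInc;
        y += yInc;
    }
    return pixels;
}
```

Porting this back to HLSL should mostly be a matter of replacing the vector with calls to Plot, since the arithmetic uses only abs, max, and round.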


For drawing circles let's use the Midpoint Circle algorithm. For brevity I won't list it here now.
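
For readers who have not seen the algorithm, here is a minimal sketch of the usual integer Midpoint Circle formulation. It is written in C++ so it can be checked on the CPU, and it is not the DrawCircle from the downloadable project; only the (centerX, centerY, radius) argument order is inferred from the calls shown further down. The Pixel type, the kWidth and kHeight constants, and the bounds check in Plot are assumptions made for this sketch, and Plot collects pixels in a set rather than writing to the UAV.

```cpp
#include <set>
#include <utility>

using Pixel = std::pair<int, int>;  // (x, y), hypothetical helper type

constexpr int kWidth  = 1024;  // assumed texture size, matching the UAV in this post
constexpr int kHeight = 1024;

// Records a pixel, discarding anything outside the texture.
void Plot(std::set<Pixel>& out, int x, int y)
{
    if (x >= 0 && x < kWidth && y >= 0 && y < kHeight)
        out.insert({x, y});
}

// Midpoint Circle: walks one octant using integer arithmetic only and
// mirrors each point into the other seven octants.
std::set<Pixel> DrawCircle(int cx, int cy, int radius)
{
    std::set<Pixel> out;
    int x = radius;
    int y = 0;
    int err = 1 - radius;  // decision variable for the midpoint test

    while (x >= y)
    {
        Plot(out, cx + x, cy + y);
        Plot(out, cx + y, cy + x);
        Plot(out, cx - y, cy + x);
        Plot(out, cx - x, cy + y);
        Plot(out, cx - x, cy - y);
        Plot(out, cx - y, cy - x);
        Plot(out, cx + y, cy - x);
        Plot(out, cx + x, cy - y);

        ++y;
        if (err < 0)
        {
            err += 2 * y + 1;        // midpoint inside the circle: keep x
        }
        else
        {
            --x;                     // midpoint outside: step x inward
            err += 2 * (y - x) + 1;
        }
    }
    return out;
}
```

With a center of (0, 512), half of the circle falls at negative x; in this sketch those pixels are simply dropped by the bounds check in Plot.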

Then, in my Compute Shader main function, I simply add this code:

if (threadID.x == 1023 && threadID.y == 1023)
{
    DrawLine(float2(0, 0), float2(1023, 1023));
    DrawLine(float2(0, 1023), float2(1023, 0));

    DrawCircle(512, 512, 250);
    DrawCircle(0, 512, 250);
}


The if check ensures that only a single thread (the one with ID (1023, 1023)) draws the lines and circles, rather than every thread redrawing them. This code results in the following image:


I must admit it seems quite odd writing a shader that draws primitives. It's like some strange recursive loop. But it definitely helps to illustrate the features of the Compute Shader and how powerful it is.

You may download the source code to this example here:
ComputeShader11.zip

My next goal is to set up a standard DX11 Swap Chain and use the Compute Shader to write directly to the backbuffer. Well, that's all for now.

FYI: This is my 50th blog post! I never thought I would continue on this long. I think I should crack open a beer to celebrate.

6 comments:

Antoine Leblond said...

Nice entry.

Can you post the sample code? I always get a non-descript error when I do ComputeShader cs = new ComputeShader(device, shaderBytecode); and I would like to compare with your code.

Thanks!

Patrick said...

There really isn't much more to the code, which is why I didn't post the full project before. But since you asked for it, I went ahead and posted it. You can download it here:
ComputeShader11.zip

By the way, are you sure you are compiling against "cs_5_0" and NOT "fx_5_0"? That seems to be a common problem.

Let me know if you need any more help.

Antoine Leblond said...

Thanks for the quick reply and the sample code.

The non-descript error was because I was compiling against "fx_5_0". Thanks! Then I encountered a "'main': entrypoint not found" error... That was because my hlsl file was not in "ansi" encoding (http://forums.silverlight.net/forums/p/81994/192533.aspx)...

It finally works! ... for now. :)

Thanks again

Anonymous said...

This was really helpful for me! Thanks for writing this great tutorial!

Danix said...

I tried to run your code but unfortunately using the latest SlimDX release many things have changed and it doesn't even compile. What should I do to have it working? (e.g. ShaderBytecode seems to be moved in a DX9 namespace??)

Danix said...

nevermind I got it working with minimal effort...