First let me say that the Compute Shader is awesome! It opens up so many possibilities. My mind is just reeling with new ideas to try out. Also I must mention that SlimDX really does a great job of minimalizing the code necessary to use the Compute Shader.
This example shows how to create a Compute Shader and then use it to launch threads that simply output the thread ID to a texture.
Device device = new Device(DriverType.Hardware, DeviceCreationFlags.Debug, FeatureLevel.Level_11_0);
ComputeShader compute = Helper.LoadComputeShader(device, "SimpleCompute.hlsl", "main");
UnorderedAccessView computeResult = Helper.CreateUnorderedAccessView(device, 1024, 1024, Format.R8G8B8A8_UNorm, out uavTexture);
device.ImmediateContext.Dispatch(32, 32, 1);
Texture2D.ToFile(device.ImmediateContext, uavTexture, ImageFileFormat.Png, "uav.png");
Believe it or not, but that is the entirety of my CPU code.
Here is what is going on in the code:
1) Create a feature level 11 Device, in order to use Compute Shader 5.0
2) Load/Compile the HLSL code into a ComputeShader object.
3) Create a 1024x1024 UnorderedAccesdView (UAV) object which will be used to store the output.
4) Set the ComputeShader and UAV on the device.
5) Run the Compute Shader by calling Dispatch (32x32x1 thread groups are dispatched).
6) Save the output texture out to disk.
My HLSL code is even simpler:
[numthreads(32, 32, 1)]
void main( uint3 threadID : SV_DispatchThreadID )
Output[threadID.xy] = float4(threadID.xy / 1024.0f, 0, 1);
As you can see a RWTexture2D object is used to store the output (this is the UAV). The shader is setup to run 32x32x1 threads. This means that since the CPU is launching 32x32x1 thread groups, then there are 1024x1024x1 separate threads being run. This equates to 1 thread per pixel in the output UAV. So, in the UAV, the color is just set based upon the thread ID.
This code results in the following output image:
Quite simple, eh? But not that interesting. We could easily do something like that with a pixel shader (although we would have to rasterize a full-screen quad to do it).
We should try to do something that shows the power of the compute shader; something you couldn't do in a pixel shader before. How about drawing some primitives like lines and circles?
For drawing lines, let's use the Digital Differential Analyzer algorithm. It translates to HLSL very easily.
void Plot(int x, int y)
Output[uint2(x, y)] = float4(0, 0, 1, 1);
void DrawLine(float2 start, float2 end)
float dydx = (end.y - start.y) / (end.x - start.x);
float y = start.y;
for (int x = start.x; x <= end.x; x++)
y = y + dydx;
For drawing circles let's use the Midpoint Circle algorithm. For brevity I won't list it here now.
Then, in my Compute Shader main function, I simply add this code:
if (threadID.x == 1023 && threadID.y == 1023)
DrawLine(float2(0, 0), float2(1024, 1024));
DrawLine(float2(0, 1023), float2(1023, 0));
DrawCircle(512, 512, 250);
DrawCircle(0, 512, 250);
The if check is just done to prevent the lines and circles from being drawn for every thread. This code results in the following image:
I must admit it seems quite odd writing a shader that draws primitives. It's like some strange recursive loop. But it definitely helps to illustrate the features of the Compute Shader and how powerful it is.
You may download the source code to this example here:
My next goal is to setup a standard DX11 Swap Chain and use the Compute Shader to write directly to the backbuffer. Well that's all for now.
FYI: This is my 50th blog post! I never thought I would continue on this long. I think I should crack open a beer to celebrate.