Thursday, September 24, 2009

DirectX 11 GPUs have arrived!

The Radeon HD 5870 is now out in the wild and available for purchase. I'm still unsure whether to buy it or wait for the 5850, which comes out next month. The 5850 is $120 less than the 5870, and its specs aren't that much lower.

On the Nvidia side of things, it's not looking very good. The latest rumors are that they are getting incredibly low yields on their chips and, as a result, have delayed the launch of their DX11 GPUs to the third quarter of 2010. I wonder how this generation will play out if Nvidia's cards don't show up until almost a year after ATI's.

Update:
Apparently the news of the Nvidia delay is just FUD. They have officially announced that the 300 series will still debut in December. Read about it here.

Sunday, September 13, 2009

Perlin Noise in DirectX 10 (Shader Model 4.0)

This is somewhat similar to the classic problem of Good, Cheap, and Fast, where you can only have two of them at the same time. With Perlin Noise on the GPU, the trade-off seems to be between simple, fast, and high-quality.

I implemented Perlin Noise entirely on the GPU, meaning no textures are precomputed and no parameters are uploaded to the GPU. The shader is simply loaded and run. It is a very clean solution, and it is incredibly simple to add to any project.
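
To give an idea of the shape of it, here is a minimal sketch of the approach. This is not the exact code I uploaded; to keep the listing short, a cheap integer hash stands in for Ken Perlin's 256-entry permutation table, but the structure is the same:

    // Everything lives in the shader; nothing is uploaded by the application.
    float fade(float t)   // Perlin's 6t^5 - 15t^4 + 10t^3 smoothing curve
    {
        return t * t * t * (t * (t * 6.0 - 15.0) + 10.0);
    }

    int hashCell(int3 p)  // stand-in for the perm[perm[perm[x] + y] + z] chain
    {
        int h = p.x * 374761393 + p.y * 668265263 + p.z * 1274126177;
        h = (h ^ (h >> 13)) * 1103515245;
        return (h ^ (h >> 16)) & 255;
    }

    float grad(int h, float3 d)  // pick 1 of 12 gradient directions, dot with d
    {
        h &= 15;
        float u = h < 8 ? d.x : d.y;
        float v = h < 4 ? d.y : (h == 12 || h == 14 ? d.x : d.z);
        return ((h & 1) == 0 ? u : -u) + ((h & 2) == 0 ? v : -v);
    }

    float noise(float3 p)  // classic 3D Perlin Noise, computed from scratch
    {
        int3   c = (int3)floor(p);   // lattice cell containing p
        float3 f = p - floor(p);     // position within the cell
        float3 u = float3(fade(f.x), fade(f.y), fade(f.z));

        // Blend the gradient contributions from the cell's eight corners.
        return lerp(
            lerp(lerp(grad(hashCell(c + int3(0,0,0)), f - float3(0,0,0)),
                      grad(hashCell(c + int3(1,0,0)), f - float3(1,0,0)), u.x),
                 lerp(grad(hashCell(c + int3(0,1,0)), f - float3(0,1,0)),
                      grad(hashCell(c + int3(1,1,0)), f - float3(1,1,0)), u.x), u.y),
            lerp(lerp(grad(hashCell(c + int3(0,0,1)), f - float3(0,0,1)),
                      grad(hashCell(c + int3(1,0,1)), f - float3(1,0,1)), u.x),
                 lerp(grad(hashCell(c + int3(0,1,1)), f - float3(0,1,1)),
                      grad(hashCell(c + int3(1,1,1)), f - float3(1,1,1)), u.x), u.y), u.z);
    }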

Unfortunately, it is slow. Mind you, it is still much faster than the CPU implementation, but it is slow compared to my original precomputed texture implementation.

I always run my tests using 12 octaves of 3D fBm Perlin Noise summations at a resolution of 1024x1024. That yields much more stable timings than a single evaluation of Perlin Noise. My original implementation ran at about 85 fps. At first, my new simple implementation was running at about 3 fps! Even though the documentation states that HLSL automatically marks all global variables as const, explicitly putting the const keyword in front of my permutation array jumped the frame rate up to 19 fps.
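
For reference, the fBm part of the benchmark is just the standard octave summation, sketched here on top of the noise() above (the permutation array itself being something like static const int perm[512] = { ... }):

    // 12 octaves of 3D fBm: each octave doubles the frequency and
    // halves the amplitude before summing another noise() call.
    float fBm(float3 p, int octaves)
    {
        float sum  = 0.0;
        float amp  = 1.0;
        float freq = 1.0;
        for (int i = 0; i < octaves; i++)
        {
            sum  += noise(p * freq) * amp;
            freq *= 2.0;
            amp  *= 0.5;
        }
        return sum;
    }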

I added a gradient lookup table and changed the gradient calculation functions to simply index into it. However, this had basically no impact on speed. I then reduced the permutation array from 512 entries to 256 and performed a modulus on every index into the array. This gave me about a 30% speed increase and got it up to about 25 fps.
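
The gradient tweak looks something like this (just a sketch: the twelve true gradient directions get padded out to sixteen entries so the index can be masked, which is the power-of-two equivalent of a modulus):

    // The 12 gradient directions from improved Perlin Noise, padded to a
    // power of two with 4 repeats so 'h & 15' replaces a modulus.
    static const float3 gradTable[16] =
    {
        float3( 1, 1, 0), float3(-1, 1, 0), float3( 1,-1, 0), float3(-1,-1, 0),
        float3( 1, 0, 1), float3(-1, 0, 1), float3( 1, 0,-1), float3(-1, 0,-1),
        float3( 0, 1, 1), float3( 0,-1, 1), float3( 0, 1,-1), float3( 0,-1,-1),
        float3( 1, 1, 0), float3(-1, 1, 0), float3( 0,-1, 1), float3( 0,-1,-1)
    };

    // The branching grad() function collapses to a fetch and a dot product.
    float gradLUT(int h, float3 d)
    {
        return dot(gradTable[h & 15], d);
    }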

I tried various other tweaks, and I was able to get it to go a bit faster, but it was always at the expense of the quality of the noise (it wouldn't work with negative values, it would look blocky, etc.). The fastest I was able to get it while still maintaining high-quality Perlin Noise was 25 fps.

I must say that I'm rather disappointed with these results. I had thought that constant memory indexing would be faster than texture lookups; however, the texture lookup version was over 3 times faster than the memory indexing version. Perhaps I'm just missing something, or perhaps I'm implementing the memory indexing incorrectly, but I don't know what else I could do to speed it up any more AND keep the same quality.

For the time being, it looks like texture lookups are the way to go. I've decided to upload two versions of the noise code. The first version is a direct port of Ken Perlin's Java code to HLSL (19 fps). The second includes the tweaks to the gradients and the permutation array (25 fps).

First Version
Tweaked Version

As I have said, the major advantage of this implementation is its simplicity. All you have to do is include the header in your HLSL code, and you can call the noise() and fBm() functions. That's it! So if you just want to drag and drop some Perlin Noise into a shader and don't care about speed, this is the best way to do it.
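
For example, a pixel shader that just wants some grayscale noise could look like this (the header name here is hypothetical, and fBm(position, octaves) assumes the signature from the sketches above):

    #include "PerlinNoise.fxh"  // hypothetical header name; use whatever you saved

    float4 NoisePS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
    {
        // 12 octaves of 3D fBm; animate the z coordinate for moving noise.
        float n = fBm(float3(uv * 8.0, 0.0), 12);
        n = n * 0.5 + 0.5;               // remap roughly from [-1,1] to [0,1]
        return float4(n, n, n, 1.0);
    }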

Wednesday, September 9, 2009

Soon ... Very Soon

The August DirectX SDK was released today (shh, don't tell them it's September). This brings with it the first official release of DirectX 11!

You can download it in all of its glory here:
August DirectX SDK

Note: In order to run the DX11 samples on Vista, you need a patch. Unfortunately, that patch is not yet available, but when it is, you should find it here.

In closely related news, it looks like the first DirectX 11 GPUs will be available this month. The Radeon HD 5850 will be $300, the Radeon HD 5870 will be $400, and the Radeon HD 5870x2 will be $600.
Read more here

Tuesday, September 8, 2009

Byte Order Mark - The Invisible Enemy

Alternate Title:
EF BB BF - The Three Bytes of Doom

Last night I decided to whip out a quick SlimDX / DirectX 10 project implementing Perlin Noise using Shader Model 4.0. In my Perlin Noise on the GPU article, I mentioned how much easier it would be to implement Perlin Noise with SM4.0+ than with SM3.0. I had done a quick port of Ken Perlin's Java example implementation in FX Composer a couple of months back, so I thought I would be able to implement the stand-alone version in less than an hour.

I wanted to test it first in a pixel shader, so I made my own custom vertex format as well as a class that builds a fullscreen-quad vertex buffer. I took the Perlin Noise HLSL code and put it into a header file, and made a simple two-technique shader that included the header.
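
In outline, the shader was nothing more exotic than this kind of skeleton (the names are illustrative, and it assumes the noise() and fBm() functions from the header):

    #include "PerlinNoise.fxh"   // hypothetical name for the noise header

    struct VSOut { float4 pos : SV_Position; float2 uv : TEXCOORD0; };

    VSOut QuadVS(float3 pos : POSITION, float2 uv : TEXCOORD0)
    {
        VSOut o;
        o.pos = float4(pos, 1.0);   // fullscreen quad is already in clip space
        o.uv  = uv;
        return o;
    }

    float4 NoisePS(VSOut i) : SV_Target   // single octave
    {
        float n = noise(float3(i.uv * 8.0, 0.0)) * 0.5 + 0.5;
        return float4(n, n, n, 1.0);
    }

    float4 FBmPS(VSOut i) : SV_Target     // 12-octave fBm
    {
        float n = fBm(float3(i.uv * 8.0, 0.0), 12) * 0.5 + 0.5;
        return float4(n, n, n, 1.0);
    }

    technique10 SingleNoise
    {
        pass P0
        {
            SetVertexShader(CompileShader(vs_4_0, QuadVS()));
            SetGeometryShader(NULL);
            SetPixelShader(CompileShader(ps_4_0, NoisePS()));
        }
    }

    technique10 FBmNoise
    {
        pass P0
        {
            SetVertexShader(CompileShader(vs_4_0, QuadVS()));
            SetGeometryShader(NULL);
            SetPixelShader(CompileShader(ps_4_0, FBmPS()));
        }
    }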

I fired up the code, but I quickly got an exception (E_FAIL: An undetermined error occurred (-2147467259)) at the point where I was creating the InputLayout for my vertex format from the shader signature. Not a very useful exception.

At first, I thought it might have been a problem with my custom vertex format. After looking it over and comparing against other custom vertex formats I've made in the past, I determined the issue wasn't there.

Next I looked at how I was creating the vertex buffer and index buffer for the fullscreen quad. That all appeared in order too.

After determining that the issue was not in my custom vertex format or my fullscreen quad, I slowly stepped through the code, keeping a close eye on all of the variables. I didn't think there was a problem with my shader, because no errors were being output from the compiler and it was spitting out a valid Effect. However, when stepping through and watching the variables, I saw that even though it had created an Effect whose IsValid property was true, the TechniqueCount was zero! In fact, all of the counts were zero. It was as if the compiler were seeing an empty file.

So, I next looked at my shader in detail. I thought maybe something funky was happening with the included header file, so I inlined all of the header code. I still got the exception. I thought it might be some random issue in one of my noise functions, so I changed them all to return 1.0. Exception. I triple-checked all of my input and output structures. I looked over my technique description. I changed my vertex shader to be a simple pass-through and the pixel shader to just return white. Exception.

What the heck was going on? I had other shaders that were compiling just fine. So, just as a "stupid" test, I copied one of my other working shaders and pasted all of its code into my fx file, overwriting everything in it. I still got the exception!

Now I knew something was really messed up somewhere. Here I had two fx files with the exact same HLSL code in them, but one was compiling while the other was not.

I opened both files up using the Binary Editor in Visual Studio to compare them byte by byte. The file that was not compiling had three extra bytes at the beginning: EF BB BF. I deleted those three bytes, and everything worked!

It turns out that byte sequence is the Byte Order Mark (BOM), which specifies that the file is UTF-8 encoded. Apparently this is the default for all files created in Visual Studio 2008. Unfortunately, the FX compiler in DirectX can't handle the UTF-8 BOM; it just dies and returns an empty Effect.

I did a quick Google search after fixing the problem and I saw that several other people had the same problem and eventually came to the same solution. I really wish the compiler would return an error in a situation like this.

What I find very interesting is that I have been programming shaders for over two and a half years, and I have never run into this problem before.

Hopefully this post helps someone else if they encounter this same problem.

Until next time...