Sunday, December 27, 2009

Procedural Cities in XNA

It's been quite a while since I've talked about any development I've been doing. To be honest, I have been quite busy with work and flight lessons, so I haven't had much time to work on my hobby projects.

Lately I have been working on a procedural city generator in C#/XNA. I'm basing my work on the Metropolis project, which was in turn based upon a research paper presented at SIGGRAPH 2001.

I began with the procedural road map generator. It takes a heightmap and generates a road network from it.



Zooming in you can begin to see the subdivisions between streets where buildings will be built.



Zooming in even further makes the building lots even clearer.



Next, I began work on the 3D building creation. Here you can see many simple buildings all using the same texturing.



I then added more texture variety.



I finally added in the terrain with road texturing.



Zoomed out view showing the vast number of buildings generated.



And the final image showing "Central Park" (this city was generated from a heightmap of Manhattan).



I want to eventually release the source code for this, but first I need to clean up the code some more. I also need to tweak the renderer to run faster. Right now it just brute-forces the rendering and doesn't utilize any form of level of detail or frustum culling.

Saturday, October 10, 2009

SlimDX August 2009 Release

A new version of SlimDX is now available. This version wraps the latest DirectX SDK release (August 2009, thus the name). This means that things like DirectX 11 and Direct2D are now officially supported (previous releases were beta).

You can download the installer here:
http://code.google.com/p/slimdx/

You can also read a little more about this release at the GameDev.net forums:
http://www.gamedev.net/community/forums/topic.asp?topic_id=549927

I'm looking forward to trying it out. Unfortunately, I still don't have DX11 hardware, so I'll probably hold off on things like tessellation or the compute shader.

Saturday, October 3, 2009

Disappointment

Since I have never used the Geometry Shader, I've been reading tutorials on how to use it to generate silhouettes and stencil shadows via the adjacency information being passed through the index buffer.

From my posts in the past, it is pretty obvious that I am really excited about tessellation and I want to make use of it ASAP. So, a natural thought was to combine the two: tessellate a mesh and then generate silhouettes from the result.

No can do. I was completely and utterly disappointed to find out that you can NOT use adjacency information alongside the tessellator.

I quote from the official DirectX 11 docs: "A geometry shader that expects primitives with adjacency (for example, 6 vertices per triangle) is not valid when tessellation is active (this results in undefined behavior, which the debug layer will complain about)."

Thursday, September 24, 2009

DirectX 11 GPUs have arrived!

The Radeon HD 5870 is now out in the wild and available for purchase. I'm still unsure if I want to buy it or wait for the 5850 coming out next month. The 5850 is $120 less than the 5870 and the specs aren't that much lower.

On the Nvidia side of things, it's not looking very good. The latest rumors are that they are having incredibly low yields from their chips and as a result have delayed the launch of their DX11 GPUs to the third quarter of 2010. I wonder how this generation will play out if Nvidia's cards don't show up until almost a year after ATI's.

Update:
Apparently the news of the Nvidia delay is just FUD. They have officially announced that the 300 series will still debut in December. Read about it here.

Sunday, September 13, 2009

Perlin Noise in DirectX 10 (Shader Model 4.0)

This is somewhat similar to the classic problem of Good, Cheap, and Fast where you can only have two of them at the same time.

I implemented Perlin Noise entirely on the GPU, meaning no textures are precomputed and no parameters are set on the GPU. The shader is simply loaded and run. It is a very clean solution and it is incredibly simple to add to any project.

Unfortunately, it is slow. Mind you, it is still much faster than the CPU implementation, but it is slow compared to my original precomputed texture implementation.

I always run my tests using 12 octaves of 3D fBm Perlin Noise summations at a resolution of 1024x1024. It yields much more stable results than just one calculation of Perlin Noise. My original implementation ran at about 85 fps. At first, my new simple implementation was running at about 3 fps! Even though the documentation states that HLSL automatically marks all global variables as const, explicitly putting the const keyword before my permutation array made my frame-rate jump up to 19 fps.

I added a gradient lookup table and changed the gradient calculation functions to just index into the table. However, this basically had no impact on the speed. I then reduced the permutation array from 512 to 256 entries and performed a modulus operation on any indexing into the array. This gave me about a 30% speed increase and got it up to about 25 fps.
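
Roughly, the two tweaks look like this (just a sketch; the helper names are illustrative, and the const tables are assumed to be filled in with Ken Perlin's standard values elsewhere in the shader):

// Sketch only: perm[] holds the standard 256 permutation values and grads[]
// the 12 cube-edge gradient directions, both declared as const globals, e.g.
//   static const int    perm[256] = { 151, 160, 137, ... };
//   static const float3 grads[12] = { float3(1, 1, 0), float3(-1, 1, 0), ... };

// Index the 256-entry table with an explicit wrap instead of carrying a
// doubled 512-entry table.
int Perm(int i)
{
    return perm[i % 256];
}

// Look the gradient up in the table instead of computing it from the hash bits.
float Grad(int hash, float3 p)
{
    return dot(grads[hash % 12], p);
}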

I tried various other tweaks, and I was able to get it to go a bit faster, but it was always at the expense of the quality of the noise (wouldn't work with negative values, would look blocky, etc). The fastest I was able to get it and still maintain the high quality Perlin Noise was 25 fps.

I must say that I'm rather disappointed with these results. I had thought that constant memory indexing would be faster than texture lookups; however, the texture lookup version was over 3 times faster than the memory indexing version. Perhaps I'm just missing something or I'm implementing the memory indexing incorrectly, but I don't know what I could possibly do to speed it up any more AND keep the same quality.

For the time being, it looks like texture lookups are the way to go. I've decided to upload 2 versions of the noise code. The first version is a direct port of Ken Perlin's Java code to HLSL (19 fps). The second includes the tweaks to the gradients and permutation array (25 fps).

First Version
Tweaked Version

As I have said, the major advantage to this implementation is the simplicity. All you have to do is include the header in your HLSL code and you can call the noise() and fBm() functions. That's it! You don't have to do anything else. So if you just want to drag and drop some Perlin Noise into a shader and don't care about speed, this is the best way to do it.
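
For example, a pixel shader that just visualizes the noise could look something like this (a sketch only; the header name and the exact fBm() signature here are placeholders, so check the actual code for the real ones):

#include "PerlinNoise.fxh"   // whatever you name the noise header

float4 NoisePS(float3 worldPos : TEXCOORD0) : SV_Target
{
    // 12 octaves of 3D fBm, remapped from roughly [-1, 1] to [0, 1] for display.
    float n = fBm(worldPos, 12) * 0.5 + 0.5;
    return float4(n, n, n, 1.0);
}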

Wednesday, September 9, 2009

Soon ... Very Soon

The August DirectX SDK was released today (shh, don't tell them it's September). This brings with it the first official release of DirectX 11!

You can download it in all of its glory here:
August DirectX SDK

Note: In order to run the DX11 samples on Vista, you need a patch. Unfortunately, that patch is not yet available, but when it is it should be available here.

In very related news, it looks like the first DirectX 11 GPUs will be available this month. The Radeon HD 5850 will be $300, the Radeon HD 5870 will be $400, and the Radeon HD 5870x2 will be $600.
Read more here

Tuesday, September 8, 2009

Byte Order Mark - The Invisible Enemy

Alternate Title:
EF BB BF - The Three Bytes of Doom

Last night I decided to whip out a quick SlimDX / DirectX 10 project implementing Perlin Noise using Shader Model 4.0. In my Perlin Noise on the GPU article, I mentioned how much easier it would be to implement Perlin Noise using SM4.0+ vs SM3.0. I had done a quick port of Ken Perlin's Java example implementation in FX Composer a couple months back, so I thought I would be able to implement the stand-alone version in less than an hour.

I wanted to test it first in a pixel shader, so I made my own custom vertex format as well as a class that would build a fullscreen quad vertex buffer. I took the Perlin Noise HLSL code and put it into a header file, and made a simple two-technique shader that included the header.

I fired up the code, but I quickly got an exception (E_FAIL: An undetermined error occurred (-2147467259)) at the point where I was trying to set the InputLayout of the vertex buffer to the shader signature. Not a very useful exception.

At first, I thought it might have been a problem with my custom vertex format. After looking it over and comparing against other custom vertex formats I've made in the past, I determined the issue wasn't there.

Next I looked at how I was creating the vertex buffer and index buffer for the fullscreen quad. That all appeared in order too.

After determining that the issue was not in my custom vertex format or my fullscreen quad, I slowly stepped through the code keeping a close eye on all of the variables. I didn't think there was a problem with my shader because no errors were being output from the compiler, and it was spitting out a valid Effect. However, when stepping through and watching the variables, I saw that even though it created an Effect and the IsValid property was true, it said the TechniqueCount was zero! In fact, all of the counts were saying zero. It was as if the compiler was seeing it as an empty file.

So, I next looked at my shader in detail. I thought maybe something funky was happening with the included header file, so I inlined all of the header code. I still got the exception. I thought it might be some random issue in one of my noise functions, so I changed them to all return 1.0. Exception. I triple checked all of my input and output structures. I looked over my technique description. I changed my vertex shader to be a simple pass-through and the pixel shader to just return white. Exception.

What the heck was going on? I had other shaders that were compiling just fine. So, just as a "stupid" test, I copied one of my other working shaders and pasted all of the code into my fx file, overwriting everything in it. I still got the exception!

Now I knew something was really messed up somewhere. Here I had two fx files with the exact same HLSL code in them, but one was compiling while the other was not.

I opened both files up using the Binary Editor in Visual Studio to see byte by byte. The file that was not compiling had three extra bytes at the beginning - EF BB BF. I deleted these three bytes, and everything worked!

It turns out that byte sequence is the Byte Order Mark. It specifies that the file is UTF-8 encoded. Apparently this is the default for all files created in Visual Studio 2008. Unfortunately, the FX compiler in DirectX can't read UTF-8 files; it just dies and returns an empty Effect.

I did a quick Google search after fixing the problem and I saw that several other people had the same problem and eventually came to the same solution. I really wish the compiler would return an error in a situation like this.

What I find very interesting is the fact that I have been programming shaders for over two and a half years, and I have never run into this problem before.

Hopefully this post helps someone else if they encounter this same problem.

Until next time...

Tuesday, August 25, 2009

Teaser

I'm not yet ready to fully talk about what I'm working on, but I wanted to give a little preview to whet everyone's appetite.



Yep! 3001 boxes in 1 PhysX scene and still maintaining 29.81 frames per second. The really awesome part is the fact that this is all in C#!

I'll talk more about it later.

Until next time...

Wednesday, July 29, 2009

Infinite Depth Buffer

I've been planning out how I'm going to rewrite my planet algorithm once DirectX 11 is out. I've decided to focus on problems I have now that will still be a problem in DX11. One such problem that I've always been having in the past is the depth buffer.

My planet is Earth-sized, so in order to keep it visible as you fly away from it, I pushed the far clipping plane way out. Obviously this destroyed the precision of my depth buffer and I had big problems with Z-fighting (far-off mountains were being drawn in front of closer ones).

Rant: Why the heck are most GPUs these days still stuck with a 24-bit depth buffer? The Xbox 360 and my GeForce 9800M GT both only support up to a 24-bit depth buffer. DX11 level GPUs will have 64-bit floating point (double) support in shaders, so why not a 64-bit depth buffer?

In the videos and screenshots I have posted in the past, I did two different things to try and fix my problem with the depth buffer. First, I had a "sliding" far clipping plane that would have a minimum value, but as you flew away from the planet, it would extend out in order to continually show the planet. My second solution was to just disable the depth buffer. Both of these solutions only worked because I was drawing only 1 planet and there were no other objects being rendered. Obviously I want to keep my depth buffer around, keep the high precision for any near objects, but continue to draw far off planets (not have them clipped by the far clipping plane).

In order to fully understand my problem, I read about how exactly the depth buffer works and how a position is transformed and then clipped. I found this article very informative about the inner workings of the GPU in terms of the depth buffer. I did not change my depth buffer to be linear like he does, though. The article helped me to understand the relationship between the Z component and the W component of the transformed vertex position.

Between the vertex shader and the pixel shader, the Z component is divided by W in order to "normalize" the depth (range 0-1). If the normalized Z value is greater than 1, it is clipped. So, I needed to make it so that the normalized Z value never exceeded 1. This was a very simple thing to fix once I understood it. In my vertex shader I check the Z value to see if it is greater than the far clipping plane value (which I pass into the shader). If it is greater, then I simply set the W component equal to the Z component. That means the Z / W calculation becomes Z / Z = 1. Now I can have good depth buffer precision for things close to the camera, but I will continue to draw things even if they are an infinite (theoretically) distance away!
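
In shader terms, the whole trick boils down to something like this (a simplified sketch with illustrative names, not my actual planet shader):

float4x4 WorldViewProjection;
float FarPlane;  // far clipping plane distance, set from the application

struct VS_INPUT  { float3 Position : POSITION0; };
struct VS_OUTPUT { float4 Position : POSITION0; };

VS_OUTPUT TransformVS(VS_INPUT input)
{
    VS_OUTPUT output;
    output.Position = mul(float4(input.Position, 1.0), WorldViewProjection);

    // If this vertex would land beyond the far plane, force W = Z so that the
    // hardware's Z / W divide yields exactly 1.0 and the vertex is still drawn
    // instead of being clipped away.
    if (output.Position.z > FarPlane)
        output.Position.w = output.Position.z;

    return output;
}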

Obviously this solution isn't perfect and there are some "gotchas". If I am drawing a planet and a moon, and the moon is behind the planet, and I am flying away from the planet, AND the moon is being drawn after the planet in the C# code, then the moon will suddenly pop in front of the planet once the planet exceeds the far clipping plane. That means I'll have to have a manager of large objects in the "local system" to make sure they are drawn in back to front order. That should be really easy to implement.

Hopefully this all makes sense and it helps someone else struggling with the same problem.

Until next time...

Update: I would now strongly encourage people to use a Logarithmic Depth Buffer to solve all of your depth buffer precision issues. You can read about it here:

Wednesday, July 1, 2009

Perlin Noise in JavaScript

I apologize for not having any update for the month of June, but I was gone on vacation for a majority of it.

I saw tons of places on the road trip. Illinois, Indiana, Michigan, Ontario, New York, Vermont, New Hampshire, Maine, Massachusetts, New Jersey, Delaware, Maryland, Virginia, Pennsylvania, West Virginia, and Ohio (in that order)! I had never been to the New England area of the country, so it was great being able to see it all.

In terms of development, I haven't really done any since I wrote that Perlin Noise article. I've been tossing around the idea of trying out SlimDX again. I know, I know, I just can't make up my mind on things, right? It would just be very convenient to be writing C# again, and I found the great PhysX.Net wrapper that should allow me to continue to use PhysX with my C# development. Plus, SlimDX already supports DirectX 11, which is awesome! (Tessellation, here I come!)

By the way, I was curious about JavaScript so I developed a simple little webpage that implements Perlin Noise in JavaScript.

You can find it here:
http://re-creationstudios.com/noise/

I used a Canvas element to display the result, so it won't work in Internet Explorer.

Be sure to check out the source code since I implemented several different types of summation as well (fBm, turbulence, and ridged multifractal). Enjoy!

Sunday, May 31, 2009

Perlin Noise on the GPU

I've written a tutorial on how to implement Perlin Noise in both a pixel shader and a vertex shader. The tutorial is available over at Ziggyware.

http://www.ziggyware.com/readarticle.php?article_id=246

Check it out! I welcome any feedback.

UPDATE
http://www.ziggyware.com/news.php?readmore=1125
I found out I have won first place in the Ziggyware contest! I'm very happy that my first XNA tutorial was such a success. Perhaps I will write up more in the future.

UPDATE 2
It appears that Ziggyware no longer exists, so I have shared the article, screenshots, and sample code on my server.  You can download it all here:
http://re-creationstudios.com/shared/PerlinNoiseGPU/

Friday, May 29, 2009

PhysX Planet



Creating planetary gravity with a force field was much easier than I was expecting, especially after finding this thread on the official PhysX forum.
http://developer.nvidia.com/forums/index.php?showtopic=3201&pid=9249&st=0&#entry9249

After doing that, I got curious and I removed the sphere "planet" but I kept the gravity force field. As expected, all of the shapes grouped together to form their own planet.



As I kept watching, the planet started pulsing and eventually tore itself apart in a vortex that alternated back and forth.



The debris vortex started stabilizing and turned into a spinning ring.



The ring kept expanding and getting bigger and bigger.



I added some more cubes to the center, which formed a cool looking planet with a ring.



The planet at the center eventually started breaking apart and forming a ring as well, but what was most interesting was that the inner ring was rotating in the opposite direction of the outer ring.

Thursday, May 28, 2009

PhysX and DirectX 10

For the past week or so, I've been working on getting PhysX working in DirectX 10. I ported over my Shape3D class from my previous blog post to C++ in order to draw the boxes and spheres (no cylinders because PhysX doesn't have a cylinder shape).

I set up a simple scene with a ground plane (not rendered), normal Earth gravity, two boxes and a sphere. The space bar spawns new boxes at the origin and the Alt key spawns new spheres.

In my first implementation, the application would slow down to 20fps when I added about 75 shapes. I tracked down this slowdown to the creation of the vertex input layout each frame. Once I moved that out of the render loop, I could then have 1000 shapes in the scene and still maintain 30fps. I'm still trying to figure out if I can speed up the rendering even more because without rendering, the PhysX engine was able to handle 2000 shapes and maintain 60fps. I doubt I can speed it up much more though.

Here are some screenshots. Pretty simple, but the debug info helps to illustrate the speeds.






I have decided to share the Visual Studio 2008 project and source code to hopefully provide a nice starting point for others. I'm not using an official license or anything, but I'll state that it is free for whatever use you desire, be it commercial, hobby, or educational.

Download the zip file: here

Next, I'm going to work on using static triangle meshes in PhysX to have a displaced sphere making up a planet, as well as using force fields to simulate planetary gravity.

Sunday, April 19, 2009

Boxes, Cylinders, and Spheres (Oh My!)



After deciding to use PhysX in combination with DirectX 11 last week, I thought about how I would go about drawing simple 3D shapes. In the samples and lessons, PhysX uses GLUT, which has some shape drawing functions built in. While DX9 has similar functions, neither DX10 nor XNA do. I believe it is safe to assume that DX11 will not have them either. I can understand why they were removed. They don't really make sense anymore in a programmable pipeline system. What would the vertex format of the created shape be?

While most people these days get around it by creating a mesh in a 3D modeling application and then importing it into their game, I prefer to have my simple 3D shapes constructed programmatically. This allows me to easily change the detail of the mesh without worrying about exporting/importing an entire new model.

Instead of implementing it in DX10 and C++ first, I decided to implement it in XNA. This allowed me to focus on the algorithms themselves and not have to worry about any C++ issues getting in the way.

My initial implementation was a static class with static methods that built the shapes and returned the vertex buffer and index buffer. This worked well, but I thought it still placed too much of a burden on the user to remember how many vertices they had just created and how many triangles they were going to draw.

In order to make things simpler, I changed it to a normal, instantiated class whose static methods create and return an instance of the class. This allowed me to store the vertex count, triangle count, and the buffers as instance properties of the class. I even threw in a Draw method to make it incredibly simple to use the generated shape.

Because this is such a basic, fundamental class that can benefit several people (I have seen various people requesting something like this), I have decided to share it with the public.

You may download the C# file from here:
http://re-creationstudios.com/shared/Shape3D.cs

You can also download the entire test project here:
http://re-creationstudios.com/shared/ShapeTest.zip

To use, simply create a shape using one of the static methods:
Shape3D sphere = Shape3D.CreateSphere(GraphicsDevice, 1, 12, 12);

To draw, in your Effect block, call the Draw method on the shape:
sphere.Draw(GraphicsDevice);

The Create methods are made to follow the same structure as the DX9 shape drawing functions. You can read more about those here:
http://msdn.microsoft.com/en-us/library/bb172976(VS.85).aspx

Sunday, April 12, 2009

Tactical RPG

I've been engrossed with learning about DX11 lately. I've looked at slide presentations, listened to audio recordings of speeches, and read various articles discussing the new features being introduced. I believe it's a great understatement to say that I'm excited about DX11. I've finally come to the conclusion that once I have a DX11 GPU, I'm going to rewrite my procedural planet code using DX11.

In the meantime, I've decided to take about 1 month to implement a prototype for a tactical RPG I've been thinking about. I started about 2 weeks ago, and I plan to finish it by the end of April. One of my main goals is to have it completely mouse driven so that it can be played on a tablet.

I believe it is coming along rather well. I have a really nice 2D camera set up that behaves similar to Google Maps. You can use the mouse to drag around the view and you can zoom in and out using the scroll wheel. I have basic movement and attacking implemented as well. I even have one of my graphic designer friends making up some art for me. Overall, it should shape up to be a pretty decent prototype. Maybe I will even release it to the public, source and all.

PhysX

New content for Banjo-Kazooie: Nuts & Bolts was released last week. As a result, I pulled out the game again and played it some more. I wasted several hours just messing around with the vehicle editor to make various contraptions. I started thinking about how much fun it was to just tinker around like that without really following any goals, and then I started wondering about how hard it would be to implement a similar system myself.

Almost 4 years ago, I had implemented a "domino simulator" using what was then known as the NovodeX physics engine. It is now known as PhysX, is owned by Nvidia, and has hardware acceleration. So, I downloaded the SDK and played around with some of the samples.

First, the good. I was simply astounded by how many samples were provided in the SDK. There are 37 samples and 89 "lessons", all with documentation. It is amazing. Plus, the hardware acceleration really helps speed the physics engine up. One of the samples was getting about 40fps in software mode and 130fps in hardware mode. That was on a 9800M GT.

The bad is more about C++ than PhysX. I decided to create a C++ project from scratch and then add all of the libraries and code necessary to get the first lesson from the SDK working. It was horrific. It took me 3 hours to get everything to compile and run. In the end, here is what was in my project:

5 include directories
3 static libraries
- added to both the project and VS itself
- had to add one directly to the solution to get it to compile
12 include files (separate from the include directories)
8 cpp files
1 dll

Remember, all of that was to run the FIRST lesson, which is just three shapes on a plane. In C#, it would have been 3-4 DLLs and one CS file. As I said, this is more of a complaint about C++ and not PhysX. I forgot how tedious it is to set up a project for the first time. I do plan to stick with PhysX though, because once I do port my procedural planet code over to C++/DX11, I will want a nice physics engine to go along with it, and it might as well be the only one with hardware acceleration.

Until next time...

Wednesday, March 18, 2009

Craters

Another quick update. I have been working on adding craters to the procedural moon. I implemented a Voronoi diagram shader in HLSL and then tweaked it with quite a few different parameters to generate conical pits that are distorted slightly with fBm noise.
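
The shaping function is roughly along these lines (a heavily simplified sketch; voronoiF1() and fBm() are stand-ins for my actual noise functions, and the constants are made up for illustration):

// Illustrative constants only.
static const float craterFrequency = 8.0;
static const float craterRadius    = 0.35;
static const float craterDepth     = 0.05;

// voronoiF1() returns the distance to the nearest Voronoi feature point (F1);
// fBm() is standard fractal noise. Both are stand-ins for the real functions.
float CraterHeight(float3 p)
{
    // Nudge the lookup position with a bit of fBm so the pits aren't perfect
    // cones (a single scalar offset here just for brevity).
    float3 warped = p + 0.15 * fBm(p * 4.0, 4);

    // F1 rises linearly away from each feature point, so negating and clamping
    // it to a maximum radius produces a conical pit around every point.
    float f1  = voronoiF1(warped * craterFrequency);
    float pit = saturate(1.0 - f1 / craterRadius);

    return -craterDepth * pit;
}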









I tried for quite a while to get rims around the edges of the craters, but I couldn't get it to work. I tried using a colormap to alter the Voronoi results, but I was having issues with it. I will continue to tinker around with it because I think having rims would add quite a bit.

Interesting fact: If I add even more fBm noise to the crater distortions, it forms pretty cool canyons:

Tuesday, March 3, 2009

That's No Moon ...

Quick update. First, I switched to more "moon-like" textures.



Then, I tweaked some of the parameters to my existing noise function.



Finally, I started messing around with sums of two different noises.



Here's a detail shot showing the "better" noise at the surface.



There's a lot more work to do!

Thursday, February 19, 2009

Fixed Lighting + Higher Res

As I mentioned in my previous post, I was getting some strange vertical lines appearing in my deferred lighting result. After I turned down the ambient light to make the lighting a bit more realistic, the lines became even more pronounced.

Casting that problem aside for a bit, I decided to increase the resolution because 800x600 just wasn't cutting it anymore. I went with a widescreen resolution because both my laptop and my desktop have widescreen displays. I settled on 1280x720 because 1920x1200 would just be overkill right now, in my mind.

The problem with increasing the resolution was that the lines got even worse! Now I was getting horizontal lines as well as vertical lines, so it looked like a big checkerboard mess. I spent several days trying to figure out what was going wrong. At first I thought it was a bad driver/GPU in my laptop. So, I went to test it on my desktop, but I found out that my power supply was dead. Luckily my brother let me remote into his PC and run the app. I got the exact same results, so I knew it wasn't my GPU. I then installed FX Composer to have a better debugging IDE. I soon discovered that I was using the wrong texel offsets to sample neighbors in the world position texture. Fixing that removed the lines in FX Composer, but they were still appearing in XNA. I was messing around with my sampler filters when I finally fixed the problem by switching them from Point to Linear. While it does get rid of the lines, it comes at a cost. I am now getting about 18fps on average. Obviously the change in resolution figures into that as well.

I have some interesting new screenshots to share.







Sunday, February 8, 2009

Deferred Lighting

This weekend I implemented a lighting system like the one I talked about in my last two blog posts. I'm calling it deferred lighting because it doesn't do any lighting calculation until I have render targets for the scene. I have one render pass with two targets: one containing the diffuse color of the scene, and the other containing the world position of each pixel. In a second render pass, I calculate the normal of each pixel by sampling its neighboring pixels' world positions. I then simply do a standard lighting calculation using the normal and the diffuse color of the scene.
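
The second pass looks roughly like this (a simplified sketch with illustrative names, not my exact shader; the cross product order may need flipping depending on your coordinate conventions):

// Sketch of the second pass. DiffuseSampler and PositionSampler sample the two
// render targets from the first pass; TexelSize is 1 / render target resolution.
sampler2D DiffuseSampler;
sampler2D PositionSampler;
float2 TexelSize;
float3 LightDirection;

float4 LightingPS(float2 uv : TEXCOORD0) : COLOR0
{
    float3 diffuse = tex2D(DiffuseSampler, uv).rgb;

    // Reconstruct the normal from the world positions of this pixel and its
    // right and top neighbors.
    float3 p      = tex2D(PositionSampler, uv).xyz;
    float3 pRight = tex2D(PositionSampler, uv + float2(TexelSize.x, 0)).xyz;
    float3 pUp    = tex2D(PositionSampler, uv - float2(0, TexelSize.y)).xyz;

    float3 normal = normalize(cross(pRight - p, pUp - p));

    // Standard N dot L lighting with the reconstructed normal.
    float nDotL = saturate(dot(normal, -LightDirection));
    return float4(diffuse * nDotL, 1.0);
}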

It also has much better performance compared to the brute-force method of doing 32 noise calculations per pixel. At low altitudes I was getting 16fps with the noise method and 33fps with the deferred method. At high altitudes I was getting 12fps and 30fps, respectively. As you can see, I was getting at least double the framerate in every case.

Now for some pretty pictures. They are not much different from my previous lighting pictures; the important thing is that they are being rendered much faster now. I also fixed a slight bug I had in the previous lighting that made the light direction the same for every side of the planet (there was no dark side). There are some strange vertical lines appearing, which you can see in some of the screenshots below. I'm not sure why they are there, but I will continue to investigate them.









In the last picture you can see the detailed designs that are being generated for the terrain itself. Just to show the difference between the lit and diffuse renderings, here is the diffuse texture alone for the last picture.

Thursday, February 5, 2009

Per Pixel Normal Calculation

Sorry, still no actual code or pretty pictures!

I just wanted to write up a quick note related to the second topic in my previous post. I did some Googling to see if any other people have implemented a similar system, and indeed some have. In fact, I found an article by Microsoft that describes exactly what I was talking about.

http://msdn.microsoft.com/en-us/library/cc308054(VS.85).aspx


In the article, they are creating procedural materials dynamically, so they have to calculate normals dynamically as well.

From the article:
"One solution is to run the shader multiple times and compute the difference in height at each sample point. If we calculated the height one pixel to the right of the currently rasterized pixel and one pixel above the currently rasterized pixel, we could compute tangent and bitangent vectors to the central pixel. Doing a cross product on these would give us the normal for that point."

What I found funny is how they start talking about the ddx and ddy functions in HLSL, but in the end they still use the render target + second pass method.
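
For reference, the ddx/ddy approach boils down to a one-liner in the pixel shader (a sketch, assuming worldPos is the interpolated world-space position from the vertex shader):

// ddx/ddy give the change in worldPos between neighboring pixels, so their
// cross product is a face normal for the rasterized surface at this pixel.
float3 normal = normalize(cross(ddx(worldPos), ddy(worldPos)));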

"The solution that this sample uses by default is to render the perturbed heights of the objects in the scene into an off-screen render target. That render target is then read back in on another pass. For each pixel on the screen, its right and top neighbors are sampled. Tangent and bitangent vectors are created from the neighbors to the central pixel. A cross product between these will give the normal."

I now feel very confident about this method of doing things and I will proceed to implement lighting in this manner. I will probably branch off of my existing planet codebase so I can easily compare the differences between the brute-force noise calculation vs the "deferred" style.

Tuesday, February 3, 2009

Hello 2009!

I just realized that I never wrote an entry for January. It's the first month without an update since I started this dev blog. To be honest, I didn't have much to report. I haven't really written any code, but I have been doing a lot of thinking.

At first I was thinking about physics. I thought it would be nice to actually have collision detection with my terrain and possibly throw balls around or maybe even drive a car. However, there was a big problem with this: how do I detect collisions with a mesh that is deformed entirely on the GPU? Obviously I would have to have some way of sending the physics data to the GPU, doing the collision detection there, and then somehow passing the resulting data back to the CPU.

Getting the data to the GPU is the easy part, I think. If I only use bounding spheres for all of the objects, then I can simply pass one normal Color texture to the GPU containing the position of each sphere in the RGB and the radius of the sphere in the Alpha. It may even be possible to set a constant memory buffer (i.e. an array) with the data, which would be even easier.

Once I have this data, I can run through each object in the vertex shader to see if it collides with the current vertex. The problem I ran into is that I don't know how to get the data back to the CPU. I would obviously want to write the data to a render target. Unfortunately, the collision data is in the vertex shader. Pixel shaders cannot index into memory in XNA/DX9/SM3.0. [In DX10/SM4.0 they can index into constant memory; in DX11/SM5.0 both the vertex and pixel shaders can read and write to dynamic resources.] I have no idea how I would pass the data from the vertex shader to the pixel shader.

That means I must somehow do the collision detection in the pixel shader. However, this means that I will be doing the checks for every object, for every pixel. That will be massive overkill. I couldn't come up with a good solution, so I pretty much gave up on physics for now. It should be a cinch in DirectX 11 and Shader Model 5.0!

The next thing I was thinking about was mainly efficiency. Currently I am calculating the normal in the pixel shader by doing 32 noise calculations per pixel. This is quite a strain on the GPU. I was reading an article about deferred rendering and I had a thought. If I only output the height of each pixel to a render target, then I could have another pass that reads the neighboring pixels in the render target in order to calculate the normal. This means it would be one pass that does 8 noise calculations per pixel and then a second pass that does 4 texture lookups per pixel. I imagine that would be a much faster way of doing things.

I have yet to actually implement anything though. So everything here is just speculation. Sorry for no pretty pictures. I will try to get something worth showing off sometime soon.