Sunday, September 25, 2011

Unity GPU Noise (Part 4)

Well, I'm a stubborn individual, so I continued working on the Android version of GPU Noise. I've actually been successful and made a little progress, though not without encountering more issues, of course.

First, I must say that GLSL (at least on Android's mobile GPUs) hates conditional branches, including for loops. The main reason my example scene wasn't running on Android was that I used an if-else tree in my shader (previously a switch statement) to select the noise type based on a variable set on the Material from the C# side. While this works perfectly fine on Windows and DirectX, it does not on Android. I had to separate each noise type into its own shader/material and then add logic on the C# side to select the correct Material.
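For illustration, the failing shader was shaped roughly like this (a sketch, not my exact code; _NoiseType, perlin, voronoi, and fbm stand in for my actual Material property and noise functions):

    // Cg fragment shader: branch on a Material property to pick the noise type.
    // This compiled and ran fine on Windows/DirectX, but not on Android.
    struct v2f { float4 pos : POSITION; float2 uv : TEXCOORD0; };

    float _NoiseType; // set from C# with material.SetFloat("_NoiseType", ...)

    float4 frag(v2f i) : COLOR
    {
        float n;
        if (_NoiseType < 0.5)      n = perlin(i.uv);  // basic Perlin
        else if (_NoiseType < 1.5) n = voronoi(i.uv); // Voronoi
        else                       n = fbm(i.uv);     // summation
        return float4(n, n, n, 1);
    }

The Android-safe version has no branches at all: one shader per noise type, with the C# side swapping in the matching Material.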

This is where things start to get strange. The basic Perlin noise function worked fine (as did Voronoi), but none of my summations (like fBm, fractional Brownian motion) worked. They are simply for loops that call the Perlin function repeatedly. For testing, instead of having the loop count come in as a variable, I hard-coded it to 4 octaves. That didn't work either. However, if I manually unrolled the for loop so the body was repeated 4 times, it finally worked.
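To make that concrete, here's roughly what I mean (a sketch; perlin stands in for my actual noise function):

    float perlin(float2 p); // assumed noise primitive

    // Loop version: failed on Android even with the count hard-coded to 4.
    float fbm(float2 p)
    {
        float sum = 0.0;
        float amp = 0.5;
        for (int i = 0; i < 4; i++)
        {
            sum += perlin(p) * amp;
            p *= 2.0;   // double the frequency each octave
            amp *= 0.5; // halve the amplitude each octave
        }
        return sum;
    }

    // Manually unrolled version: identical math, and this one worked.
    float fbmUnrolled(float2 p)
    {
        float sum = perlin(p)  * 0.5;
        sum += perlin(p * 2.0) * 0.25;
        sum += perlin(p * 4.0) * 0.125;
        sum += perlin(p * 8.0) * 0.0625;
        return sum;
    }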

Now, a little aside about my implementation. To speed up the noise functions, I precompute some values and store them in various textures, which the noise functions then sample. I wanted the noise functions to be usable from both a vertex shader and a pixel shader, so I used the tex2Dlod function to sample the textures, since tex2Dlod is the only texture-sampling function permitted in a vertex shader. (Automatic mip-mapping relies on screen-space derivatives that don't exist in a vertex shader, so the developer must specify the mip level explicitly.)
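In practice that looks something like this (a sketch; _PermTable is a stand-in for my actual lookup textures):

    // Sampling a precomputed lookup texture from a vertex shader.
    // tex2D would need screen-space derivatives to pick a mip level,
    // and those don't exist in a vertex shader, so tex2Dlod takes the
    // LOD explicitly in the w component of its coordinate.
    sampler2D _PermTable; // hypothetical precomputed permutation texture

    float4 samplePerm(float2 uv)
    {
        return tex2Dlod(_PermTable, float4(uv, 0, 0)); // mip level 0
    }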

Thinking that using tex2Dlod was somehow causing my for loop problems, I decided to look into a different implementation. I found a GLSL implementation of Perlin noise that didn't use any arrays or texture lookups (amazing!), making it highly cross-platform. I promptly ported it over to Cg in Unity, and was dismayed to discover that it yielded terrible-looking visual artifacts. The problem was even worse with any of the summations. I quickly made a WebGL test scene to try the shader in its original GLSL, and there it all worked perfectly. Going back to the Cg port, I went over it line by line to see where it was failing. I first tried replacing the optimized "hacks" the author used in place of modulus; that didn't fix it. Then I looked at his inverse square root function. When I replaced it with a call to Cg's built-in rsqrt, the artifacts disappeared!
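For reference, the texture-free GLSL noise implementations that circulate (Ian McEwan / Ashima Arts' webgl-noise being the best known) approximate the inverse square root with a cheap polynomial rather than computing it exactly. My fix amounted to something like this (a sketch; the constants shown are the webgl-noise ones, and the version I ported may differ slightly):

    // The ported approximation, in the webgl-noise style.
    float4 taylorInvSqrt(float4 r)
    {
        return 1.79284291400159 - 0.85373472095314 * r;
    }

    // The fix: call Cg's built-in reciprocal square root instead.
    // rsqrt operates componentwise on vectors.
    float4 invSqrt(float4 r)
    {
        return rsqrt(r);
    }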

Armed with a new implementation, I tried running my test scene on Android again, and again my summations failed. I then discovered that the summations only worked for certain octave counts (the for loop limit). This is the strangest thing of all: with 1 octave it worked; with 2 through 8 it failed; with 9 or greater it worked again.

I must say, I'm entirely perplexed. All of my noise functions now work on an Android device, but only within certain octave ranges on the summations. Why does it fail with 4 octaves when it works with 10? Does anyone have any insight into this strange issue?

1 comment:

Anonymous said...

Nice post! My guess is that the compiler is doing semi-unrolling: if you have a loop iterated 10 times, for instance, the compiler will shorten it to a loop iterated five times but doing twice the work per iteration (as a compromise between flow control and instruction count). That semi-unrolling then fails when the number of iterations is too small, but if the number of iterations is 1 or 2, the compiler won't bother and will just unroll everything, hence it works.
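Something like this transformation, purely illustrative (doOctave stands in for whatever the loop body does):

    // Original: 10 iterations, one copy of the body.
    for (int i = 0; i < 10; i++)
        doOctave(i);

    // Semi-unrolled: 5 iterations, two copies of the body.
    // A transform like this could break when the trip count is small.
    for (int i = 0; i < 10; i += 2)
    {
        doOctave(i);
        doOctave(i + 1);
    }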

Compilers have gotten pretty smart, perhaps too smart for their own good, and so all these issues arise, which can be quite painful to diagnose because they apparently defy logic.