This is somewhat similar to the classic problem of Good, Cheap, and Fast where you can only have two of them at the same time.
I implemented Perlin Noise entirely on the GPU, meaning no textures are precomputed and no parameters are set on the GPU. The shader is simply loaded and ran. It is a very clean solution and it is incredibly simple to add to any project.
Unfortunately, it is slow. Mind you, it is still much faster than the CPU implementation, but it is slow compared to my original precomputed texture implementation.
I always run my tests using 12 octaves of 3D fBm Perlin Noise summations at a resolution of 1024x1024. It yields much more stable results than just 1 calculation of Perlin Noise. My original implementation ran at about 85 fps. At first, my new simple implementation was running at about 3 fps! Even though the documentation states that HLSL automatically marks all global variables as const, if I put the const keyword before my permutation array my frame-rate jumped up to 19 fps.
I added a gradient lookup table and changed the gradient calculation functions to just index into the table. However, this basically had no impact on the speed. I then reduced the permutation array from 512 to 256 and performed a modulus operation on any indexing into the array. This gave my about a 30% speed increase and got it up to about 25 fps.
I tried various other tweaks, and I was able to get it to go a bit faster, but it was always at the expense of the quality of the noise (wouldn't work with negative values, would look blocky, etc). The fastest I was able to get it and still maintain the high quality Perlin Noise was 25 fps.
I must say that I'm rather disappointed with these results. I had thought that constant memory indexing would be faster than texture lookups, however the texture lookup version was over 3 times faster than the memory indexing version. Perhaps I'm just missing something or I'm implementing the memory indexing incorrectly, but I don't know what I could possibly do to speed it up anymore AND keep the same quality.
For the time being, it looks like texture lookups are the way to go. I've decided to upload 2 versions of the noise code. The first version is a direct port of Ken Perlin's Java code to HLSL(19 fps). The second includes the tweaks to the gradients and permutation array (25 fps).
As I have said, the major advantage to this implementation is the simplicity. All you have to do is include the header in your HLSL code and you can call the noise() and fBm() functions. That's it! You don't have to do anything else. So if you just want to drag and drop some Perlin Noise into a shader, this is the best way to do it, if you don't care about speed.