You can view the entire text.hlsl file, which you can include in your own shaders, here.
Intro
For a long time, I've wanted a way to directly output text to a render target from HLSL. It would be similar to printf debugging, but output directly to the screen instead of to a log file somewhere.
There have already been some other efforts to add printf style support to shaders, but that usually involves outputting strings to a buffer and then reading that buffer back on the CPU.
While a common approach to rendering text is to use a font texture and then draw textured quads for each glyph, I wanted to avoid binding a texture entirely and use a vector font instead.
I had previously written debug shape drawing functions in HLSL to draw things like lines, circles, and boxes. For a vector font, I would only need lines, so I was already prepared on that front.
A perfect, simple, vector line-based font is the Hershey set.
Process Hershey Fonts via C#
I downloaded the full Hershey set of Roman characters and immediately began writing a simple C# script in LINQPad to process the file.
An important thing to note is that the original Hershey file is hard-wrapped at 72 characters. This small thing caused me some issues until I saw it mentioned here.
In order to verify that my C# parser was working correctly, I wrote code to write out each glyph to a Bitmap.
With all of the glyphs being properly processed and written, I then moved on to mapping the characters I wanted into the standard ASCII set. Luckily, there are already .hmp files which do exactly this for each of the different font styles in the Hershey set. I was interested in only the Roman Simplex mapping. With very little effort, I was soon outputting the appropriate glyphs.
Generate HLSL Code
With the glyphs defined, I wrote more C# code to output HLSL code to draw the glyphs. I started with a simple, brute-force solution where I output a large switch statement, with a case for each ASCII character. Each case would have a series of DrawLine() calls that would represent the glyph.
The function signature is this:
float2 DrawCharacter(RWTexture2D<float4> Output, uint CharacterCode, float2 Position, float4 Color)
For the character H, it would generate this:
DrawLineDDA(Output, float2(Position.x + -7, Position.y + -12), float2(Position.x + -7, Position.y + 9), Color);
DrawLineDDA(Output, float2(Position.x + 7, Position.y + -12), float2(Position.x + 7, Position.y + 9), Color);
DrawLineDDA(Output, float2(Position.x + -7, Position.y + -2), float2(Position.x + 7, Position.y + -2), Color);
Position.x += 32;
break;
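In context, each generated case sits inside one big switch over the character code. Roughly, it looks like this (a sketch of the structure, not the exact generated output):

switch (CharacterCode)
{
    case 'H':
        DrawLineDDA(Output, float2(Position.x + -7, Position.y + -12), float2(Position.x + -7, Position.y + 9), Color);
        // ... the remaining lines of the glyph ...
        Position.x += 32;
        break;
    // ... one case per printable ASCII character ...
}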
This generated over 1200 lines in a single HLSL function, but it worked ... mostly.
It would take about 2-3 seconds to compile the HLSL, which is long, but not ridiculous. The main problem was, it would take three minutes to generate the PSO the first time when calling SetPipelineState in D3D12 on a PC with an i9-14900K and an RTX 4090. That was completely unacceptable!
I did attempt swapping the attribute on the switch statement between [branch], [call], and the other options, but none of them changed the compilation or PSO creation time. It was clear I needed a different approach.
Generate HLSL Array
I figured that since all I needed were the x and y positions for the vertices for the lines, then I could store each line in an array, and use the exact same DrawCharacter() function with no massive switch statement.
float2 DrawCharacter(RWTexture2D<float4> Output, uint CharacterCode, float2 Position, float4 Color)
{
    int ArrayIndex = CharacterCode - 32;
    int LineCount = RomanSimplexFont[ArrayIndex][0];
    int Width = RomanSimplexFont[ArrayIndex][1];
    // The line data starts at index 2, with 4 ints (x1, y1, x2, y2) per line.
    for (int i = 2; i < 2 + LineCount * 4; i += 4)
    {
        float2 Start = float2(Position.x + RomanSimplexFont[ArrayIndex][i], Position.y + RomanSimplexFont[ArrayIndex][i + 1]);
        float2 End = float2(Position.x + RomanSimplexFont[ArrayIndex][i + 2], Position.y + RomanSimplexFont[ArrayIndex][i + 3]);
        DrawLineDDA(Output, Start, End, Color);
    }
    return Position + float2(Width, 0);
}
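To give a rough idea of how this gets used, here is a minimal sketch of a DrawText-style wrapper that walks a zero-terminated string buffer and lets DrawCharacter advance the cursor. The real DrawText used later also takes a scale factor and an optional color, so treat this as a simplified assumption:

float2 DrawText(RWTexture2D<float4> Output, uint Text[256], float2 Position, float4 Color)
{
    // Draw glyphs until the zero terminator, advancing the cursor after each one.
    for (int i = 0; i < 256; i++)
    {
        if (Text[i] == 0)
            break;
        Position = DrawCharacter(Output, Text[i], Position, Color);
    }
    return Position;
}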
int Array
I changed my C# code to output a simple two-dimensional int array. Unfortunately, a two-dimensional array requires every glyph entry to be the same size as the largest symbol, which is the @ symbol with 48 lines. 48 lines with 2 vertices per line and 2 integers per vertex equals 192 integers for each of the 95 ASCII characters.
For the character H, it would generate this:
-7, -12, -7, 9,
7, -12, 7, 9,
-7, -2, 7, -2,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
Note all of the zeros that are needed to pad it out to fill the 192 integers.
So, I generated the entire array, went to compile the shader ... and it failed. My constant buffer was larger than the limit of 65,536 bytes or 16,384 ints. I was using 95 * 192 = 18,240 ints.
However!
The compiler wasn't reporting that my array was the expected 72,960 bytes. Instead, it was reporting that it was 297,972 bytes, which was nearly 4 times larger than expected! I was cursed (yet again) by the HLSL packing rules for constant buffers, which have bitten me so many times in the past.
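The rule that bites here is that every element of an array inside a constant buffer starts on its own 16-byte register, so a flat int array burns a full register per int. A small illustration (the names are mine, not from the generated code):

cbuffer FontConstants
{
    int FlatInts[8];    // each int starts a new 16-byte register: 8 registers for 8 ints
    int4 PackedInts[2]; // the same 8 ints packed as int4s: only 2 registers
};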
int4 Array
So, I updated the code generator to create an array of int4, which resulted in the expected 74,480 bytes.
However, this is still obviously over the 65,536-byte limit, so I needed to do something to get the Hershey data under 64 KB.
int16_t4 Array
I attempted to use the int16_t4 data type (enabled via the DXC -enable-16bit-types flag), which I assumed would cut my array size in half.
But surprisingly, it stayed the exact same size. This is due to DXC/HLSL treating any constant buffer array as 32-bit, even if you explicitly use a 16-bit value. It will automatically pad the values. Note that if you create an array inside a function, it will be the correct size.
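For illustration, this is roughly what that looks like. The names and element counts are mine, and it assumes the shader is compiled with -enable-16bit-types:

cbuffer FontConstants16
{
    int16_t4 GlyphData[4655]; // components are still padded to 32 bits inside a cbuffer
};

void LocalArrayExample()
{
    int16_t4 LocalData[4]; // inside a function, each element really is 8 bytes
}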
Bit-Packed int4 Array
Since all of the coordinates have an absolute value less than 128, I can use an 8-bit signed number to represent each one. So, I could pack all 4 coordinates of a line into a single 32-bit int, which means I could store 4 lines in each int4!
I updated my C# code yet again to perform all of the proper bit-packing and generate the HLSL array. It only needs 13 int4s per glyph, driven by the @ symbol's 48 lines: 12 int4s of line data plus 1 extra to store the line count, width, left padding, and right padding.
For the character H, it would generate this:
int4(3, 22, -11, 11),
int4(-101385975, 133433097, -100792322, 0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
int4(0,0,0,0),
This worked and I was able to render text with a 2-3 second shader compilation and a negligible PSO compilation!
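Judging from the generated values for H above, each line appears to be packed with x1 in the highest byte and y2 in the lowest. Here is a minimal sketch of how a packed line could be unpacked back into signed coordinates (this helper is mine, not a function from text.hlsl):

int4 UnpackLine(int Packed)
{
    // Shift each byte up to the top of the int, then arithmetic-shift it back down
    // so the signed 8-bit value is sign-extended correctly.
    int x1 = Packed >> 24;
    int y1 = (Packed << 8) >> 24;
    int x2 = (Packed << 16) >> 24;
    int y2 = (Packed << 24) >> 24;
    return int4(x1, y1, x2, y2);
}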
ItoA
Outputting only static text isn't all that useful. I would obviously want to output numeric values that are constantly changing, such as frame rates or pixel values. Unfortunately, HLSL has no string processing or string functions at all. I found a custom C implementation of itoa and ported it over to HLSL easily.
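Here is a minimal sketch of what an integer-to-ASCII conversion can look like in HLSL, assuming the same zero-terminated uint[256] string convention used throughout this post (this is my own sketch, not the ported implementation):

void itoa_sketch(int Value, out uint OutString[256])
{
    // Clear the output string.
    for (int j = 0; j < 256; j++)
        OutString[j] = 0;

    bool negative = Value < 0;
    uint remaining = negative ? uint(-Value) : uint(Value);

    // Emit digits least-significant first.
    uint digits[12];
    int count = 0;
    do
    {
        digits[count] = '0' + (remaining % 10);
        count++;
        remaining /= 10;
    } while (remaining > 0 && count < 12);

    // Write the sign, then the digits in reverse order.
    int outIndex = 0;
    if (negative)
    {
        OutString[outIndex] = '-';
        outIndex++;
    }
    for (int d = count - 1; d >= 0; d--)
    {
        OutString[outIndex] = digits[d];
        outIndex++;
    }
}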
FtoA
Similarly, I did the same for ftoa.
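A sketch of the same idea for floats, built on the itoa sketch above: write the integer part, a decimal point, then a fixed number of fractional digits. Rounding and special values (NaN, infinity) are ignored, and the names are mine:

void ftoa_sketch(float Value, int Precision, out uint OutString[256])
{
    for (int j = 0; j < 256; j++)
        OutString[j] = 0;

    int outIndex = 0;
    bool negative = Value < 0.0f;
    float v = negative ? -Value : Value;
    if (negative)
    {
        OutString[outIndex] = '-';
        outIndex++;
    }

    // Integer part, reusing the itoa sketch above.
    uint intString[256];
    itoa_sketch(int(v), intString);
    for (int c = 0; c < 256 && intString[c] != 0; c++)
    {
        OutString[outIndex] = intString[c];
        outIndex++;
    }

    // Fractional part, one digit at a time.
    OutString[outIndex] = '.';
    outIndex++;
    float frac = v - floor(v);
    for (int d = 0; d < Precision; d++)
    {
        frac *= 10.0f;
        uint digit = min(uint(frac), 9u);
        OutString[outIndex] = '0' + digit;
        outIndex++;
        frac -= float(digit);
    }
}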
TEXT() Preprocessor Macro
As you can see, and as I have stated, there is no native string support in HLSL. You must use an array of uints. This becomes a big problem when you want to define a string literal.
const uint helloStr[] = {'H','e','l','l','o',',',' ','W','o','r','l','d','!',};
Gross!
So, I wrote my own custom TEXT() preprocessor macro. Before I compile the HLSL text, I use regular expressions in (hacky, ugly) C++ to find TEXT() and automatically expand the string to a uint array.
std::string originalShaderCodeString(shaderCodeText);
std::regex regexTextPattern("TEXT\\(.*\"\\)");
auto words_begin = std::sregex_iterator(originalShaderCodeString.begin(), originalShaderCodeString.end(), regexTextPattern);
auto words_end = std::sregex_iterator();
std::sregex_iterator i = words_begin;
while (i != words_end)
{
    std::smatch match = *i;

    // remove TEXT(" from the start and ") from the end
    std::string actualString = match.str().erase(0, 6);
    actualString = actualString.erase(actualString.size() - 2);

    // expand the string into a brace-initialized array of character literals
    std::string output = "{";
    int charIndex = 0;
    for (char c : actualString)
    {
        output += "'";
        output += c;
        output += "',";
        charIndex++;
    }

    // pad with zeros out to the fixed 256-entry string size (the final entry is the terminator)
    while (charIndex < 255)
    {
        output += "0,";
        charIndex++;
    }
    output += "0}";

    // replace every occurrence of this TEXT(...) match with the expanded array
    size_t pos = 0;
    while ((pos = originalShaderCodeString.find(match.str(), pos)) != std::string::npos)
    {
        originalShaderCodeString.replace(pos, match.length(), output);
        pos += output.length(); // Move past the newly inserted substring
    }

    // re-run the search, since the replacement invalidated the existing iterators
    i = std::sregex_iterator(originalShaderCodeString.begin(), originalShaderCodeString.end(), regexTextPattern);
}
shaderCodeText = originalShaderCodeString.c_str();
So the above line becomes this:
const uint helloStr[] = TEXT("Hello, World!");
Much better!
There are still major limitations, mainly inherited from C-style language rules rather than anything specific to HLSL.
- You cannot re-initialize an array with a literal.
- helloStr = TEXT("Changed string!"); // fails to compile
- You cannot pass an array literal in as a function argument (a workaround is sketched after this list).
- DrawText(TEXT("My string literal arg!")); // fails to compile
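The workaround, which the later examples use, is to initialize a local array with TEXT() first and then pass the variable. Something like this (the call itself is illustrative):

const uint argStr[] = TEXT("My string literal arg!");
DrawText(gOutput, argStr, float2(0, 0), 1.0f, float4(1, 1, 1, 1));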
strcat and strcpy
To provide some more C-esque support for strings, I did add my own custom implementations of strcat and strcpy.
void strcat(in uint stringA[256], in uint stringB[256], inout uint outString[256])
void strcpy(inout uint destination[256], in uint source[256])
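Here is a minimal sketch of what implementations matching those signatures might look like, again assuming zero-terminated uint[256] strings (my sketch, not the actual text.hlsl code):

void strcpy(inout uint destination[256], in uint source[256])
{
    for (int i = 0; i < 256; i++)
        destination[i] = source[i];
}

void strcat(in uint stringA[256], in uint stringB[256], inout uint outString[256])
{
    int outIndex = 0;

    // Copy stringA up to its terminator.
    for (int a = 0; a < 256 && stringA[a] != 0; a++)
    {
        outString[outIndex] = stringA[a];
        outIndex++;
    }

    // Append stringB, leaving room for the terminator.
    for (int b = 0; b < 256 && stringB[b] != 0 && outIndex < 255; b++)
    {
        outString[outIndex] = stringB[b];
        outIndex++;
    }

    outString[outIndex] = 0;
}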
Here is the HLSL source code for the above screenshot:
DrawText(gOutput, printStr, float2(0, 530), 0.75f);
uint tempStr[] = TEXT("itoa + cursor = ");
strcpy(printStr, tempStr);
float2 cursor = DrawText(gOutput, printStr, float2(0, 580), 2.0f, float4(1, 0, 0, 1));
itoa(1234567890, tempStr);
DrawText(gOutput, tempStr, cursor, 2.0f, float4(1, 0, 0, 1));
ftoa(3.14159f, printStr, 3);
const uint tempStr2[] = TEXT("ftoa + strcat = ");
uint concatStr[256];
strcat(tempStr2, printStr, concatStr);
DrawText(gOutput, concatStr, float2(0, 650), 3.0f, float4(1, 0, 1, 1));
Future Work
I now have a text renderer, fully implemented in HLSL, which can be used by any shader to output to any RWTexture, including the back-buffer. There are still many things I would like to add and improve.
- I would like to optimize the number of lines needed to draw the @ symbol in order to reduce the array size
- It needs 48 lines and the next most expensive character & needs 33, so we could potentially save 3 int4s per character, which would be a reduction of 4,560 bytes.
- I already have scaling, but I would love to add full transform/rotation support.
- I have cursor advancement, but I have no concept of wrapping or line breaks, which would be nice for multiline text on the screen.
- The given string is completely drawn on the current single GPU thread. I would like to utilize the GPU threading more, perhaps through work graphs. Ideally it would be 1 glyph per thread, or even 1 line per thread.
- While I have seen decent performance on various GPUs, I have seen terrible performance on some, so I would like to optimize for all hardware, if possible. For the 3-line text image above I was seeing these (completely unscientific) frame times:
- GeForce RTX 4090 = 3 ms
- GeForce 970M = 25 ms
- Intel HD Graphics 620 = 19,000 ms (ouch!)