Clean up xdraws and optimize glyph drawing with non-unit kerning values
I have another patch here for review that optimizes the performance of
glyph drawing, primarily when using non-unit kerning values, and fixes a
few other minor issues. It depends on my earlier patch that stores
Unicode codepoints in a Rune type, typedef'd to uint_least32_t. This
patch is a pretty big change to xdraws, so your scrutiny is appreciated.
First, some performance numbers. I used Yu-Jie Lin's termfps.sh shell
script to benchmark before and after, and you can find it in the
attachments. On my Kaveri A10 7850k machine, I get the following
results:
Before Patch
============
1) Font: "Liberation Mono:pixelsize=12:antialias=false:autohint=false"
cwscale: 1.0, chscale: 1.0
For 273x83 100 frames.
Elapsed time : 1.553
Frames/second: 64.352
Chars /second: 1,458,159
2) Font: "Inconsolata:pixelsize=14:antialias=true:autohint=true"
cwscale: 1.001, chscale: 1.001
For 239x73 100 frames.
Elapsed time : 159.286
Frames/second: 0.627
Chars /second: 10,953
After Patch
===========
3) Font: "Liberation Mono:pixelsize=12:antialias=false:autohint=false"
cwscale: 1.0, chscale: 1.0
For 273x83 100 frames.
Elapsed time : 1.544
Frames/second: 64.728
Chars /second: 1,466,690
4) Font: "Inconsolata:pixelsize=14:antialias=true:autohint=true"
cwscale: 1.001, chscale: 1.001
For 239x73 100 frames.
Elapsed time : 1.955
Frames/second: 51.146
Chars /second: 892,361
As you can see, while the improvement for fonts with unit kerning is
marginal, there's a huge ~81x performance increase (0.627 to 51.146
frames/second) with the patch when using kerning values other than 1.0.
So what does the patch do?
The `xdraws' function would render each glyph one at a time if non-unit
kerning values were configured, and this was the primary cause of the
slowdown. Xft provides a handful of functions that allow you to render
multiple characters or glyphs at a time, each with its own <x,y>
position, so it was simply a matter of massaging the data into a format
that would allow us to use one of these functions.
I've split `xdraws' up into two functions. The first pass,
`xmakeglyphfontspecs', iterates over all of the glyphs in a given row
and builds up an array of corresponding XftGlyphFontSpec records. Much
of the old logic for resolving fonts for glyphs using Xft and fontconfig
went into this function.
The second pass, `xrenderglyphfontspecs', contains the old logic for
determining colors, clearing the background, and finally rendering the
array of XftGlyphFontSpec records.
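
To make the two-pass idea concrete, here is a minimal sketch of its
shape, not the code from the patch. Everything in it is a stand-in:
`GlyphSpec' mimics the fields of XftGlyphFontSpec (font, glyph index,
x, y), `lookupglyph' stands in for a per-font lookup such as
XftCharIndex, and `renderspecs' stands in for a single batched Xft call
such as XftDrawGlyphFontSpec. The names `makespecs', `renderspecs',
`lookupglyph' and `drawcalls' are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint_least32_t Rune;

/* Hypothetical stand-in for XftGlyphFontSpec. */
typedef struct {
	const void *font;   /* would be an XftFont pointer */
	unsigned int glyph; /* would be the glyph index within that font */
	short x, y;         /* per-glyph position */
} GlyphSpec;

/* Counts calls into the pretend drawing backend. */
int drawcalls;

/* Hypothetical glyph lookup; identity mapping for illustration only. */
unsigned int
lookupglyph(const void *font, Rune u)
{
	(void)font;
	return (unsigned int)u;
}

/*
 * Pass 1: build one spec per cell of a row. The glyph index is looked
 * up once here, and each glyph gets its own x position, so a
 * fractional cell width (non-unit kerning) needs no special casing.
 */
size_t
makespecs(GlyphSpec *specs, const Rune *row, size_t len, const void *font,
          float cellw, int x0, int y0)
{
	size_t i;

	for (i = 0; i < len; i++) {
		specs[i].font = font;
		specs[i].glyph = lookupglyph(font, row[i]);
		specs[i].x = (short)(x0 + (int)(i * cellw));
		specs[i].y = (short)y0;
	}
	return len;
}

/* Pass 2: hand the whole array to the backend in one call. */
void
renderspecs(const GlyphSpec *specs, size_t n)
{
	(void)specs;
	(void)n;
	drawcalls++; /* one backend call, however many glyphs */
}
```

Calling `makespecs' on a row and then `renderspecs' once on the result
costs one backend call per row rather than one per glyph, which is
where the bulk of the speedup with non-unit kerning would come from.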
There are a couple of other things that have been improved by this
patch. For instance, the UTF-32 codepoints in the Lines were being
re-encoded back into UTF-8 strings to be passed to `xdraws', which in
turn would then decode them back to UTF-32 to verify that the Font
contained a matching glyph for the codepoint. Next, the UTF-8 string was
being passed to `XftDrawStringUtf8', which internally mallocs a scratch
buffer, decodes back to UTF-32, and does the lookup of the glyphs all
over again.
This patch gets rid of all of this redundant round-trip encoding and
decoding of characters to be rendered and only looks up the glyph index
once (per font) during the font resolution phase. So this is probably
what's responsible for the marginal improvements seen when kerning values
are kept to 1.0.
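
As an illustration of why that round trip is pure overhead, the sketch
below encodes a Rune to UTF-8 and decodes it straight back, ending up
with the codepoint it started from. The functions `sketch_encode' and
`sketch_decode' are hypothetical and written for this example only;
they are not st's own UTF-8 helpers, and the decoder assumes
well-formed input.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint_least32_t Rune;

/* Encode one codepoint as UTF-8; returns bytes written, 0 if invalid. */
size_t
sketch_encode(Rune u, unsigned char *s)
{
	if (u < 0x80) {
		s[0] = (unsigned char)u;
		return 1;
	} else if (u < 0x800) {
		s[0] = (unsigned char)(0xC0 | (u >> 6));
		s[1] = (unsigned char)(0x80 | (u & 0x3F));
		return 2;
	} else if (u < 0x10000) {
		if (u >= 0xD800 && u <= 0xDFFF)
			return 0; /* surrogates are not valid codepoints */
		s[0] = (unsigned char)(0xE0 | (u >> 12));
		s[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
		s[2] = (unsigned char)(0x80 | (u & 0x3F));
		return 3;
	} else if (u <= 0x10FFFF) {
		s[0] = (unsigned char)(0xF0 | (u >> 18));
		s[1] = (unsigned char)(0x80 | ((u >> 12) & 0x3F));
		s[2] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
		s[3] = (unsigned char)(0x80 | (u & 0x3F));
		return 4;
	}
	return 0;
}

/* Decode one sequence; assumes well-formed input. Returns bytes read. */
size_t
sketch_decode(const unsigned char *s, Rune *u)
{
	if (s[0] < 0x80) {
		*u = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		*u = ((Rune)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	} else if ((s[0] & 0xF0) == 0xE0) {
		*u = ((Rune)(s[0] & 0x0F) << 12) | ((Rune)(s[1] & 0x3F) << 6)
		     | (s[2] & 0x3F);
		return 3;
	}
	*u = ((Rune)(s[0] & 0x07) << 18) | ((Rune)(s[1] & 0x3F) << 12)
	     | ((Rune)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
	return 4;
}
```

Since the Line already holds the Rune, every encode/decode pair like
this on the drawing path recomputes a value the caller had in hand.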
I imagine there are other performance improvements here too, not seen in
the above benchmarks, if the user has lots of non-ASCII characters on
the screen, or if several different fonts are being used during a
screen redraw.
Anyway, if you see any problems, please let me know and I can fix them.