Accelerated layer-rendering, and learning by (some) success

Perhaps the title of my last blog post seemed a little negative, so I wanted to revisit this topic: some of the things I’ve learnt since then, and some of the success I’ve had too. Failure was probably too strong a word, but better to be too negative than too positive about these things, especially when surrounded by the amazing talent there is at Mozilla…

I finished off previously by saying there are other, easier problems to solve, and I think I’m making some decent progress in those areas. I described before how shadow layers work, and how the chrome process can use GL-accelerated layer compositing while the content process is always restricted to basic (unaccelerated) layers. This introduces the bottleneck of getting the image data from system memory to video memory. I was probably over-zealous in my previous approach. While asynchronous updates would be great, we could first try to minimise those updates. This is almost certainly something we should be doing anyway.

One of the ways we do this is by something that is (confusingly) called ‘rotation’ in the source code. I mentioned scrolling before. To reiterate, we render to a buffer larger than the visible area of the screen, and when panning, we move that buffer and ask the content process to re-render the newly exposed area. We then update again when that’s finished. Hopefully that happens quickly, but when it doesn’t, you may see some checker-boarding. When the content process re-renders, it theoretically only needs to re-render the newly exposed pixels, as it already has the rest of the page rendered. This could involve copying all the existing pixels upwards (assuming we’re scrolling downwards) and then rendering the newly exposed area, but instead we say ‘the origin of this buffer is now at these coordinates’ and treat the buffer as if it wraps around (hence ‘rotation’).
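A toy model may make the bookkeeping clearer. This is a minimal sketch assuming vertical scrolling only and a buffer exactly as tall as the cached window; the class and method names (`RotatedBuffer`, `scroll_to`, `buffer_row`) are hypothetical and not Gecko’s. Rows that stay visible keep their buffer position, and only the newly exposed rows are reported as needing a render.

```python
class RotatedBuffer:
    """Toy model of a vertically 'rotated' scroll buffer (hypothetical API).

    The buffer caches `height` consecutive page rows starting at page row
    `origin`. Rather than moving pixels on scroll, we track `rotation`, the
    buffer row holding the top of the cached window, and let buffer rows
    wrap around modulo `height`.
    """

    def __init__(self, height):
        self.height = height
        self.origin = 0    # first page row held in the buffer
        self.rotation = 0  # buffer row where that page row lives

    def buffer_row(self, page_row):
        """Buffer row holding a page row that lies inside the window."""
        assert self.origin <= page_row < self.origin + self.height
        return (self.rotation + page_row - self.origin) % self.height

    def scroll_to(self, new_origin):
        """Move the window and return (first_page_row, row_count) for the
        rows that must be rendered; every other row stays where it is."""
        dy = new_origin - self.origin
        if abs(dy) >= self.height:
            exposed = (new_origin, self.height)        # nothing reusable
        elif dy >= 0:
            exposed = (self.origin + self.height, dy)  # new rows below
        else:
            exposed = (new_origin, -dy)                # new rows above
        self.rotation = (self.rotation + dy) % self.height
        self.origin = new_origin
        return exposed


buf = RotatedBuffer(10)
print(buf.scroll_to(3))    # page rows 10-12 are new...
print(buf.buffer_row(12))  # ...and wrap around into buffer row 2
```

Scrolling down by 3 rows exposes page rows 10 to 12, which land in buffer rows 0 to 2, overwriting the rows that just scrolled off; nothing already rendered is copied. On screen, the wrap-around point (where buffer row `height - 1` is followed by buffer row 0) is where a ‘seam’ can appear.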

There are problems with this, however. For example, if you were to zoom into a rotated buffer whose rotation coordinates are visible on the screen, you may see a ‘seam’ at that position. Similarly, when re-using the existing pixels in the buffer, if the new scroll coordinates mean that the sample grid is no longer aligned with the previous sample grid, you may see odd artifacts on scaled images and on text that was cut off in a previous render. The following example demonstrates this:


The results of a misaligned sample grid

On the left is the original image (a checkerboard, purposefully chosen as it’s sort-of a worst-case scenario), and on the right, the same image with a 1-pixel border added on the left and upper edges. The same bilinear scale is applied to both, and the border is then cropped from the right image. You can immediately see that the results are not the same, and putting them side by side draws extra attention to this. This is what happens when you try to combine the results of two image-sampling operations whose sample grids are misaligned.
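The same effect shows up in one dimension. This sketch assumes plain linear interpolation with clamped edges, which is not necessarily how cairo filters, and the function names are hypothetical. It renders a 0, 1, 0, 1, … row at a given zoom, scrolls by one source pixel, and checks whether the previous render could simply have been shifted by whole output pixels and reused. That only works when the scroll distance multiplied by the zoom is a whole number of output pixels.

```python
import math


def sample_positions(scale, n_out, content_offset):
    """Source-space x coordinate sampled by the centre of each output
    pixel, for a render that starts content_offset source pixels in."""
    return [content_offset + (i + 0.5) / scale - 0.5 for i in range(n_out)]


def render_row(row, scale, n_out, content_offset):
    """Linearly interpolate a 1-D row of pixels at the positions above,
    clamping reads at the row's edges."""
    def pixel(k):
        return row[min(max(k, 0), len(row) - 1)]

    out = []
    for x in sample_positions(scale, n_out, content_offset):
        x0 = math.floor(x)
        t = x - x0
        out.append(pixel(x0) * (1 - t) + pixel(x0 + 1) * t)
    return out


def grids_aligned(scale, content_offset, eps=1e-9):
    """Scrolling by content_offset source pixels moves the output by
    content_offset * scale pixels; the old and new sample grids only
    coincide when that shift is a whole number of output pixels."""
    shift = content_offset * scale
    return abs(shift - round(shift)) < eps


checker = [i % 2 for i in range(32)]  # 0, 1, 0, 1, ...
for scale in (1.0, 2.0, 0.75):
    before = render_row(checker, scale, 16, 0)
    after = render_row(checker, scale, 16, 1)  # scrolled by one source pixel
    shift = round(1 * scale)  # whole output pixels we would move old pixels by
    reused_ok = all(math.isclose(after[i], before[i + shift])
                    for i in range(16 - shift))
    print(scale, grids_aligned(scale, 1), reused_ok)
```

At zooms of 1 and 2 the shifted old pixels match a fresh render exactly, but at 0.75 a one-pixel scroll moves the output by three quarters of a pixel, so the reused pixels sample the checkerboard at different points than a fresh render would.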

The code makes some attempt at identifying situations where this will happen and marking them, so that in those situations rotation doesn’t occur and the entire buffer is re-rendered. I don’t know what assumptions you can make about cairo’s sampling, or indeed about how we drive it to draw pages, but this code is certainly over-zealous in marking when resampling will occur. For example, we zoom pages to fit the width of the screen by default, and any zoom operation marks the surface to say it will be resampled. We also update the content process’s scroll coordinates every 20 pixels. So, in the overwhelmingly common case, we re-render the entire buffer every 20 pixels. On a dual-core (or better) machine, assuming your cores aren’t saturated, this shouldn’t matter so much without hardware acceleration, as the chrome process oughtn’t to be affected by what’s happening in the content process, and when the content process finishes, the chrome process just does a simple page-flip anyway. Unfortunately, it does matter in practice; I’d guess this is due to the memory bandwidth required to re-render such a large surface, and perhaps to non-ideal scheduling (remember, these are guesses. I’ve been terribly lazy when it comes to testing these theories).
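To put a rough number on this, the sketch below counts the pixels the content process has to re-render per scroll update in a deliberately simplified model: vertical scrolling only, and the buffer size from the 1280×752 example in the next paragraph (a 200%×300% buffer, i.e. 2560×2256). The function name `pixels_repainted` is hypothetical.

```python
def pixels_repainted(buf_w, buf_h, scroll_step, can_rotate):
    """Pixels re-rendered per scroll update in a simplified model:
    vertical scrolling only, whole rows exposed per step."""
    if can_rotate and scroll_step < buf_h:
        return buf_w * scroll_step  # only the newly exposed rows
    return buf_w * buf_h            # the whole buffer


# Assumed example: a 200% x 300% buffer around a 1280x752 viewport.
W, H = 1280 * 2, 752 * 3
full = pixels_repainted(W, H, 20, can_rotate=False)
rotated = pixels_repainted(W, H, 20, can_rotate=True)
print(full, rotated, full / rotated)
```

With a 20-pixel update step this comes to 5,775,360 pixels for a full repaint versus 51,200 with rotation, roughly 113 times fewer.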

Even more unfortunately, this is a terrible hit for GL-accelerated layers because, rather than page-flipping, we do synchronous buffer uploads. Also, the default shadow layer size is 200%×300% of the visible area. So let’s say you have 1280×752 pixels visible (as is the case on a 1280×800 Honeycomb tablet): every 20 pixels you scroll, you’re doing a synchronous 9.4MB upload from system memory to ‘video memory’ (I put this in quotes, as I don’t want to go down the path of explaining shared memory architecture and how it ends up working on Android. It would be long and I’d probably be wrong). Even worse, most Android devices have a maximum texture size of 2048×2048, so we have to tile these textures, which splits each upload up with texture binds in between and makes it slower still.
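The tiling arithmetic can be sketched as follows, for the 200%×300% buffer around a 1280×752 viewport described above. The sketch assumes edge tiles are cropped rather than padded to the full 2048×2048 (the post doesn’t say which), and `split_into_tiles` is a hypothetical name.

```python
def split_into_tiles(width, height, max_size=2048):
    """Split a width x height texture into tiles no larger than max_size
    on either side, returning (x, y, w, h) rectangles. Edge tiles are
    cropped rather than padded (an assumption of this sketch)."""
    tiles = []
    for y in range(0, height, max_size):
        for x in range(0, width, max_size):
            tiles.append((x, y,
                          min(max_size, width - x),
                          min(max_size, height - y)))
    return tiles


tiles = split_into_tiles(1280 * 2, 752 * 3)
for t in tiles:
    print(t)  # each tile is a separate texture: one bind, one upload
```

A whole-buffer update of 2560×2256 pixels therefore becomes four separate texture uploads (2048×2048, 512×2048, 2048×208 and 512×208), each needing its own bind.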

You might then say, “Well, at least in some cases you’ll still get the benefit of rotation, right?” Unfortunately, you’d be wrong: we disable buffer rotation entirely on shadow layers. So we have a number of problems here. I discovered this when I noticed how frequently we were doing whole-buffer updates, both on GL and in software. My first thought was simply to disable the marking of possibly-resampling surfaces (you can do this either by not setting PAINT_WILL_RESAMPLE in BasicLayers.cpp, or by ignoring it in ThebesLayerBuffer.cpp – you’ll notice that it checks MustRetainContent, which returns TRUE for shadow layers). This ought to get you the benefit of rotation, at the expense of some visible artifacting. The bug for enabling buffer rotation is here. But then I ran into this bug, which I fixed. This gets buffer rotation used more frequently with software rendering, but with hardware acceleration, things now appear very broken. Doubly so if you use tiles.

So next, I investigated why things were broken when using hardware acceleration. The first step was to alter the desktop build to use tiles. After doing this, and picking a small tile size, I noticed that a lot of drawing was then broken. This turned out to be this bug, which I fixed. Now more of the screen is visible, but rotation is still broken. This turned out to be a two-fold problem: first, we don’t handle uploads to GL layers correctly when there’s rotation, and second, we don’t handle rendering of rotated GL layers when we have tiles. I fix both of these in this bug.
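Rendering a rotated layer means drawing its buffer in up to four pieces, split at the rotation point, each landing at a different offset in the layer. With tiles, each piece can also straddle several textures, so it has to be cut again at tile boundaries. The sketch below illustrates that general idea only; it is not the code from the bug, and the names `quadrants`, `intersect` and `draw_calls` are hypothetical.

```python
def quadrants(width, height, rot_x, rot_y):
    """Split a rotated buffer into up to four rectangles.

    (rot_x, rot_y) is the buffer pixel holding the layer's top-left corner.
    Returns (src_rect, dest_x, dest_y) tuples: src_rect is (x, y, w, h) in
    buffer space, and (dest_x, dest_y) is where it lands in the layer.
    """
    pieces = []
    for sx, sw, dx in ((rot_x, width - rot_x, 0), (0, rot_x, width - rot_x)):
        for sy, sh, dy in ((rot_y, height - rot_y, 0), (0, rot_y, height - rot_y)):
            if sw > 0 and sh > 0:
                pieces.append(((sx, sy, sw, sh), dx, dy))
    return pieces


def intersect(a, b):
    """Intersection of two (x, y, w, h) rectangles, or None if empty."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1 = min(a[0] + a[2], b[0] + b[2])
    y1 = min(a[1] + a[3], b[1] + b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def draw_calls(width, height, rot_x, rot_y, tiles):
    """One draw per (quadrant, tile) overlap: the tile rectangle to bind,
    the source rectangle relative to that tile, and the destination offset."""
    calls = []
    for src, dx, dy in quadrants(width, height, rot_x, rot_y):
        for tile in tiles:
            part = intersect(src, tile)
            if part is None:
                continue
            px, py, pw, ph = part
            calls.append((tile,
                          (px - tile[0], py - tile[1], pw, ph),
                          (dx + px - src[0], dy + py - src[1])))
    return calls


# A 4x4 buffer in 2x2 tiles, rotated so the layer's top-left is at (1, 3).
tiles = [(0, 0, 2, 2), (2, 0, 2, 2), (0, 2, 2, 2), (2, 2, 2, 2)]
for call in draw_calls(4, 4, 1, 3, tiles):
    print(call)
```

Each draw has to get two different offsets right: the source rectangle is relative to the tile’s own texture, while the destination is relative to the layer.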

So, after hacking away the resampling check and fixing the various rendering bugs that rotation then exposes, you can see the benefit it would get you. Unfortunately, there’s still a lot of work to be done, and even when this works perfectly, it isn’t going to benefit all situations (we could still do with a fast path for texture upload on Android, and asynchronous updates or page-flipping). But on some sites (my own, for example, and my favourite test site, engadget.com), the difference is pretty big. So, with four bugs fixed and a deeper knowledge of how layers are put together, I count this one as a success :)
