WebKit frame delivery

Part of my work on WebKit at Igalia is keeping an eye on performance, especially when it concerns embedded devices. In my experience, there tend to be two types of changes that cause the biggest performance gains – either complete rewrites (e.g. WebRender for Firefox) or tiny, low-hanging fruit that get overlooked because the big picture is so multi-faceted and complex (e.g. eliminating unnecessary buffer copies, removing an errant sleep(5) call). I would say that the latter type of bug is actually harder to diagnose and fix, even though the code produced at the end tends to be much smaller. What I’m writing about here falls firmly into that latter group.

For a good while now, Alejandro García has been producing internal performance reports for WPE WebKit. A group of us will gather and do some basic testing of WPE WebKit on the same hardware – a Raspberry Pi 3B+. This involves running things like MotionMark and going to popular sites like YouTube and Google Maps and noting how well they work. We do this periodically and it’s a great help in tracking down regressions or obvious, user-facing issues. Another part of it is monitoring things like memory and CPU usage during this testing. Alex noted that we have a lot of idle CPU time during benchmarks, and at the same time our benchmark results fall markedly behind Chromium. Some of this is expected – Chromium has a more advanced rendering architecture that makes better use of the GPU – but a well-written benchmark should be close to saturating the CPU, and we clearly have CPU time to spare to improve our results. Based on this idea, Alex crafted a patch that would pre-emptively render frames (this isn’t quite what it did, but for the sake of brevity, that’s how I’m going to describe it). This showed significant improvements in MotionMark results and proved, at the very least, that it was legitimately possible to improve our performance in this synthetic test without any large changes.

This patch couldn’t land as it was, due to concerns that it subtly changed the behaviour of requestAnimationFrame callbacks, but the idea was sound and definitely pointed towards an area where a relatively small change could have a large effect. This laid the groundwork for us to dig deeper and debug what was really happening here. Being a fan of computer graphics and video games, I know frame delivery/cadence is quite a hot topic in that area. I had a feeling something was unusual, but the code that gets a frame to the screen in WebKit is spread across many classes (and multiple threads) and isn’t easy to reason about. On the other hand, it isn’t so hard to write a tool that analyses frame delivery from the client side, and this would also give us a handy comparison with other browsers.

So I went ahead and wrote a tool to help us analyse exactly how frames are delivered, from the client side and including under load, and the results were illuminating. The tool visualises the time between frames, the time between requesting a frame and receiving the corresponding callback, and the difference between performance.now() called at the start of a callback and the timestamp received as the callback parameter. To simulate ‘load’, it uses performance.now() and waits until the given amount of time has elapsed since the start of the callback before returning control to the main loop. If you want to sustain a 60fps update, you have about 16ms to finish whatever you’re doing in your requestAnimationFrame callback; exceed that and you won’t be able to maintain 60fps.
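To give a flavour of how the measurement works, here’s a minimal sketch of the loop – this isn’t the tool’s actual code, and names like LOAD_MS and samples are purely illustrative:

```js
// Minimal sketch of the measurement loop described above (not the
// tool's actual code; LOAD_MS and 'samples' are illustrative names).
const LOAD_MS = 20;   // simulated per-frame workload
const samples = [];   // { interval, latency, timestampDelta }

let lastFrameStart = null;
let requestTime = performance.now();

function onFrame(timestamp) {
  const now = performance.now();

  if (lastFrameStart !== null) {
    samples.push({
      interval: now - lastFrameStart,  // time between frames
      latency: now - requestTime,      // request-to-callback delay
      timestampDelta: now - timestamp  // performance.now() vs rAF timestamp
    });
  }
  lastFrameStart = now;

  // Simulate 'load': busy-wait until LOAD_MS have elapsed since the
  // start of the callback, then return control to the main loop.
  while (performance.now() - now < LOAD_MS) { /* spin */ }

  requestTime = performance.now();
  requestAnimationFrame(onFrame);
}

requestAnimationFrame(onFrame);
```

Here’s Firefox under a 20ms load: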

Firefox frame-times under a 20ms rendering load

And here’s Chrome under the same parameters:

Chrome frame-times under a 20ms rendering load

Now here’s WPE WebKit, using the Wayland backend, without any patches:

WebKit/WPE Wayland under a 20ms rendering load

One of these graphs does not look like the others, right? We can immediately see that when a frame exceeds the 16.67ms/60fps rendering budget in WebKit/WPE under the Wayland backend, it hard-drops to 30fps. Other browsers don’t wait for a vsync to kick off rendering work and so are able to achieve frame-rates between 30fps and 60fps when measured over multiple frames (all of these browsers are locked to the screen refresh; there is no screen tearing present). The other noticeable thing is that the green line is missing on the WebKit test – this shows that the timestamp delivered to the frame callback is exactly the same as performance.now(), whereas the timestamps in both Chrome and Firefox appear to be the time of the last vsync. This makes sense from an animation point of view and would mean that animations timed using this timestamp move at a rate that’s more consistent with the screen refresh when under load.
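For a sense of why that matters, this is the usual shape of a timestamp-driven animation – a generic sketch, not taken from any browser or test, with box and SPEED as placeholders:

```js
// Generic sketch of a timestamp-driven animation. If 'timestamp' is
// the last vsync time (as in Chrome and Firefox), each step stays
// aligned with the screen refresh; if it's simply the current time,
// the step size varies with however late the callback happens to run.
const box = document.getElementById('box');  // placeholder element
const SPEED = 0.1;                           // pixels per millisecond
let last = null;
let x = 0;

function animate(timestamp) {
  if (last !== null) {
    x += SPEED * (timestamp - last);
    box.style.transform = `translateX(${x}px)`;
  }
  last = timestamp;
  requestAnimationFrame(animate);
}

requestAnimationFrame(animate);
```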

What I’m omitting from this write-up is the gradual development of this tool, the subtly different behaviour of the various WebKit backends, and the huge amount of testing and instrumentation that was required to reach these conclusions. I also wrote another tool to visualise the WebKit render pipeline, which greatly aided in writing potential patches to fix these issues, but perhaps that’s a topic for another blog post on another day.

In summary, there are two identified bugs here – or at least, major differences in behaviour from other browsers – both of which affect fluidity as well as synthetic test performance. I’m not too concerned about the latter, but it’s a hard sell to a potential client that’s pointing at concrete numbers saying WebKit is significantly worse than some competing option. The first bug is that if a frame goes over budget and we miss a screen refresh (a vsync signal), we wait for the next one before kicking off rendering again. This is what causes the hard drop from 60fps to 30fps. As far as Linux is concerned, this only affects the Wayland WPE backend, because that’s the only backend that implements vsync signals fully, so GTK and the other WPE backends are unaffected. The second issue is less of a bug – a reading of the spec (steps 9 and 11.10) would certainly indicate that WebKit is doing the correct thing here – but the timestamp given to requestAnimationFrame callbacks is the current time and not the vsync time as it is in other browsers (which makes more sense for timing animations). I have patches for both of these issues and they’re tracked in bug 233312 and bug 238999.

With these two issues fixed, this is what the graph looks like.

Patched WebKit/WPE with a 20ms rendering load

And another nice consequence is that MotionMark 1.2 has gone from:

WebKit/WPE MotionMark 1.2 results

to:

Patched WebKit/WPE MotionMark 1.2 results

Much better 🙂

No ETA on these patches landing; perhaps I’ve drawn some incorrect conclusions, or done something in a way that won’t work long term, or that’s wrong in some fashion I’m not aware of yet. Also, this will most affect users of WPE/Wayland, so don’t get too excited if you’re using GNOME Web or similar. Fingers crossed though! A huge thanks to Alejandro García, who worked with me on this and did an awful lot of debugging and testing (as well as the original patch that inspired this work).

OffscreenCanvas update

Hold up, a blog post before a year’s up? I’d best slow down, don’t want to over-strain myself 🙂 So, a year ago, OffscreenCanvas was starting to become usable but was missing some key features, such as asynchronous updates and text-related functions. I’m pleased to say that, at least for Linux, it’s been complete for quite a while now! It’s still going to be a while, I think, before this is a truly usable feature in every browser. Gecko support is still forthcoming, support for non-Linux WebKit is still off by default and I find it can be a little unstable in Chrome… But the potential is huge, and there are now double the number of independent, mostly-complete implementations that prove it’s a workable concept.

Something I find I’m guilty of, and I think a lot of systems programmers tend to be guilty of, is working on a feature but never actually using it. With that in mind, I’ve been spending some time over the last couple of weeks trying to bring together demos and information on the various features that the WebKit team at Igalia has been working on. To that end, I’ve written a little OffscreenCanvas demo. It should work in any browser, but it’s a bit pointless if you don’t have OffscreenCanvas, so maybe spin up Chrome or a canary build of Epiphany.

OffscreenCanvas fractal renderer demo, running in GNOME Web Canary

Those of us old-skool computer types probably remember running fractal renderers back on our old home computers, whatever they may have been (a PC for me, but I’ve seen similar demos on Amigas, C64s, Amstrad CPCs, etc.). They would take minutes to render a whole screen. Of course, with today’s computing power, they are much faster to render, but they still aren’t cheap by any stretch of the imagination. We’re talking hundreds of millions of operations to render a full-HD frame. Running on a single CPU thread, this still isn’t really real-time, at least when implemented naively in JavaScript. That makes it a nice demonstration of what OffscreenCanvas – and really, Worker threads – allow you to do without too much fuss.
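To put some rough numbers on that, the per-pixel work for a Mandelbrot-style fractal looks something like the following – a generic sketch, not the demo’s code, with maxIterations as a placeholder:

```js
// Generic Mandelbrot iteration for a single pixel. A 1920x1080 frame
// means calling this ~2 million times, each running up to
// maxIterations steps, which is where the hundreds of millions of
// operations per frame come from.
function mandelbrotIterations(cx, cy, maxIterations) {
  let x = 0, y = 0, i = 0;
  while (i < maxIterations && x * x + y * y <= 4) {
    const xNew = x * x - y * y + cx;
    y = 2 * x * y + cy;
    x = xNew;
    i++;
  }
  return i;  // the iteration count is mapped to a palette colour
}
```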

The demo, for which you can look at my awful code, splits that rendering into 64 tiles and gives each tile to the first available Worker in a pool of rendering threads (some parts of the fractal are much more expensive to render than others, so it makes sense to use a work queue rather than just distributing the tiles evenly amongst however many Workers you’re using). Toggle one of the animation options (palette cycling looks nice) and you’ll get a frame-rate counter in the top-right, where you can see the impact on performance that adding Workers can have. In Chrome, I can hit 60fps on this 40-core Xeon machine, rendering at 1080p; using just a single worker, I barely reach 1fps (my frame-rates aren’t quite as good in WebKit, I expect because of some extra copying – there is some low-hanging fruit around OffscreenCanvas/ImageBitmap and serialisation when it comes to optimisation). If you don’t have an OffscreenCanvas-capable browser (or a monster PC), I’ve recorded a little demonstration too.
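The tile queue is roughly this shape – a simplified sketch rather than the demo’s actual code, so the file name 'render-worker.js' and the message fields are assumptions:

```js
// Simplified sketch of the tile work queue: each Worker asks for the
// next tile when it finishes one, so expensive tiles don't stall the
// cheap ones.
const canvas = document.getElementById('view');  // placeholder id
const ctx = canvas.getContext('2d');

const TILES = 8;  // 8x8 = 64 tiles
const tileW = canvas.width / TILES;
const tileH = canvas.height / TILES;

const queue = [];
for (let ty = 0; ty < TILES; ty++)
  for (let tx = 0; tx < TILES; tx++)
    queue.push({ x: tx * tileW, y: ty * tileH, w: tileW, h: tileH });

function dispatch(worker) {
  const tile = queue.shift();
  if (tile) worker.postMessage(tile);
}

for (let i = 0; i < (navigator.hardwareConcurrency || 4); i++) {
  const worker = new Worker('render-worker.js');  // hypothetical file
  worker.onmessage = (event) => {
    // Assume the worker posts back the tile it rendered as an ImageBitmap.
    const { x, y, bitmap } = event.data;
    ctx.drawImage(bitmap, x, y);
    dispatch(worker);  // hand it the next tile in the queue
  };
  dispatch(worker);
}
```

On the worker side, each tile can be drawn into an OffscreenCanvas and posted back via transferToImageBitmap(), so nothing beyond the final drawImage() calls needs to touch the main thread.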

The important thing in this demo is not so much that we can render fractals fast (this is probably much, much faster to do using WebGL and shaders), but how easy it is to massively speed up a naive implementation with relatively little thought. Google Maps is great, but even on this machine I can get it to occasionally chug and hitch – OffscreenCanvas would allow this to be entirely fluid with no hitches. This becomes even more important on less powerful machines. It’s a neat technology and one I’m pleased to have had the opportunity to work on. I look forward to seeing it used in the wild in the future.

OffscreenCanvas, jobs, life

Hoo boy, it’s been a long time since I last blogged… About 2 and a half years! So, what’s been happening in that time? This will be a long one, so if you’re only interested in a part of it (and who could blame you), I’ve titled each section.

Leaving Impossible

Well, unfortunately my work with Impossible ended, as we essentially ran out of funding. That’s really a shame; we worked on some really cool, open-source stuff, and we’ve definitely seen similar innovations in the field since we stopped working on it. We took a short break (during which we also, unsuccessfully, searched for further funding), after which Rob started working on a cool, related project of his own that you should check out, and I, being a bit less brave, started seeking out a new job. I did consider becoming a full-time musician, but business wasn’t picking up as quickly as I’d hoped it might in that down-time, and with hindsight, I’m glad I didn’t (Covid-19 and all).

I interviewed with a few places, which was certainly an eye-opening experience. The last ‘real’ job interview I did was for Mozilla in 2011, which consisted mainly of talking with engineers that worked there and working through a few whiteboard problems. Being a young, eager coder at the time, this didn’t really faze me back then. Turns out either the questions have evolved or I’m just not quite as sharp as I used to be in that very particular environment. The one interview I had that involved whiteboard coding was a very mixed bag. It seemed a mix of two types of questions: those that are easy to answer (but, unless you’re in the habit of writing very quickly on a whiteboard, slow to write down) and those that were pretty much impossible to answer without specific preparation. Perhaps this was the fault of recruiters, but you might hope that interviews would be tailored somewhat to the person you’re interviewing, or the work they might actually be doing, neither of which seemed to be the case. Unsurprisingly, I didn’t get past that interview, but in retrospect I’m also glad I didn’t. Igalia’s interview process was much more humane, and involved mostly discussions about actual work I’ve done, hypothetical situations and ethics. They were very long discussions, mind, but I’m very glad that they were happy to hire me, and that I didn’t entertain different possibilities. If you aren’t already familiar with Igalia, I’d highly recommend having a read about them/us. I’ve been there a year now, and the feeling is quite similar to when I first joined Mozilla, but I believe that with Igalia’s structure, this is likely to remain a happier and safer environment. Not that I mean to knock Mozilla, especially now, but anyone that has worked there will likely admit that along with the giddy highs, there are also some unfortunate lows.

Igalia

I joined Igalia as part of the team that works on WebKit, and that’s what I’ve been doing since. It almost makes perfect sense, in a way. Surprisingly, although I’ve spent overwhelmingly more time on Gecko, I actually worked with WebKit first, while at OpenedHand, and for a short period at Intel. While celebrating my first commit to WebKit, I discovered it wasn’t my first commit at all – I’d contributed a small embedding-related fix-up back in 2008. So it’s nice to have come full circle! My first work at Igalia was fixing up some patches that Žan Doberšek had prototyped to allow direct display of YUV video data via pixel shaders. Later on, I was also pleased to extend that work somewhat by fixing some vc4 driver bugs and GStreamer bugs to allow for hardware decoding of YUV video on the Raspberry Pi 3B (this, I believe, is all upstream at this point). WebKit Gtk and WPE WebKit may be the only Linux browser backends that leverage this pipeline, allowing for 1080p30 video playback on a Pi 3B. There are other issues making this less useful than you might think, but either way, it’s a nice first achievement.

OffscreenCanvas

After that introduction, I was pointed at what could fairly be described as my main project: OffscreenCanvas. This was also a continuation of Žan’s work (he’s prolific!), though there has been significant original work since. This might be the part of this post that people find most interesting or relevant, but having not blogged in over 2 years, I can’t be blamed for waffling just a little. OffscreenCanvas is a relatively new web standard that allows the canvas API to be used disconnected from the DOM, and within Workers. It also makes some provisions for asynchronously updated rendering, allowing canvas updates in Workers to bypass the main thread entirely and thus not be blocked by long-running processes on that thread. The most obvious use-case for this, and I think the most practical, is essentially non-blocking rendering of generated content. This is extremely handy for maps, for example. There are some other nice use-cases as well – you can, for example, show loading indicators that don’t stop animating while performing complex DOM manipulation, or procedurally generate textures for games, asynchronously. It’s useful in any situation where you might want to do some long-running image processing without blocking the main thread (image editing also springs to mind).
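For anyone who hasn’t seen the API, the basic shape is roughly this – a minimal sketch, where 'worker.js' is a hypothetical file name:

```js
// Main thread: detach rendering of a canvas from the DOM and hand it
// to a Worker; updates can then happen off the main thread.
const canvas = document.querySelector('canvas');
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker('worker.js');  // hypothetical file name
worker.postMessage({ canvas: offscreen }, [offscreen]);

// worker.js: the familiar canvas API, just not on the main thread.
onmessage = (event) => {
  const ctx = event.data.canvas.getContext('2d');
  ctx.fillStyle = 'cornflowerblue';
  ctx.fillRect(0, 0, ctx.canvas.width, ctx.canvas.height);
};
```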

Currently, the only complete implementation is within Blink. Gecko has a partial implementation that only supports WebGL contexts (and last time I tried, crashed the browser on creation…), but as far as I know, that’s it. I’ve been working on this, with encouragement and cooperation from Apple, on and off for the past year. In fact, as of August 12th, it’s even partially usable, though there is still a fair bit missing. I’ve been concentrating on the 2d context use-case, as I think it’s by far the most useful part of the standard. It’s at the point where it’s mostly usable, minus text rendering and minus some edge-case colour parsing. Asynchronous updates are also not yet supported, though I believe that’s fairly close for Linux. OffscreenCanvas is enabled with experimental features, for those that want to try it out.

My next goal, after asynchronous updates on Linux, is to enable WebGL context support. I believe these aren’t particularly tough goals, given where it is now, so hopefully they’ll happen by the end of the year. Text rendering is a much harder problem, but I hope that between us at Igalia and the excellent engineers at Apple, we can come up with a plan for it. The difficulty is that both styling and font loading/caching were written with the assumption that they’d run on just one thread, and that that thread would be the main thread. A very reasonable assumption in a pre-Worker and pre-many-core-CPU world of course, but increasingly less so now, and very awkward for this particular piece of work. Hopefully we’ll persevere though, this is a pretty cool technology, and I’d love to contribute to it being feasible to use widely, and lessen the gap between native and the web.

And that’s it from me. Lots of non-work related stuff has happened in the time since I last posted, but I’m keeping this post tech-related. If you want to hear more of my nonsense, I tend to post on Twitter a bit more often these days. See you in another couple of years 🙂

My Impossible Story

Keeping up my bi-yearly blogging cadence, I thought it might be fun to write about what I’ve been doing since I left Mozilla. It’s also a convenient time, as it coincides with our work being open-sourced and made public (and of course, developed in public, because otherwise what’s the point, right?). Somewhat ironically, I’ve been working on another machine-learning project, though I’m loath to call it that, as it uses no neural networks so far, and most people I’ve encountered consider those to be synonymous. I did also go on a month’s holiday to the home of bluegrass music, but that’s a story for another post. I’m getting ahead of myself here.

Some time in March I met up with some old colleagues/friends and, of course, we all got to chatting about what we’re working on at the moment. As it happened, Rob had just started working at a company run by a friend of our shared former boss, Matthew Allum. What he was working on sounded like it would be a lot of fun, and I had to admit that I was a little jealous of the opportunity… But it so happened that they were looking to hire, and I was starting to get itchy feet, so I got to talk to Kwame Ferreira and one thing led to another.

I started working for Impossible Labs in July, on an R&D project called ‘glimpse’. The remit for this work hasn’t always been entirely clear, but the pitch was that we’d be working on augmented reality technology to aid social interaction. There was also this video:

How could I resist?

What this has meant in real terms is that we’ve been researching and implementing a skeletal tracking system (think motion capture without any special markers/suits/equipment). We’ve studied Microsoft’s freely-available research on the skeletal tracking system for the Kinect and, filling in some of the gaps, implemented something that is probably very similar. We’ve not had much time yet, but it does work, and you can download it and try it out now if you’re an adventurous Linux user. You’ll have to wait a bit longer if you’re less adventurous or you want to see it running on a phone.

I’ve worked mainly on implementing the tools and code to train and use the model we use to interpret body images and infer joint positions. My prior experience on the DeepSpeech team at Mozilla was invaluable here – it gave me the prerequisite knowledge and vocabulary to understand the various papers around the topic, and to realistically implement them. Funnily enough, I initially tried using TensorFlow for training, with the idea that it’d let us easily train on GPUs. It turns out re-implementing it in native C was literally 1000x faster and allowed us to realistically complete training on a single (powerful) machine in just a couple of days.

My take-away from this is that TensorFlow isn’t necessarily the right tool for every machine-learning task, and that it’s worth thoroughly analysing the graphs it produces to make sure there are no obvious bottlenecks. A lot of TensorFlow nodes do not have GPU implementations, for example, and it’s very easy to absolutely kill performance by requiring frequent data transfers between CPU and GPU. It’s also worth noting that a large graph carries a huge amount of overhead unrelated to the actual operations you’re trying to run. I’m no TensorFlow expert, but it’s definitely a particular tool for a particular job, and it’s worth being careful. Experts can feel free to look at our repository history and tell me all the stupid mistakes I was making before we rewrote it 🙂

So what’s it like working at Impossible on a day-to-day basis? I think a picture says a thousand words, so here’s a picture of our studio:

Though I’ve taken this from the Impossible website, this is seriously what it looks like. There is actually a piano there, and it’s in tune and everything. There are guitars. We have a cat. There’s a tree. A kitchen. The roof is glass. As amazing as Mozilla (and many of the larger tech companies) offices are, this is really something else. I can’t overstate how refreshing an environment this is to be in, and how that impacts both your state of mind and your work. Corporations take note, I’ll take sunlight and life over snacks and a ball-pit any day of the week.

I miss my 3-day work-week sometimes. I do have less time for music than I had, and it’s a little harder to fit everything in. But what I’ve gained in exchange is a passion for my work again. This is code I’m pretty proud of, and that I think is interesting. I’m excited to see where it goes, and to get it into people’s hands. I’m hoping that other people will see what I see in it, if not now, sometime in the near future. Wish us luck!