Sabbatical Over

Aww, my 8-week sabbatical is now over. I wish I had more time, but I feel I used it well and there are certainly lots of Firefox bugs I want to work on too, so perhaps it’s about that time now (also, it’s not that long till Christmas anyway!)

So, what did I do on my sabbatical?

As I mentioned in the previous post, I took the time off primarily to work on a game, and that’s pretty much what I did. Except, I ended up working on two games. After realising the scope for our first game was much larger than we’d reckoned for, we decided to work on a smaller puzzle game too. I had a prototype working in a day, then rewrote that same prototype in another day because DOM is slow, then rewrote it again in another day because, as it turns out, canvas isn’t particularly fast either. After that, it’s been polish and refinement; it still isn’t done, but it’s fun to play and there’s promise. We’re not sure what the long-term plan is for this, but I’d like to package it with a runtime and distribute it on the major mobile app-stores (it runs in every modern browser, IE included).

The first project ended up being a first-person, rogue-like, dungeon crawler. None of those genres are known for being particularly brief or trivial games, so I’m not sure what we expected, but yes, it’s a lot of work. In this time, we’ve gotten our idea of the game a bit more solid, designed some interaction, worked on various bits of art (texture-sets, rough monsters) and have an engine that lets you walk around an area, pick things up and features deferred, per-pixel lighting. It doesn’t run very well on your average phone at the moment, and it has layout bugs in WebKit/Blink based browsers. IE11’s WebGL also isn’t complete enough to render it as it is, though I expect I could get a basic version of it working there. I’ve put this on the back-burner slightly to focus on smaller projects that can be demoed and completed in a reasonable time-frame, but I hope to have the time to return to it intermittently and gradually bring it up to the point where it’s recognisable as a game.

You can read a short paragraph and see a screenshot of both of these games at our team website, or see a few more on our Twitter feed.

What did I learn on my sabbatical?

Well, despite what many people are pretty eager to say, the web really isn’t ready as a games platform. Or an app platform, in my humble opinion. You can get around the issues if you have a decent knowledge of how rendering engines are implemented and a reasonable grasp of debugging and profiling tools, but there are too many performance and layout bugs for it to be comfortable right now, considering the alternatives. While it isn’t ready, I can say that it’s going to be amazing when it is. You really can write an app that, with relatively little effort, will run everywhere. Between CSS media queries, viewport units and flexbox, you can finally, easily write a responsive layout that can be markedly different for desktop, tablet and phone, and CSS transitions and a little JavaScript give you great expressive power for UI animations. WebGL is good enough for writing most mobile games you see, if you can avoid jank caused by garbage collection and reflow. Technologies like CocoonJS make this really easy to deploy too.

Given how positive that all sounds, why isn’t it ready? These are the top bugs I encountered while working on some games (from a mobile specific viewpoint):

WebGL cannot be relied upon

WebGL has finally hit Chrome for Android release version, and has been enabled in Firefox and Opera for Android for ages now. The aforementioned CocoonJS lets you use it on iOS too, even. Availability isn’t the problem. The problem is that it frequently crashes the browser, or you frequently lose context, for no good reason. Changing the orientation of your phone, or resizing the browser on desktop has often caused the browser to crash in my testing. I’ve had lost contexts when my app is the only page running, no DOM manipulation is happening, no textures are being created or destroyed and the phone isn’t visibly busy with anything else. You can handle it, but having to recreate everything when this happens is not a great user experience. This happens frequently enough to be noticeable, and annoying. This seems to vary a lot per phone, but is not something I’ve experienced with native development at this scale.
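For what it’s worth, “handling it” boils down to something like the sketch below. The helper name watchContextLoss and the createResources callback are made up for illustration, and the CanvasLike type is only there so the snippet can run outside a browser; the two event names are the standard WebGL ones.

```typescript
// Minimal structural type so the sketch also runs outside a browser.
// A real canvas element should satisfy it.
interface CanvasLike {
  addEventListener(
    type: string,
    listener: (event: { preventDefault(): void }) => void
  ): void;
}

// Hypothetical helper: createResources is whatever rebuilds your shaders,
// buffers and textures from scratch.
function watchContextLoss(
  canvas: CanvasLike,
  createResources: () => void
): { isLost(): boolean } {
  let lost = false;

  canvas.addEventListener("webglcontextlost", (event) => {
    // Without preventDefault(), the context is never restored.
    event.preventDefault();
    lost = true; // stop issuing GL calls until the context comes back
  });

  canvas.addEventListener("webglcontextrestored", () => {
    // Every GL object created before the loss is invalid now; rebuild them all.
    createResources();
    lost = false;
  });

  return { isLost: () => lost };
}
```

The render loop would check isLost() and skip drawing while the context is gone. It works, but the user still sees everything get rebuilt, which is the part I’m complaining about.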

An aside, Chrome also has an odd bug that causes a security exception if you load an image (on the same domain), render it scaled into a canvas, then try to upload that canvas. This, unfortunately, means we can’t use WebGL on Chrome in our puzzle game.

Canvas performance isn’t great

Canvas ought to be enough for simple 2d games, and there are certainly lots of compelling demos about, but I find it’s near impossible to get 60fps, full-screen, full-resolution performance out of even quite simple cases, across browsers. Chrome has great canvas acceleration and Firefox has an accelerated canvas too (possibly Aurora+ only at the moment), and it does work, but not well enough that you can rely on it. My puzzle game uses canvas as a fallback renderer on mobile, when WebGL isn’t an option, but it has markedly worse performance.

Porting to Chrome is a pain

A bit controversial, and perhaps a pot/kettle situation coming from a Firefox developer, but it seems that if Chrome isn’t your primary target, you’re going to have fun porting to it later. I don’t want to get into specifics, but I’ve found that Chrome often lays out differently (and incorrectly, according to specification) when compared to Firefox and IE10+, especially when flexbox becomes involved. Its transform implementation is also quite buggy, and often ignores set perspective. There’s also the small annoyance that some features that are unprefixed in other browsers are still prefixed in Chrome (animations, 3d transforms). I actually found Chrome to be more of a pain than IE. In modern IE (10+), things tend to either work, or not work. I had fewer situations where something purported to work, but was buggy or incorrectly implemented.

Another aside, touch input in Chrome for Android has unacceptable latency and there doesn’t seem to be any way of working around it. No such issue in Firefox.

Appcache is awful

Uh, seriously. Who thought it was a good idea that appcache should work entirely independently of the browser cache? Because it isn’t a good idea. Took me a while to figure out that I have to change my server settings so that the browser won’t cache images/documents independently of appcache, breaking appcache updates. I tend to think that the most obvious and useful way for something to work should be how it works by default, and this is really not the case here.

Aside, Firefox has a bug that means that any two pages that have the same appcache manifest will cause a browser crash when accessing the second page. This includes an installed version of an online page using the same manifest.

CSS transitions/animations leak implementation details

This is the most annoying one, and I’ll make sure to file bugs about this in Firefox at least. Because setting of style properties gets coalesced, animations often don’t run. Removing display:none from an element and setting a style class to run a transition on it won’t work unless you force a reflow in-between. Similarly, switching to one style class, then back again won’t cause the animation on the first style-class to re-run. This is the case at least in Firefox and Chrome; I’ve not tested in IE. I can’t believe that this behaviour is explicitly specified, and it’s certainly extremely unintuitive. There are plenty of articles that talk about working around this, and I’m kind of amazed that we haven’t fixed this yet. I’m equally concerned about the bad habits that this encourages too.
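The usual workaround looks roughly like this. It’s a minimal sketch: the function names and the ElementLike type are mine (the type just lists the bits of a real element that get used, so the code can run without a browser), and it assumes display:none was set inline. Reading offsetWidth is one common way to force the flush; other layout-dependent properties would do the same job.

```typescript
// Just the parts of an element this sketch touches.
interface ElementLike {
  style: { display: string };
  classList: { add(name: string): void; remove(name: string): void };
  readonly offsetWidth: number; // reading this forces pending style changes to be flushed
}

// Show a hidden element and have its transition actually run.
function showWithTransition(el: ElementLike, activeClass: string): void {
  el.style.display = "";         // undo display:none (assumed to be an inline style)
  void el.offsetWidth;           // force a reflow so the "before" state exists
  el.classList.add(activeClass); // now there is something to transition from
}

// Re-run an animation that is tied to a class.
function restartAnimation(el: ElementLike, animClass: string): void {
  el.classList.remove(animClass);
  void el.offsetWidth;           // without this, remove + add is coalesced into a no-op
  el.classList.add(animClass);
}
```

This is exactly the sort of bad habit I mean: the forced reflow is there purely to poke at an implementation detail.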

DOM rendering is slow

One of the big strengths of HTML5 as an app platform is how expressive HTML/CSS are and how you can easily create user interfaces in it, visually tweak and debug them. You would naturally want to use this in any app or game that you were developing for the web primarily. Except, at least for games, if you use the DOM for your UI, you are going to spend an awful lot of time profiling, tweaking and making seemingly irrelevant changes to your CSS to try and improve rendering speed. This is no good at all, in my opinion, as this is the big advantage that the web has over native development. If you’re using WebGL only, you may as well just develop a native app and port it to wherever you want it, because using WebGL doesn’t make cross-device testing any easier and it certainly introduces a performance penalty. On the other hand, if you have a simple game, or a UI-heavy game, the web makes that much easier to work on. The one exception to this seems to be IE, which has absolutely stellar rendering performance. Well done IE.

This has been my experience with making web apps. Although those problems exist, when things come together, the result is quite beautiful. My puzzle game, though there are still browser-specific bugs to work around and performance issues to fix, works across varying size and specification of phone, in every major, modern browser. It even allows you to install it in Firefox as a dedicated app, or add it to your homescreen in iOS and Chrome beta. Being able to point someone to a URL to play a game, with no further requirement, and no limitation of distribution or questionable agreements to adhere to is a real game-changer. I love that the web fosters creativity and empowers the individual, despite the best efforts of various powers that be. We have work to do, but the future’s bright.

Sabbatical

As of Friday night, I am now on a two month unpaid leave. There are a few reasons I want to do this. It’s getting towards the 3-year point at Mozilla, and that’s usually the sort of time I get itchy feet to try something new. I also think I may have been getting a bit close to burn-out, which is obviously no good. I love my job at Mozilla and I think they’ve spoiled me too much for me to easily work elsewhere even if that wasn’t the case, so that’s one reason to take an extended break.

I still think Mozilla is a great place to work, where there are always opportunities to learn, to expand your horizons and to meet new people. An unfortunate consequence of that, though, is that I think it’s also quite high-stress. Not the kind of obvious stress you get from tight deadlines and other external pressures, but a more subtle, internal stress that you get from constantly striving to keep up and be the best you can be. Mozilla’s big enough now that it’s not uncommon to see people leave, but it does seem that a disproportionate number of them cite stress or needing time to deal with life issues as part of the reason for moving on. Maybe we need to get better at recognising that, or at encouraging people to take more personal time?

Another reason though, and the primary reason, is that I want to spend some serious time working on creating a game. Those who know me know that I’m quite an avid gamer, and I’ve always had an interest in games development (I even spoke about it at Guadec some years back). Pre-employment, a lot of my spare time was spent developing games. Mostly embarrassingly poor efforts when I view them now, but it’s something I used to be quite passionate about. At some point, I think I decided that I preferred app development to games development, and went down that route. Given that I haven’t really been doing app development since joining Mozilla, it feels like a good time to revisit games development. If you’re interested in hearing about that, you may want to follow this Twitter account. We’ve started already, and I like to think that what we have planned, though very highly influenced by existing games, provides some fun, original twists. Let’s see how this goes 🙂

Writing and deploying a small Firefox OS application

For the last week I’ve been using a Geeksphone Keon as my only phone. There’s been no cheating here, I don’t have a backup Android phone and I’ve not taken to carrying around a tablet everywhere I go (though its use has increased at home slightly…) On the whole, the experience has been positive. Considering how entrenched I was in Android applications and Google services, it’s been surprisingly easy to make the switch. I would recommend anyone getting the Geeksphones to build their own OS images though, the shipped images are pretty poor.

Among the many things I missed (Spotify is number 1 in that list btw), I could have done with a countdown timer. Contrary to what the interfaces of most Android timer apps would have you believe, it’s not rocket-science to write a usable timer, so I figured this would be a decent entry-point into writing mobile web applications. For the most part, this would just be your average web-page, but I did want it to feel ‘native’, so I started looking at the new building blocks site that documents the FirefoxOS shared resources. I had elaborate plans for tabs and headers and such, but turns out, all I really needed was the button style. The site doesn’t make hugely clear that you’ll actually need to check out the shared resources yourself, which can be found on GitHub.

Writing the app was easy, except perhaps for getting things to align vertically (for which I used the nested div/“display: table-cell; vertical-align: middle;” trick), but it was a bit harder when I wanted to use some of the new APIs. In particular, I wanted the timer to continue to work when the app is closed, and I wanted it to alert you only when you aren’t looking at it. This required use of the Alarm API, the Notifications API and the Page Visibility API.

The page visibility API was pretty self-explanatory, and I had no issues using it. I use this to know when the app is put into the background (which, handily, always happens before closing it. I think). When the page gets hidden, I use the Alarm API to set an alarm for when the current timer is due to elapse to wake up the application. I found this particularly hard to use as the documentation is very poor (though it turns out the code you need is quite short). Finally, I use the Notifications API to spawn a notification if the app isn’t visible when the timer elapses. Notifications were reasonably easy to use, but I’ve yet to figure out how to map clicking on a notification to raising my application – I don’t really know what I’m doing wrong here, any help is appreciated! Update: Thanks to Thanos Lefteris in the comments below, this now works – activating the notification will bring you back to the app.
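To make the flow above a bit more concrete, here is a rough sketch of the logic. It is not the app’s actual source: TimerPlatform and createCountdown are names invented for this example, and the three platform methods are stand-ins for the real calls (the comments say which API each one would wrap), so that the logic can be read and run on its own.

```typescript
// Hypothetical wrapper around the platform; each method stands in for a real API.
interface TimerPlatform {
  isHidden(): boolean;                       // e.g. document.hidden (Page Visibility API)
  setAlarm(when: Date): void;                // e.g. a Firefox OS Alarm API call
  notify(title: string, body: string): void; // e.g. new Notification(title, { body })
}

function createCountdown(platform: TimerPlatform) {
  let deadline: Date | null = null;

  return {
    start(durationMs: number, now: Date): void {
      deadline = new Date(now.getTime() + durationMs);
    },

    // Call this from a "visibilitychange" listener.
    onVisibilityChange(): void {
      if (platform.isHidden() && deadline !== null) {
        // Going into the background: ask the system to wake us when the timer is due.
        platform.setAlarm(deadline);
      }
    },

    // Call this when the timer elapses (in-page timeout or alarm wake-up).
    onElapsed(): void {
      deadline = null;
      if (platform.isHidden()) {
        // Only bother the user if they aren't already looking at the app.
        platform.notify("Timer", "Time is up");
      }
    },
  };
}
```

Keeping the platform calls behind a small interface like this is just a choice for the example; it also happens to make the background behaviour easy to test without a device.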

The last hurdle was deploying to an actual device, as opposed to the simulator. Apparently the simulator has a deploy-to-device feature, but this wasn’t appearing for me and it would mean having to fire up my Linux VM (I have my reasons) anyway, as there are currently no Windows drivers for the Geeksphone devices available. I obviously don’t want to submit this to the Firefox marketplace yet, as I’ve barely tested it. I have my own VPS, so ideally I could just upload the app to a directory, add a meta tag in the header and try it out on the device, but unfortunately it isn’t as easy as that.

Getting it to work well as a web-page is a good first step, and to do that you’ll want to add a meta viewport tag. Getting the app to install itself from that page was easy to do, but difficult to find out about. I think the process for this is harder than it needs to be and quite poorly documented, but basically, you want this in your app:

And you want all paths in your manifest and appcache manifest to be absolute (you can assume the host, but you can’t have paths relative to the directory the files are in). This last part makes deployment very awkward, assuming you don’t want to have all of your app assets in the root directory of your server and you don’t want to setup vhosts for every app. You also need to make sure your server has the webapp mimetype setup. Mozilla has a great online app validation tool that can help you debug problems in this process.

Timer app screenshot

And we’re done! (Ctrl+Shift+M to toggle responsive design mode in Firefox)

Visiting the page will offer to install the app for you on a device that supports app installation (i.e. a Firefox OS device). Not bad for a night’s work! Feel free to laugh at my n00b source and tell me how terrible it is in the comments 🙂

Tips for smooth scrolling web pages (EdgeConf follow-up)

I’m starting to type this up as EdgeConf draws to a close. I spoke on the performance panel, with Shane O’Sullivan, Rowan Beentje and Pavel Feldman, moderated by Matt Delaney, and tried to bring a platform perspective to the affair. I found the panel very interesting, and it reminded me how little I know about the high level of web development. Similarly, I think it also highlighted how little consideration there usually is for the platform when developing for the web. On the whole, I think that’s a good thing (platform details shouldn’t be important, and they have a habit of changing), but a little platform knowledge can help in structuring things in a way that will be fast today, and as long as it isn’t too much of a departure from your design, it doesn’t hurt to think about it. At one point in the panel, I listed a few things that are particularly slow from a platform perspective today. While none of these were intractable problems, they may not be fixed in the near future and feedback indicated that they aren’t all common knowledge. So what follows are a few things to avoid, and a few things to do that will help make your pages scroll smoothly on both desktop and mobile. I’m going to try not to write lies, but I hope if I get anything slightly or totally wrong, that people will correct me in the comments and I can update the post accordingly 🙂

Avoid reflow

When I mentioned this at the conference, I prefaced it with a quick explanation of how rendering a web page works. It’s probably worth reiterating this. After network and such have happened and the DOM tree has been created, this tree gets translated into what we call the frame tree. This tree is similar to the DOM tree, but it’s structured in a way that better represents how the page will be drawn. This tree is then iterated over and the size and position of these frames are calculated. The act of calculating these positions and sizes is referred to as reflow. Once reflow is done, we translate the frame tree into a display list (other engines may skip this step, but it’s unimportant), then we draw the display list into layers. Where possible, we keep layers around and only redraw parts that have changed/newly become visible.

Really, reflow is actually quite fast, or at least it can be, but it often forces things to be redrawn (and drawing is often slow). Reflow happens when the size or position of things changes in such a way that dependent positions and sizes of elements need to be recalculated. Reflow usually isn’t something that will happen to the entire page at once, but incautious structuring of the page can result in this. There are quite a few things you can do to help avoid large reflows; set widths and heights to absolute values where possible, don’t reposition or resize things, don’t unnecessarily change the style of things. Obviously these things can’t always be avoided, but it’s worth thinking if there are other ways to achieve the result you want that don’t force reflow. If positions of things must be changed, consider using a CSS translate transform, for example – transforms don’t usually cause reflow.

If you absolutely have to do something that will trigger reflow, it’s important to be careful how you access properties in JavaScript. Reflow will be delayed as long as possible, so that if multiple things happen in quick succession that would cause reflow, only a single reflow actually needs to happen. If you access a property that relies on the frame tree being up to date though, this will force reflow. Practically, it’s worth trying to batch DOM changes and style changes, and to make sure that any property reads happen outside of these blocks. Interleaving reads and writes can end up forcing multiple reflows per page-draw, and the cost of reflow can add up quickly.
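Here’s a small sketch of what interleaving versus batching looks like in practice. The function names and the BoxLike type are made up for the example (BoxLike is just the subset of an element that gets used), and offsetHeight stands in for any property that needs an up-to-date frame tree.

```typescript
// The subset of an element this sketch uses.
interface BoxLike {
  readonly offsetHeight: number; // reading this can force a reflow
  style: { height: string };
}

// Interleaved: every read comes after a write that dirtied layout,
// so the engine may have to reflow once per element.
function doubleHeightsInterleaved(boxes: BoxLike[]): void {
  boxes.forEach((box) => {
    box.style.height = box.offsetHeight * 2 + "px";
  });
}

// Batched: do all the reads first (at most one reflow), then all the writes.
function doubleHeightsBatched(boxes: BoxLike[]): void {
  const measured = boxes.map((box) => ({ box: box, height: box.offsetHeight }));
  measured.forEach((entry) => {
    entry.box.style.height = entry.height * 2 + "px";
  });
}
```

Both functions end up with the same styles; the only difference is how many times layout has to be brought up to date along the way.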

Avoid drawing

This sounds silly, but you should really only make the browser do as little drawing as is absolutely necessary. Most of the time, drawing will happen on reflow, when new content appears on the screen and when style changes. Some practical advice to avoid this would be to avoid making DOM changes near the root of the tree, avoid changing the size of things and avoid changing text (text drawing is especially slow). While repositioning doesn’t always force redrawing, you can ensure this by repositioning using CSS translate transforms instead of top/left/bottom/right style properties. Especially avoid causing redraws to happen while the user is scrolling. Browsers try their hardest to keep up the refresh rate while scrolling, but there are limits on memory bandwidth (especially on mobile), so every little helps.
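As a minimal sketch of the repositioning advice (moveTo and the Movable type are hypothetical names, and depending on the browser you may still need a vendor-prefixed transform property):

```typescript
// The subset of an element this sketch uses.
interface Movable {
  style: { transform: string };
}

// Move an element with a translate transform rather than by changing
// top/left, so its already-drawn content can be reused.
function moveTo(el: Movable, x: number, y: number): void {
  el.style.transform = "translate(" + x + "px, " + y + "px)";
}
```

In a page this would be called with a real element, e.g. moveTo(someElement, 40, 0) in place of setting someElement.style.left.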

Thinking of things that are slow to draw, radial gradients are very slow. This is really just a bug that we should fix, but if you must use CSS radial gradients, try not to change them, or put them in the background of elements that frequently change.

Avoid unnecessary layers

One of the reasons scrolling can be fast at all on mobile is that we reduce the page to a series of layers, and we keep redrawing on these layers down to a minimum. When we need to redraw the page, we just paste these layers that have already been drawn. While the GPU is pretty great at this, there are limits. Specifically, there is a limit to the number of pixels that can be drawn on the screen in a certain time (fill-rate) – when you draw to the same pixel multiple times, this is called overdraw, and counts towards the fill-rate. Having lots of overlapping layers often causes lots of overdraw, and can cause a frame’s maximum fill-rate to be exceeded.

This is all well and good, but how does one avoid layers at a high level? It’s worth being vaguely aware of what causes stacking contexts to be created. While layers usually don’t exactly correspond to stacking contexts, trying to reduce stacking contexts will often end up reducing the number of resulting layers, and is a reasonable exercise. Even simpler, anything with position: fixed, background-attachment: fixed or any kind of CSS transformed element will likely end up with its own layer, and anything with its own layer will likely force a layer for anything below it and anything above it. So if it’s not necessary, avoid those if possible.

What can we do at the platform level to mitigate this? Firefox already culls areas of a layer that are made inaccessible by occluding layers (at least to some extent), but this won’t work if any of the layers end up with transforms, or aren’t opaque. We could be smarter with culling for opaque, transformed layers, and we could likely do a better job of determining when a layer is opaque. I’m pretty sure we could be smarter about the culling we already do too.

Avoid blending

Another thing that slows down drawing is blending. This is when the visual result of an operation relies on what’s already there. This requires the GPU (or CPU) to read what’s already there and perform a calculation on the result, which is of course slower than just writing directly to the buffer. Blending also doesn’t interact well with deferred rendering GPUs, which are popular on mobile.

This alone isn’t so bad, but combining it with text rendering is not so great. If you have text that isn’t on a static, opaque background, that text will be rendered twice (on desktop at least). First we render it on white, then on black, and we use those two buffers to maintain sub-pixel anti-aliasing as the background varies. This is much slower than normal, and also uses up more memory. On mobile, we store opaque layers in 16-bit colour, but translucent layers are stored in 32-bit colour, doubling the memory requirement of a non-opaque layer.

We could be smarter about this. At the very least, we could use multi-texturing and store non-opaque layers as a 16-bit colour + 8-bit alpha, cutting the memory requirement by a quarter and likely making it faster to draw. Even then, this will still be more expensive than just drawing an opaque layer, so when possible, make sure any text is on top of a static, opaque background.
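To put rough numbers on that, here is the arithmetic for a single layer. The 1280×720 size is just an example I picked, and layerBytes is a throwaway helper, not anything from the platform.

```typescript
// Bytes needed to store one layer at a given colour depth.
function layerBytes(width: number, height: number, bitsPerPixel: number): number {
  return (width * height * bitsPerPixel) / 8;
}

// Hypothetical layer size, chosen only for illustration.
const layerWidth = 1280;
const layerHeight = 720;

const opaqueBytes = layerBytes(layerWidth, layerHeight, 16);      // 1,843,200 bytes
const translucentBytes = layerBytes(layerWidth, layerHeight, 32); // 3,686,400 bytes
const splitBytes = layerBytes(layerWidth, layerHeight, 16 + 8);   // 2,764,800 bytes

// Fraction saved by 16-bit colour + 8-bit alpha compared to 32-bit.
const fractionSaved = (translucentBytes - splitBytes) / translucentBytes; // 0.25
```

So a translucent layer of that size costs about 3.7MB today against about 1.8MB for an opaque one, and the split format would bring it down to about 2.8MB.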

Avoid overflow scrolling

The way we make scrolling fast on mobile, and I believe the way it’s fast in other browsers too, is that we render a much larger area than is visible on the screen and we do that asynchronously to the user scrolling. This works as the relationship between time and size of drawing is not linear (on the whole, the more you draw, the cheaper it is per pixel). We only do this for the content document, however (not strictly true, I think there are situations where whole-page scrollable elements that aren’t the body can take advantage of this, but it’s best not to rely on that). This means that any element that isn’t the body that is scrollable can’t take advantage of this, and will redraw synchronously with scrolling. For small, simple elements, this doesn’t tend to be a problem, but if your entire page is in an iframe that covers most or all of the viewport, scrolling performance will likely suffer.

On desktop, currently, drawing is synchronous and we don’t buffer area around the page like on mobile, so this advice doesn’t apply there. But on mobile, do your best to avoid using iframes or having elements that have overflow that aren’t the body. If you’re using overflow to achieve a two-panel layout, or something like this, consider using position:fixed and margins instead. If both panels must scroll, consider making the largest panel the body and using overflow scrolling in the smaller one.

I hope we’ll do something clever to fix this sometime, it’s been at the back of my mind for quite a while, but I don’t think scrolling on sub-elements of the page can ever really be as good as the body without considerable memory cost.

Take advantage of the platform

This post sounds all doom and gloom, but I’m purposefully highlighting what we aren’t yet good at. There are a lot of things we are good at (or reasonable, at least), and having a fast page need not necessarily be viewed as lots of things to avoid, so much as lots of things to do.

Although computing power continues to increase, the trend now is to bolt on more cores and more hardware threads, and the speed increase of individual cores tends to be more modest. This affects how we improve performance at the application level. Performance increases, more often than not, are about being smarter about when we do work, and doing things concurrently, more than just finding faster algorithms and micro-optimisation.

This relates to the asynchronous scrolling mentioned above, where we do the same amount of work, but at a more opportune time, and in a way that better takes advantage of the resources available. There are other optimisations that are similar with regards to video decoding/drawing, CSS animations/transitions and WebGL buffer swapping. A frequently occurring question at EdgeConf was whether it would be sensible to add ‘hints’, or expose more internals to web developers so that they can instrument pages to provide the best performance. On the whole, hints are a bad idea, as they expose platform details that are liable to change or be obsoleted, but I think a lot of control is already given by current standards.

On a practical level, take advantage of CSS animations and transitions instead of doing JavaScript property animation, take advantage of requestAnimationFrame instead of setTimeout, and if you find you need even more control, why not drop down to raw GL via WebGL, or use Canvas?
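For the cases where you do end up animating from JavaScript, a requestAnimationFrame-driven loop looks something like this. It’s a sketch under a few assumptions: animate and FrameScheduler are names invented here, and the scheduler is passed in as a parameter (rather than calling window.requestAnimationFrame directly) only so the snippet can be run outside a browser.

```typescript
// Anything that calls back once per frame with a timestamp in milliseconds.
type FrameScheduler = (callback: (timestampMs: number) => void) => unknown;

// Drive onProgress with a value going from 0 to 1 over durationMs,
// one call per frame, stopping once it reaches 1.
function animate(
  schedule: FrameScheduler,
  durationMs: number,
  onProgress: (fraction: number) => void
): void {
  let startMs = -1;

  const step = (nowMs: number): void => {
    if (startMs < 0) {
      startMs = nowMs; // the first frame defines the start time
    }
    const fraction = Math.min((nowMs - startMs) / durationMs, 1);
    onProgress(fraction);
    if (fraction < 1) {
      schedule(step); // ask for the next frame only while still running
    }
  };

  schedule(step);
}

// In a browser this would be used along the lines of:
//   animate((cb) => window.requestAnimationFrame(cb), 300, (f) => { /* update styles */ });
```

Because progress is derived from the frame timestamp rather than from counting callbacks, the animation takes the same time whether or not frames get dropped.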

I hope some of this is useful to someone. I’ll try to write similar posts if I find out more, or there are significant platform changes in the future. I deliberately haven’t mentioned profiling tools, as there are people far more qualified to write about them than I am. That said, there’s a wiki page about the built-in Firefox profiler, some nice documentation on Opera’s debugging tools and Chrome’s tools look really great too.