Open Source Speech Recognition

I’m currently working on the Vaani project at Mozilla, and part of my work on that allows me to do some exploration around the topic of speech recognition and speech assistants. After looking at some of the commercial offerings available, I thought that if we were going to do some kind of add-on API, we’d be best off aping the Amazon Alexa skills JS API. Amazon Echo appears to be doing quite well and people have written a number of skills with their API. There isn’t really any alternative right now, but I actually happen to think their API is quite well thought out and concise, and maps well to the sort of data structures you need to do reliable speech recognition.

So skipping forward a bit, I decided to prototype with Node.js and some existing open source projects to implement an offline version of the Alexa skills JS API. Today it’s gotten to the point where it’s actually usable (for certain values of usable) and I’ve just spent the last 5 minutes asking it to tell me Knock-Knock jokes, so rather than waste any more time on that, I thought I’d write this about it instead. If you want to try it out, check out this repository and run npm install in the usual way. You’ll need pocketsphinx installed for that to succeed (install sphinxbase and pocketsphinx from github), and you’ll need espeak installed and some skills for it to do anything interesting, so check out the Alexa sample skills and sym-link the ‘samples‘ directory as a directory called ‘skills‘ in your ferris checkout directory. After that, just run the included example file with node and talk to it via your default recording device (hint: say ‘launch wise guy‘).

Hopefully someone else finds this useful – I’ll be using this as a base to prototype further voice experiments, and I’ll likely be extending the Alexa API further in non-standard ways. What was quite neat about all this was just how easy it all was. The Alexa API is extremely well documented, Node.js is also extremely well documented and just as easy to use, and there are tons of libraries (of varying quality…) to do what you need to do. The only real stumbling block was pocketsphinx’s lack of documentation (there’s no documentation at all for the Node bindings and the C API documentation is pretty sparse, to say the least), but thankfully other members of my team are much more familiar with this codebase than I am and I could lean on them for support.

I’m reasonably impressed with the state of lightweight open source voice recognition. This is easily good enough to be useful if you can limit the scope of what you need to recognise, and I find the Alexa API is a great way of doing that. I’d be interested to know how close the internal implementation is to how I’ve gone about it if anyone has that insider knowledge.

State of Embedding in Gecko

Following up from my last post, I’ve had some time to research and assess the current state of embedding Gecko. This post will serve as a (likely incomplete) assessment of where we are today, and what I think the sensible path forward would be. Please note that these are my personal opinions and not those of Mozilla. Mozilla are gracious enough to employ me, but I don’t yet get to decide on our direction 😉

The TLDR; there are no first-class Gecko embedding solutions as of writing.

EmbedLite (aka IPCLite)

EmbedLite is an interesting solution for embedding Gecko that relies on e10s (Electrolysis, Gecko’s out-of-process feature code-name) and OMTC (Off-Main-Thread Compositing). From what I can tell, the embedding app creates a new platform-specific compositor object that attaches to a window, and with e10s, a separate process is spawned to handle the brunt of the work (rendering the site, running JS, handling events, etc.). The existing widget API is exposed via IPC, which allows you to synthesise events, handle navigation, etc. This builds using the xulrunner application target, which unfortunately no longer exists. This project was last synced with Gecko on April 2nd 2015 (the day before my birthday!).

The most interesting thing about this project is how much code it reuses in the tree, and how little modification is required to support it (almost none – most of the changes are entirely reasonable, even outside of an embedding context). That we haven’t supported this effort seems insane to me, especially as it’s been shipping for a while as the basis for the browser in the (now defunct?) Jolla smartphone.

Building this was a pain, on Fedora 22 I was not able to get the desktop Qt build to compile, even after some effort, but I was able to compile the desktop Gtk build (trivial patches required). Unfortunately, there’s no support code provided for the Gtk version and I don’t think it’s worth the time me implementing that, given that this is essentially a dead project. A huge shame that we missed this opportunity, this would have been a good base for a lightweight, relatively easily maintained embedding solution. The quality of the work done on this seems quite high to me, after a brief examination.

Spidernode

Spidernode is a port of Node.js that uses Gecko’s ‘spidermonkey’ JavaScript engine instead of Chrome’s V8. Not really a Gecko embedding solution, but certainly something worth exploring as a way to enable more people to use Mozilla technology. Being a much smaller project, of much more limited scope, I had no issues building and testing this.

Node.js using spidermonkey ought to provide some interesting advantages over a V8-based Node. Namely, modern language features, asm.js (though I suppose this will soon be supplanted by WebAssembly) and speed. Spidernode is unfortunately unmaintained since early 2012, but I thought it would be interesting to do a simple performance test. Using the (very flawed) technique detailed here, I ran a few quick tests to compare an old copy of Node I had installed (~0.12), current stable Node (4.3.2) and this very old (~0.5) Spidermonkey-based Node. Spidermonkey-based Node was consistently over 3x faster than both old and current (which varied very little in performance). I don’t think you can really draw any conclusions than this, other than that it’s an avenue worth exploring.

Many new projects are prototyped (and indeed, fully developed) in Node.js these days; particularly Internet-Of-Things projects. If there’s the potential for these projects to run faster, unchanged, this seems like a worthy project to me. Even forgetting about the advantages of better language support. It’s sad to me that we’re experimenting with IoT projects here at Mozilla and so many of these experiments don’t promote our technology at all. This may be an irrational response, however.

GeckoView

GeckoView is the only currently maintained embedding solution for Gecko, and is Android-only. GeckoView is an Android project, split out of Firefox for Android and using the same interfaces with Gecko. It provides an embeddable widget that can be used instead of the system-provided WebView. This is not a first-class project from what I can tell, there are many bugs and many missing features, as its use outside of Firefox for Android is not considered a priority. Due to this dependency, however, one would assume that at least GeckoView will see updates for the foreseeable future.

I’d experimented with this in the past, specifically with this project that uses GeckoView with Cordova. I found then that the experience wasn’t great, due to the huge size of the GeckoView library and the numerous bugs, but this was a while ago and YMMV. Some of those bugs were down to GeckoView not using the shared APZC, a bug which has since been fixed, at least for Nightly builds. The situation may be better now than it was then.

The Future

This post is built on the premise that embedding Gecko is a worthwhile pursuit. Others may disagree about this. I’ll point to my previous post to list some of the numerous opportunities we missed, partly because we don’t have an embedding story, but I’m going to conjecture as to what some of our next missed opportunities might be.

IoT is generating a lot of buzz at the moment. I’m dubious that there’s much decent consumer use of IoT, at least that people will get excited about as opposed to property developers, but if I could predict trends, I’d have likely retired rich already. Let’s assume that consumer IoT will take off, beyond internet-connected thermostats (which are actually pretty great) and metered utility boxes (which I would quite like). These devices are mostly bespoke hardware running random bits and bobs, but an emerging trend seems to be Node.js usage. It might be important for Mozilla to provide an easily deployed out-of-the-box solution here. As our market share diminishes, so does our test-bed and contribution base for our (currently rather excellent) JavaScript engine. While we don’t have an issue here at the moment, if we find that a huge influx of diverse, resource-constrained devices starts running V8 and only V8, we may eventually find it hard to compete. It could easily be argued that it isn’t important for our solution to be based on our technology, but I would argue that if we have to start employing a considerable amount of people with no knowledge of our platform, our platform will suffer. By providing a licensed out-of-the-box solution, we could also enforce that any client-side interface remain network-accessible and cross-browser compatible.

A less tenuous example, let’s talk about VR. VR is also looking like it might finally break out into the mid/high-end consumer realm this year, with heavy investment from Facebook (via Oculus), Valve/HTC (SteamVR/Vive), Sony (Playstation VR), Microsoft (HoloLens), Samsung (GearVR) and others. Mozilla are rightly investing in WebVR, but I think the real end-goal for VR is an integrated device with no tether (certainly Microsoft and Samsung seem to agree with me here). So there may well be a new class of device on the horizon, with new kinds of browsers and ways of experiencing and integrating the web. Can we afford to not let people experiment with our technology here? I love Mozilla, but I have serious doubts that the next big thing in VR is going to come from us. That there’s no supported way of embedding Gecko worries me for future classes of device like this.

In-vehicle information/entertainment systems are possibly something that will become more of the norm, now that similar devices have become such commodity. Interestingly, the current big desktop and mobile players have very little presence here, and (mostly awful) bespoke solutions are rife. Again, can we afford to make our technology inaccessible to the people that are experimenting in this area? Is having just a good desktop browser enough? Can we really say that’s going to remain how people access the internet for the next 10 years? Probably, but I wouldn’t want to bet everything on that.

A plan

If we want an embedding solution, I think the best way to go about it is to start from Firefox for Android. Due to the way Android used to require its applications to interface with native code, Firefox for Android is already organised in such a way that it is basically an embedding API (thus GeckoView). From this point, I think we should make some of the interfaces slightly more generic and remove the JNI dependency from the Gecko-side of the code. Firefox for Android would be the main consumer of this API and would guarantee that it’s maintained. We should allow for it to be built on Linux, Mac and Windows and provide the absolute minimum harness necessary to allow for it to be tested. We would make no guarantees about API or ABI. Externally to the Gecko tree, I would suggest that we start, and that the community maintain, a CEF-compatible library, at least at the API level, that would be a Tier-3 project, much like Firefox OS now is. This, to me, seems like the minimal-effort and most useful way of allowing embeddable Gecko.

In addition, I think we should spend some effort in maintaining a fork of Node.js LTS that uses spidermonkey. If we can promise modern language features and better performance, I expect there’s a user-base that would be interested in this. If there isn’t, fair enough, but I don’t think current experiments have had enough backing to ascertain this.

I think that both of these projects are important, so that we can enable people outside of Mozilla to innovate using our technology, and by osmosis, become educated about our mission and hopefully spread our ideals. Other organisations will do their utmost to establish a monopoly in any new emerging market, and I think it’s a shame that we have such a powerful and comprehensive technology platform and we aren’t enabling other people to use it in more diverse situations.

This post is some insightful further reading on roughly the same topic.