AOT Compiling JavaScript, Part 1

at 2022-04-30

So yeah. JavaScript. Your favorite language that was designed in 11 days and has patches all over its body, running in browsers and desktops and backends and scripts and everywhere, with a node_modules folder as big as a supermassive black hole. Now we're building an AOT compiler for it.

But what's the point of building an AOT compiler?

Nice question. We don't know.


This project actually comes from the lab I'm in, where we were offered a contract from *some big phone company*. They'd like to use precompiled JavaScript in browsers to reduce loading time, run JavaScript on phones for frontend work without V8 and a browser, and maybe also run JavaScript inside embedded devices or something like that. So they need us to compile JS into WebAssembly.

When I first heard of this project, I was like, "what did you just say?" There are literally tons of languages that are both better than JS and already have AOT compilers. Just go with Dart or Java or Kotlin or C# or whatever and forget this JavaScript mess.

And they said, "We want to reuse the full ecosystem created by NPM."

Reasonable. What's next?

"We also want it to beat V8 in speed."

WHAT?


The result of this is a year-long project (maybe even longer) under the codename "JWST," for "JavaScript to WebAssembly Static Translator" and definitely nothing related to a telescope whose launch was postponed for 15 years. We're currently a little more than half a year into it, so expect stuff to change.

V8 is not for AOT

Speaking of beating V8 in speed, the best thing to beat V8 is V8 itself, right?

Wrong, because V8 is definitely not for AOT usage. We learned this after fiddling with V8 for about 1.5 months.

Let's first ignore the fact that a compiled V8 takes around 17 MiB of space for x86 code. Even its runtime alone is half that size, about 8 MiB, which is wayyyyyyyyy too large to download just to accelerate webpages.

Okay. Where do we start, V8 TurboFan compiler IR?

Does anyone want to compile this? Not me =w=

Surprise! There are heap constants and JITter-facing checks everywhere, which we're too lazy to tackle (with only two people). We'd rather not turn this whole pile of mess into an AOT compiler that generates stack-oriented code. No luck here.

What else, interpreter bytecode?

At this stage we're no longer using V8 to beat V8, of course (the speedup mostly lives in the details of TurboFan). The bytecode is definitely much more readable, but we didn't spend a lot of time there.

We found a new prey!

QuickJS to the rescue

When researching about AOT compiling JavaScript, we stumbled across multiple previous attempts. The most notable one is ChowJS.

ChowJS is an (unfortunately) closed-source AOT compiler made for their game engine. They use QuickJS as the frontend and built their own backend to do optimization passes and codegen. They do type guessing like Hopc (Serrano, 2018) did, which matches our earlier research. Overall, their choices look great and much more feasible than crushing V8 into WASM stuff. Perfect prey.

After exploring QuickJS for some time, we found it fits the project absolutely perfectly. Its bytecode is clean and built with portability in mind (woo-hoo, no more heap constants). Its types are simple, without all the fancy optimization tricks used by V8. Its runtime functions are intuitive and easy to use. And it compiles to just over 800 KiB of WebAssembly (300 KiB gzipped, 250 KiB brotli) - even less if we ditch the codegen part and only keep the runtime (which is totally fine. Who uses eval anyway?). I can talk about this all day.

So yeah. QuickJS rocks. How do we AOT from that?

We designed JWST roughly based on stuff we learned from ChowJS and Hopc. It uses QuickJS as the frontend, has a custom-built variable type guesser, and compiles QJS bytecode plus type information into LLVM IR (!). Further optimization passes (if any) are performed on LLVM IR.

How JWST is designed to work

The whole thing seems pretty trivial to implement – every compiler does more or less the same thing. We just build a mimic stack, translate every QuickJS bytecode instruction into its LLVM IR counterpart, and call it a day, right?

Sort of? Every issue I've met so far has been something I hadn't thought of, rather than something genuinely challenging that requires research. But I'm only working on the codegen part, not the typing part, which is harder and poses some interesting questions to solve.


I think that's enough for Part 1. Trying to add more stuff would probably leave this post in the draft box forever, like my other unfinished projects. In Part 2, I'd like to talk about the details of this project, the interesting problems and bugs I tripped over during this half-year journey, and stuff like that.

Stay tuned!

References

Serrano, M. (2018). JavaScript AOT compilation. ACM SIGPLAN Notices, 53(8), 50-63.