Hacker News
Which Interpreters Are Faster, AST or Bytecode? (stefan-marr.de)
114 points by signa11 on Oct 20, 2023 | 27 comments



To clarify, when this group says AST interpreter they mean ones that also have a JIT attached, such as what you get with RPython or GraalVM.

If AST interpreters without a JIT perform better than reasonably-written bytecode interpreters via some technique or for some workload (other than just short-running scripts, which is fair), I'd love to learn about it!


You can probably get a pretty efficient AST-style interpreter if you spit out a pile of very specialized closures calling each other.

I saw that somewhere before, let me see if I can find it.

EDIT: https://blog.cloudflare.com/building-fast-interpreters-in-ru...
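
To illustrate the basic idea (a minimal Rust sketch, not from that post; all names hypothetical): compile the AST once into a tree of boxed closures, and running the program is then just calling the root closure.

    // Sketch: compile an arithmetic AST into nested closures once,
    // then evaluate by calling the root closure (no tree-walking at run time).
    enum Expr {
        Const(i64),
        Add(Box<Expr>, Box<Expr>),
        Mul(Box<Expr>, Box<Expr>),
    }

    type Compiled = Box<dyn Fn() -> i64>;

    fn compile(e: &Expr) -> Compiled {
        match e {
            Expr::Const(n) => {
                let n = *n;
                Box::new(move || n)
            }
            Expr::Add(a, b) => {
                let (a, b) = (compile(a), compile(b));
                Box::new(move || a() + b())
            }
            Expr::Mul(a, b) => {
                let (a, b) = (compile(a), compile(b));
                Box::new(move || a() * b())
            }
        }
    }

    fn main() {
        // (2 + 3) * 4
        let ast = Expr::Mul(
            Box::new(Expr::Add(Box::new(Expr::Const(2)), Box::new(Expr::Const(3)))),
            Box::new(Expr::Const(4)),
        );
        let code = compile(&ast);
        assert_eq!(code(), 20);
    }

Each specialized closure captures only what it needs, so dispatch becomes an indirect call rather than a match on the node type.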


The JavaScriptCore interpreter retained the original KJS AST interpreter model for many years, during which it was faster than Firefox's bytecode interpreter of the era, which indicates quite clearly that performance is not an automatic outcome of a bytecode interpreter. Functionally, your "very specialized closures" are simply virtual calls (a closure is just a function pointer and a context object, which is one indirection less than a vcall).

There are a bunch of reasons we did eventually give up on the AST interpreter, but one of the biggest was that by the end we were seeing more than 15% of runtime being just the call+return overhead. But there are a bunch of other things that make bytecode better: the AST interpreter is inherently recursive, so it wasn't uncommon to have code that would fail in JSC due to stack overflows, because an expression like `function f() { a(b(c(d(f())))) }` would fundamentally 5x the recursive calls per JS call (this is just to give you an idea of how recursive things were; the example above strictly always overflows). The bytecode also made it much easier to perform some inter-expression optimisation under the latency requirements of a JS interpreter, as well as reducing the GC costs in JSC (JSC uses a conservative collector that does stack scanning).


I found this[0] around the same time I found the paper from TFA, during a recent rabbit-hole journey; very similar, methinks.

Combined with a meta-compilation system, I believe it would be a fairly efficient way to implement a tree-walking interpreter.

[0] https://blog.reverberate.org/2021/04/21/musttail-efficient-i...
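
Rust has no guaranteed tail calls (musttail is a Clang attribute), so the closest portable approximation of that dispatch style is a trampoline: each op handler returns the next handler and a flat loop calls them, so the stack never grows. A toy sketch with made-up opcodes:

    struct Vm {
        pc: usize,
        code: Vec<u8>,
        acc: i64,
    }

    // fn-pointer newtype so the handler type can refer to itself
    struct Handler(fn(&mut Vm) -> Option<Handler>);

    fn op_inc(vm: &mut Vm) -> Option<Handler> {
        vm.acc += 1;
        dispatch(vm)
    }

    fn op_halt(_vm: &mut Vm) -> Option<Handler> {
        None
    }

    fn dispatch(vm: &mut Vm) -> Option<Handler> {
        let op = vm.code[vm.pc];
        vm.pc += 1;
        Some(Handler(match op {
            0 => op_inc,
            _ => op_halt,
        }))
    }

    fn main() {
        let mut vm = Vm { pc: 0, code: vec![0, 0, 0, 1], acc: 0 };
        let mut next = dispatch(&mut vm);
        while let Some(Handler(h)) = next {
            next = h(&mut vm); // flat trampoline loop instead of tail calls
        }
        assert_eq!(vm.acc, 3);
    }

With musttail each handler would jump straight to the next one; the trampoline trades one call/return pair per op for portability.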


So...FORTH?


This is not unheard of. IIRC, Microsoft did something similar with QuickBASIC 4. The surface language was QuickBASIC, but the underlying implementation was some threaded representation.


That's really interesting. I found a StackExchange answer on this:

https://retrocomputing.stackexchange.com/a/24427


Please see Figure 2 in the blog post https://stefan-marr.de/2023/10/ast-vs-bytecode-interpreters/... or Sec. 5.2 in the paper https://stefan-marr.de/papers/oopsla-larose-et-al-ast-vs-byt...

We report results for both, with and without just-in-time compilation. The specific focus of this work was pure interpreter performance in the context of metacompilation systems, i.e., before compilation had a chance to kick in.

You are free to disagree with the specific design choices of the AST and bytecode interpreters for SOM, but we put quite some effort into making the comparison as fair as possible.


You can more or less massage an AST walker into being a bytecode VM. Just place the function pointer in the node together with its fields and write all the nodes to an array in some likely execution order. Now you've got bytecode, but with function pointers as opcodes.
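
A minimal Rust sketch of exactly that massage (hypothetical names): each flattened entry keeps the node's eval function as a pointer plus its field, and execution becomes a loop over the array with an operand stack.

    // "Bytecode" where the opcode is the former AST node's function pointer.
    // Nodes are flattened to the array in postorder (operands before operators).
    struct Instr {
        op: fn(&mut Vec<i64>, i64), // the node's eval function
        imm: i64,                   // the node's field, e.g. a constant
    }

    fn push_const(stack: &mut Vec<i64>, imm: i64) {
        stack.push(imm);
    }

    fn add(stack: &mut Vec<i64>, _imm: i64) {
        let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
        stack.push(a + b);
    }

    fn main() {
        // Flattening of the AST for `1 + 2` in likely execution order.
        let code = [
            Instr { op: push_const, imm: 1 },
            Instr { op: push_const, imm: 2 },
            Instr { op: add, imm: 0 },
        ];
        let mut stack = Vec::new();
        for i in &code {
            (i.op)(&mut stack, i.imm); // dispatch = one indirect call per node
        }
        assert_eq!(stack.pop(), Some(3));
    }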


Afaik the important difference between "ast walker" and "bytecode vm" is that the former is recursive, i.e. it more or less uses the host call stack as the user call stack, while a bytecode VM manages its own data structures and is basically a while loop. But you can definitely go from a recursive interpreter to a bytecode VM by doing a CPS transform of your interpreter and defunctionalizing it. See e.g. https://dl.acm.org/doi/10.1145/888251.888254
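
For intuition, a toy hand-defunctionalized evaluator in Rust (my sketch, not the paper's construction): the continuations of the recursive interpreter become an explicit frame stack, and the recursion becomes a loop.

    enum Expr {
        Const(i64),
        Add(Box<Expr>, Box<Expr>),
    }

    // Defunctionalized continuations: each variant records what the
    // recursive interpreter still had left to do at that point.
    enum Frame<'a> {
        EvalRight(&'a Expr), // left operand done, evaluate the right one
        DoAdd(i64),          // both done, add the saved left value
    }

    fn eval(mut e: &Expr) -> i64 {
        let mut frames: Vec<Frame> = Vec::new();
        loop {
            // "eval" state: descend left until we hit a value
            let mut v = loop {
                match e {
                    Expr::Const(n) => break *n,
                    Expr::Add(l, r) => {
                        frames.push(Frame::EvalRight(r));
                        e = l;
                    }
                }
            };
            // "apply continuation" state: consume frames with the value
            loop {
                match frames.pop() {
                    None => return v,
                    Some(Frame::EvalRight(r)) => {
                        frames.push(Frame::DoAdd(v));
                        e = r;
                        break; // back to the eval state
                    }
                    Some(Frame::DoAdd(l)) => v = l + v,
                }
            }
        }
    }

    fn main() {
        // (1 + 2) + 3
        let e = Expr::Add(
            Box::new(Expr::Add(Box::new(Expr::Const(1)), Box::new(Expr::Const(2)))),
            Box::new(Expr::Const(3)),
        );
        assert_eq!(eval(&e), 6);
    }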


It's called a flattened AST.


Misleading title - both RPython and Truffle/Graal are not interpreters but JIT compilers - the connection with interpretation is that they offer (or claim to offer) automatic construction of a JIT compiler if the user writes an interpreter.


Please see Figure 2 in the blog post https://stefan-marr.de/2023/10/ast-vs-bytecode-interpreters/... or Sec. 5.2 in the paper https://stefan-marr.de/papers/oopsla-larose-et-al-ast-vs-byt...

We report results for both, with and without just-in-time compilation. The specific focus of this work was pure interpreter performance in the context of metacompilation systems, i.e., before compilation had a chance to kick in.

For both RPython and Truffle/Graal, it's possible to disable the JIT compilers and measure pure interpreter speed.


Thanks for the clarification!

So the "baseline" is Java - is that Java compiled or interpreted? And if the latter, is the non-JIT-ted Graal interpreter compiled (as Java) and interpreting the script, or is it interpreted itself?


> is that Java compiled or interpreted?

The figure for the JIT-compiled numbers uses a standard HotSpot JVM, with JIT compilation. The figures for the interpreter numbers use a standard HotSpot JVM with the -Xint flag, so, only using the Java bytecode interpreter.

The TruffleSOM interpreter is AOT-compiled, so, it's a native binary, which is then interpreting the SOM code.


Not only that:

It uses a custom language (SOM) and builds the AST/bytecode interpreters for it on top of other systems (GraalVM and RPython), which prohibits some optimizations and may introduce an intermediate format (Graal IR).

And then it compares results using a limited set of benchmarks (Are We Fast Yet).

Interesting, but all in all it's not saying much (imho).

As so often: the devil is in the details.


> It uses some custom language (SOM), built the AST/bytecode interpreters

SOM is the language being implemented, it's not used to build the interpreters.


A research-field-specific language based on a subset of Smalltalk that has been used for research into just this kind of work for over two decades. It's not like the author went and chose or created something random. They used what folks in their field use. https://som-st.github.io/


With the amount of work needed to even attempt such a study and get to a point where one can precisely define which variable one is actually measuring, compromises are indeed needed.

But please feel free to replicate the study for your preferred language. I am happy to discuss more about why we made certain choices.



Yeah, well, my head went spinning reading that. Does such a "compiler factory" exist?


Yes, this is what Truffle is all about (1). You use Truffle to write the AST interpreter for your language, and then GraalVM uses it to derive a JIT compiler for you automatically, with the bonus of interop with other languages.

(1) https://www.graalvm.org/latest/graalvm-as-a-platform/languag...


Wasn't it the case that, since the Graal JIT is written in Java, Truffle can easily call it using an ordinary Java API?

By the way there's an excellent series of articles about implementing an interpreter with Truffle by Adam Ruka: https://www.endoflineblog.com/graal-truffle-tutorial-part-12...


Good point, isn't this the partial evaluation?


> misleading title

Definitely. The title of the linked paper is a lot more reasonable:

> AST vs. Bytecode: Interpreters in the Age of Meta-Compilation

It's a shame the author decided to go with a clickbait version for his blog entry.


Graal is a JIT compiler. Truffle is a framework/DSL for building self-optimizing AST interpreters. Truffle also provides a partial evaluator that works with Graal for the metacompilation part. However, any interpreter you build with Truffle will run fine on any JVM. It'll be pretty slow without Graal, but it's a functional AST interpreter all the same. You could even use Native Image to AOT compile it to native code and cut the JVM out if you wanted.


I think that many interpreters go further down the path of pretending to be hardware than makes sense in most cases: allocating a chunk of memory and pretending it's the entire world, pure bytecode, etc.

You can get pretty decent performance for substantially less effort by reusing more features of the host language: enums or tail-calling closures for ops, inlining values without bit twiddling, etc. (a sketch of the enum variant follows below).
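
A minimal sketch of that style in Rust (hypothetical ops): ops are a plain host-language enum, dispatch is a match in a loop, and values stay as unboxed i64s.

    enum Op {
        Push(i64),
        Add,
        Halt,
    }

    fn run(code: &[Op]) -> i64 {
        let mut stack: Vec<i64> = Vec::new();
        let mut pc = 0;
        loop {
            match &code[pc] {
                Op::Push(n) => stack.push(*n),
                Op::Add => {
                    let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                    stack.push(a + b);
                }
                Op::Halt => return stack.pop().unwrap_or(0),
            }
            pc += 1;
        }
    }

    fn main() {
        assert_eq!(run(&[Op::Push(40), Op::Push(2), Op::Add, Op::Halt]), 42);
    }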

I still feel there's a point in having two separate representations though, one for reasoning about the code at a higher level and one for execution; it also gives you a compilation phase as a consequence, which allows nailing down decisions statically and avoiding run-time performance hits.

Lisp makes an interesting implementation language, since it allows easily generating and compiling code straight in the host language, which adds another layer of interesting possibilities for optimization.



