I don't know compilers, but this seems like a lot more steps than I would have expected. Is it actually an unusually large number of transformations, or is that just how compilers are done?
Some are single-pass, like Turbo Pascal: they generate machine code directly while parsing (no AST!). Niklaus Wirth (the Pascal/Modula/Oberon guy) wrote a book following this approach [1].
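To make that concrete, here is a minimal sketch of the single-pass idea in OCaml (my own toy example, not Wirth's actual code): a recursive-descent parser for expressions like "1+2*3" that emits stack-machine instructions the moment it recognizes something, without ever building an AST.

    (* Toy single-pass compiler: parse and emit in one go, no AST. *)
    let compile src =
      let pos = ref 0 in
      let peek () = if !pos < String.length src then Some src.[!pos] else None in
      let advance () = incr pos in
      (* factor := digit *)
      let rec factor () =
        match peek () with
        | Some c when c >= '0' && c <= '9' ->
            advance ();
            Printf.printf "PUSH %c\n" c        (* code is emitted immediately *)
        | _ -> failwith "expected digit"
      (* term := factor { '*' factor } *)
      and term () =
        factor ();
        while peek () = Some '*' do
          advance (); factor (); print_endline "MUL"
        done
      (* expr := term { '+' term } *)
      and expr () =
        term ();
        while peek () = Some '+' do
          advance (); term (); print_endline "ADD"
        done
      in
      expr ()

    let () = compile "1+2*3"
    (* prints: PUSH 1, PUSH 2, PUSH 3, MUL, ADD *)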
Some have many passes, like Chez Scheme: lots of small, simple passes, each of which they call a "nanopass". Andrew Keep has a great talk on this approach [2].
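For flavor, here is what the nanopass style looks like as a toy OCaml sketch (the real framework is a Scheme macro library, so this is only an approximation): each pass does exactly one small transformation over an explicit IR, and the compiler is just the composition of many such passes.

    type expr =
      | Int of int
      | Neg of expr                   (* surface-level negation *)
      | Add of expr * expr
      | Mul of expr * expr

    (* Pass 1: eliminate Neg by rewriting -e as (-1) * e. *)
    let rec remove_neg = function
      | Int n -> Int n
      | Neg e -> Mul (Int (-1), remove_neg e)
      | Add (a, b) -> Add (remove_neg a, remove_neg b)
      | Mul (a, b) -> Mul (remove_neg a, remove_neg b)

    (* Pass 2: fold constant subexpressions. *)
    let rec const_fold = function
      | Int n -> Int n
      | Neg e -> Neg (const_fold e)   (* unreachable after pass 1; kept for totality *)
      | Add (a, b) ->
          (match const_fold a, const_fold b with
           | Int x, Int y -> Int (x + y)
           | a', b' -> Add (a', b'))
      | Mul (a, b) ->
          (match const_fold a, const_fold b with
           | Int x, Int y -> Int (x * y)
           | a', b' -> Mul (a', b'))

    (* The "compiler" is just a chain of tiny passes. *)
    let compile e = e |> remove_neg |> const_fold

    let () =
      match compile (Add (Neg (Int 2), Mul (Int 3, Int 4))) with
      | Int n -> Printf.printf "%d\n" n    (* prints 10 *)
      | _ -> print_endline "not a constant"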
In practice, most compilers today are multi-pass, though probably not with as many passes as Chez. Rust, for example, goes from AST -> HIR -> MIR -> LLVM IR -> machine code [3]. There is probably more going on between LLVM IR and machine code, but I'm not knowledgeable enough to comment on it.
I think the trade-off here is clear: fewer passes -> shorter compile times; more passes -> faster generated code and a more modular compiler. Martin Odersky (the Scala guy) has a paper attempting to get the best of both [4].
The unmentioned history here is memory use. In ye olden days, machines had much less memory, which forced more passes simply because there was no room to hold all the data at once. So one might see separate passes for preprocessing, lexing, AST generation, optimization, code generation, etc. Some of this survives today, e.g. gcc still hands off to a separate assembler (gas) and has flags to dump the generated assembly.
I see lots of comments about how the multiple passes have to do with limited resources like memory. In MLton's case (it's a whole-program optimizing compiler), each translation phase has nothing to do with resource contention. And in general, limited resources are typically handled by separate compilation, not by compiler passes.
The more important design choices here have to do with:
1) Typed assembly languages (rather than untyped assembly languages)
2) Type directed translation (rather than syntax directed translation)
3) The ability to reason about the correctness of each step of the translation as the program is gradually lowered from SML to assembly (a toy sketch of this idea follows below).
For more on the typed assembly language work, check out the work of Morrisett and Harper that came out of CMU (specifically the TIL / TAL parts of the ConCert project). The XML type system was developed a little bit before that by Harper and Lillibridge. The general idea of a language being a mathematical object, with an elaboration and a semantics, where you can reason about progress and preservation at each step, is a result of the vision of Robin Milner (there's a good overview here: http://homepages.inf.ed.ac.uk/stg/Milner2012/R_Harper-html4-...)
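To make points 1 and 3 a bit more concrete, here is a toy OCaml sketch of what a typed intermediate language buys you (illustrative only, nothing like MLton's or TIL's actual IRs): every pass produces a term you can re-typecheck, so a buggy pass shows up as a type error in its own output instead of a mysterious miscompilation several passes later.

    type ty = TInt | TArrow of ty * ty

    type expr =
      | Var of string
      | Int of int
      | Lam of string * ty * expr       (* binders carry their types *)
      | App of expr * expr

    (* A checker for the IR, runnable after every pass. *)
    let rec typecheck env e : ty =
      match e with
      | Var x -> List.assoc x env
      | Int _ -> TInt
      | Lam (x, t, body) -> TArrow (t, typecheck ((x, t) :: env) body)
      | App (f, a) ->
          (match typecheck env f with
           | TArrow (t1, t2) when t1 = typecheck env a -> t2
           | _ -> failwith "a pass produced an ill-typed term")

    (* Wrap each pass so its output is re-checked while debugging. *)
    let check_pass name pass env e =
      let e' = pass e in
      ignore (typecheck env e');
      Printf.printf "pass %s preserved typing\n" name;
      e'

    let () =
      let identity_pass e = e in
      ignore (check_pass "identity" identity_pass [] (Lam ("x", TInt, Var "x")))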
It depends on the language: how far away is it from machine code? (Among other things.)
C is very close to machine code, so it takes few passes to write a simple non-optimizing C compiler. I'm pretty sure that, after parsing, the original C compilers were one pass. We talked about them a bit here [1].
ML is further from the machine model, so compiling it takes more work. For example, it has closures, while C doesn't, and that's something extra that you have to deal with, probably in multiple stages of the compiler.
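For example, here is closure conversion worked by hand in OCaml (just the general idea, not any particular compiler's representation): a nested function that captures a free variable k becomes closed code plus an explicit environment record, which is a shape a C-like backend can deal with.

    (* Source-level: the inner function captures the free variable k. *)
    let add k = fun x -> x + k

    (* After closure conversion: code takes its environment explicitly. *)
    type 'a closure = { code : 'a -> int -> int; env : 'a }

    let add_code env x = x + env                 (* closed: no free variables *)
    let add_converted k = { code = add_code; env = k }

    let apply clo x = clo.code clo.env x         (* call sites pass the environment *)

    let () =
      Printf.printf "%d %d\n" (add 3 4) (apply (add_converted 3) 4)   (* prints 7 7 *)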
A common theme in functional languages like ML and Lisp is "desugaring passes" or "lowering passes", which basically means turning a language construct into a more basic one, so that you can treat things more uniformly at later stages. The more language features you have, the more potential for desugaring/lowering. OCaml and Haskell in particular have a ton of features that can be treated like this.
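A made-up example of such a lowering pass in OCaml: the surface language has a let form, the core IR doesn't, and the desugaring pass rewrites let x = e1 in e2 into the application (fun x -> e2) e1, so later passes only ever see variables, lambdas, and applications.

    (* Surface syntax, with sugar. *)
    type surface =
      | SVar of string
      | SLam of string * surface
      | SApp of surface * surface
      | SLet of string * surface * surface    (* the sugar *)

    (* Core IR: no let. *)
    type core =
      | CVar of string
      | CLam of string * core
      | CApp of core * core

    let rec desugar = function
      | SVar x -> CVar x
      | SLam (x, b) -> CLam (x, desugar b)
      | SApp (f, a) -> CApp (desugar f, desugar a)
      | SLet (x, e1, e2) -> CApp (CLam (x, desugar e2), desugar e1)

    (* desugar (SLet ("x", SLam ("y", SVar "y"), SApp (SVar "x", SVar "x")))
       = CApp (CLam ("x", CApp (CVar "x", CVar "x")), CLam ("y", CVar "y")) *)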
There are many different approaches. Chez Scheme does a gazillion steps using the nanopass framework, but is still fast enough to compile a metric shit-tonne of code without any noticeable delay.
Interesting -- I suppose it's analogous to CPU pipelining, where having a larger number of simple stages can outperform having a small number of complicated ones.