Driving Compilers

mshockwave · on May 3, 2023

Glad that the concept of a compiler driver is getting more exposure. "The `gcc` program itself is actually not a compiler...but a compiler driver" [1] is one of my favorite things to tell newbie compiler engineers.

[1]: Clang, on the other hand, is not the case (at least the modern one). The `clang` program is a compiler driver and compiler AND assembler for the majority of platforms.

jcranmer · on May 3, 2023

> [1]: Clang, on the other hand, is not the case (at least the modern one). The `clang` program is a compiler driver and compiler AND assembler for the majority of platforms.

This framing is more likely to cause confusion than not, I think. `clang` is a compiler driver. `clang -cc1` is the compiler, and `clang -cc1as` is the assembler. They may all be the same executable, but the distinction between the tools is basically the first thing that happens. (Also note that the -cc1/-cc1as option has to be the very first option, it's not recognized in any other position).
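As a rough illustration of that dispatch, here is a toy Python model. It is not Clang's actual code, and `pick_tool` is a hypothetical name; it only shows the idea that the role is chosen from the very first argument and nothing else.

```python
def pick_tool(args):
    """Toy model: decide which role a clang-like binary plays.

    `args` is the argument list without the program name. Only the
    first argument is inspected, mirroring the rule that -cc1/-cc1as
    must come first.
    """
    if args and args[0] == "-cc1":
        return "compiler"
    if args and args[0] == "-cc1as":
        return "assembler"
    # Everything else goes to the driver, which plans the
    # compile/assemble/link jobs itself.
    return "driver"


if __name__ == "__main__":
    print(pick_tool(["-cc1", "-emit-obj", "hello.c"]))
    print(pick_tool(["hello.c", "-cc1"]))
```

In this model a `-cc1` that appears later on the command line does not select the compiler; the invocation is treated as an ordinary driver call.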

fsckboy · on May 3, 2023

i could easily google it but, what's LLVM? (me googling it would not translate automatically into me writing up an answer for everybody, however, LLVM seems to be the name of a project that includes clang)

jcranmer · on May 3, 2023

LLVM is a series of projects, of which the core project is an assembly-like language (more specifically a compiler IR (or five)), optimization passes for said language, code generators to convert it to several major assembly languages (or their machine code representation), and a few associated tools that are similar to things provided by GNU binutils.

Other projects are included in LLVM. Clang is the C/C++ compiler. There is a C standard library, an OpenCL standard library, a C++ standard library (all inventively named lib<language>). There is also a library to support compiler builtins called compiler-rt, and a library to support OpenMP called openmp, and an implementation of C++ parallel executors in pstl. There is also a linker (lld) and a debugger (lldb). There is also a machine code optimizer called bolt, and yet another compiler framework called MLIR. Finally, there are several different Fortran compilers all called flang (don't ask).

josephg · on May 4, 2023

LLVM is also notably used as the compiler backend for Rust, Swift and (I think) Zig and more. Rust is as fast as C because llvm includes all the optimizations used to make C programs fast, and rust gets to reuse all of those optimizations. Llvm can compile working binaries for all modern platforms (windows, Mac, Linux, FreeBSD, iOS, android, etc) and basically all modern architectures (x86, arm, riscv, wasm, etc). It’s also the official compiler toolchain used by Apple on all their platforms (and apple fund a lot of llvm development). I think Google also uses it for their C/C++ code and some Linux distributions use llvm instead of gcc as a compiler toolchain.

Llvm is everywhere. There’s a good chance llvm compiled some or all of the software you’re using to read this comment.

astrange · on May 4, 2023

> Rust is as fast as C because llvm includes all the optimizations used to make C programs fast, and rust gets to reuse all of those optimizations.

I mean, that's not the reason Rust is as fast as C. It's because Rust semantically doesn't include mandatory slower things.

The compiler it uses is an implementation detail.

josephg · on May 4, 2023

Well, it’s both.

mshockwave · on May 3, 2023

From https://llvm.org : "a collection of modular and reusable compiler and toolchain technologies"

fooker · on May 3, 2023

clang is the gcc-compatible compiler driver. You can access the actual compiler frontend with clang -cc1.

zabzonk · on May 3, 2023

> The `clang` program is a compiler driver and compiler AND assembler

um, gcc does all those things.

bregma · on May 4, 2023

Um, no. `gcc` is a driver that runs cc1 (a C compiler from the GNU compiler collection), cc1plus (a C++ compiler from the GNU compiler collection), (g)as (an assembler from GNU binutils) and ld (a static linker from GNU binutils) and sometimes other front-ends for other languages, and sometimes other back-ends for other reasons, and sometimes some special middle-ends.

All `gcc` does is parse some command-line options and then invoke other programs to do the actual job of preprocessing, compiling, assembling, and linking.
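One way to see those separate jobs is to ask the driver to stop after each stage. The sketch below only builds the command lines and does not run them. `-E`, `-S`, and `-c` are standard driver flags; the helper name `stage_commands` and the file names are made up for illustration.

```python
from pathlib import Path


def stage_commands(source, driver="gcc"):
    """Return one driver command per stage for a single C file.

    Hypothetical helper: it only constructs argument lists.
    """
    stem = Path(source).stem
    return [
        # Preprocess only: expand #include directives and macros.
        [driver, "-E", source, "-o", f"{stem}.i"],
        # Compile only: preprocessed C to assembly.
        [driver, "-S", f"{stem}.i", "-o", f"{stem}.s"],
        # Assemble only: assembly to an object file.
        [driver, "-c", f"{stem}.s", "-o", f"{stem}.o"],
        # Link: object file(s) to an executable.
        [driver, f"{stem}.o", "-o", stem],
    ]


if __name__ == "__main__":
    for cmd in stage_commands("hello.c"):
        print(" ".join(cmd))
```

Adding `-v` to any of these commands makes the driver print the programs it actually runs.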

mshockwave · on May 3, 2023

clang packs all three things into a single executable. GCC however will always call out to external executables for compiler (cc1) and assembler (as)

jcranmer · on May 3, 2023

gcc uses gas (from binutils) as its assembler, unlike clang.

citizen_friend · on May 4, 2023

As a user, why would I care if the actual compilation occurs:

- in the process

- in a fork of the process

- in a child process

The outcome to me is the same.

bregma · on May 4, 2023

That's right, it's all just stuff that happens when you hit F5 in your IDE. It's like not needing to know how an internal combustion engine works since all you have to do is tell the driver where you want to go.

izacus · on May 4, 2023

Because it helps when the process breaks and you don't end up being helpless and unable to fix the most important tool you use at your work.

Being ignorant is cute until it's your job to be an engineer and fix problems.

citizen_friend · on May 5, 2023

I'm not doubting that it would be helpful to have a deeper knowledge of the tools you use. I'm just not sure there is a distinction for users between gcc and clang in this regard.

jcranmer · on May 4, 2023

In the gcc/gas distinction, the fact that gas is part of binutils, a completely separate project from gcc, means that there is another package you might have to update to use a new feature.

fhuygfthdsafrsd · on May 4, 2023

The parent comment did say:

> is one of my favorite things to tell newbie compiler engineers.

sigjuice · on May 4, 2023

Exactly. This is all implementation trivia that is of absolutely no concern to me as a user.

genpfault · on May 3, 2023

Ah, yes, the Platform[1] platform, the one that uses DLLs and the PE executable format :)

[1]: https://github.com/fabiensanglard/dc/blob/1b9cbd081fcc530488...

intelVISA · on May 3, 2023

Ahh I think I know that one actually. EEE Platform right? Candy crush main menu? Don't think people use it willingly these days...

delta_p_delta_x · on May 3, 2023

> Don't think people use it willingly these days

I do. There isn't a viable competitor to productivity tools like Office 365 (and no, LibreOffice and Thunderbird, while decent, do not cut it), ease of gaming (I'd rather play games natively, than fiddle with Wine), HiDPI support, PowerShell, Visual Studio (sue me: it's better than gcc + painful array of binutils + perf), etc.

Maybe it's Stockholm syndrome, but I have Arch installed too (on its own dedicated drive to boot), and I use it a lot less than I do Platform™.

intelVISA · on May 4, 2023

Well put. I guess in retrospect: I too use the Platform™ willingly, it is quite fun to RE :)

rcoveson · on May 3, 2023

Between Stockholm syndrome, sunk-cost fallacy, and the psychological inertia of arguing repeatedly on the Internet in the 2000's that it was more 1337 than the Macintosh, it's hard to tell what "willing" even means in that context. May as well try to figure out if people who drink alcohol do so "willingly".

heinrich5991 · on May 3, 2023

Three different pull requests were opened to fix this typo already, presumably in response to this Hacker News post.

ltadeut · on May 3, 2023

Nice series!

One comment though:

> The compiler ingests one .c and outputs one .o. It has a low memory footprint. The linker on the other side, must use all the .o files at once to generate the executable. Keeping all these .o in memory would stress the system too much on big projects.

This is a really weak argument that does not make a lot of sense.

The major advantage of separate object files is avoiding recompilation of modules that haven’t changed.

In fact, compilers like Jonathan Blow’s Jai seem to get massive performance improvements by treating everything as a single compilation unit and avoiding writing a bunch of object files only to call the linker on all of them.
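The incremental-rebuild advantage of separate object files can be sketched with a timestamp check in the style of make. This is a minimal sketch with hypothetical helper names, and it ignores header dependencies, which real build tools also track.

```python
import os
import tempfile


def needs_rebuild(source, obj):
    """Return True if `obj` is missing or older than `source`."""
    if not os.path.exists(obj):
        return True
    return os.path.getmtime(source) > os.path.getmtime(obj)


def stale_sources(pairs):
    """Filter (source, object) pairs down to the sources to recompile."""
    return [src for src, obj in pairs if needs_rebuild(src, obj)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "a.c")
        obj = os.path.join(d, "a.o")
        open(src, "w").close()
        # No object file exists yet, so the source must be compiled.
        print(needs_rebuild(src, obj))
```

Only the sources returned by `stale_sources` need to go back through the compiler; the linker still sees every object file.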

masfuerte · on May 3, 2023

It's probably the main reason historically that compiling and linking are separate steps. Back in the day, you might not have had enough memory to even load all of the source at the same time. But yeah, that was decades ago.

astrange · on May 4, 2023

On a 32-bit system there's definitely still projects out there that can't be compiled all in the same address space.

gpderetta · on May 4, 2023

I have definitely worked on C++ projects that could not easily be linked on 64-bit machines with too little RAM (say, 64 GB) without swapping heavily.

argulane · on May 3, 2023

That is a pretty good introduction to the steps that happen when you compile C code.

chubot · on May 4, 2023

Glad to see this covered -- I also had the experience of picking it up the hard way, over many years!

My own tips after writing a custom build system for C++

- the order of objects and -l flags to the linker matters! I remember being very surprised / frustrated by this.

- Sanitizers are built into compilers and trivial to use. Learn to use AddressSanitizer simply with -fsanitize=address! Ironically I think many people don't use it because their build system doesn't have good build variants (dbg, opt, asan), or they don't know how to configure the build system. Plain make generally isn't good enough.

- Some flags have to be passed to both the compiler and link steps, and others don't. I mostly figured this out by trial and error, and the error messages aren't great.

- You can compile and link in one driver invocation (c++ -o), or you can build each object separately and link (c++ -c).

I thought the former might be faster, and it seems simpler, but there doesn't seem to be any real advantage (edit: this post explains why -- it's literally subprocessing, which you can do better from a shell or build system). The latter is more common because it supports parallel and incremental builds.

Some options, I think -ftime-trace for Clang, which outputs JSON compile time traces, don't even respect the first style of building.

- Spending some time with a plain shell script and the compiler isn't a bad way to learn. Now I can finally read all those crappy long error commands from big build systems. The most common and useful flags are -I to add to the #include path and -D to define a preprocessor symbol.
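Several of the tips above (build variants, flags needed at both compile and link time, separate `-c` steps, `-I` and `-D`) can be combined in a small sketch. The flags are standard GCC/Clang options, but the variant table, the `build_plan` helper, and the file names are assumptions made for illustration. The function only builds command lines and does not run them.

```python
from pathlib import Path

# Hypothetical variant table; the flags themselves are real compiler options.
VARIANT_FLAGS = {
    "dbg": ["-O0", "-g"],
    "opt": ["-O2"],
    "asan": ["-O1", "-g", "-fsanitize=address", "-fno-omit-frame-pointer"],
}

# Flags that must be repeated on the link command as well.
LINK_TOO = {"-fsanitize=address"}


def build_plan(sources, variant, out="app", cxx="c++",
               include_dirs=(), defines=()):
    """Return the compile commands (one per source) plus a final link command."""
    flags = VARIANT_FLAGS[variant]
    cpp_flags = [f"-I{d}" for d in include_dirs] + [f"-D{m}" for m in defines]
    objects, plan = [], []
    for src in sources:
        # Keep each variant's objects apart so variants do not clobber each other.
        obj = f"{Path(src).stem}.{variant}.o"
        objects.append(obj)
        plan.append([cxx, *flags, *cpp_flags, "-c", src, "-o", obj])
    link_flags = [f for f in flags if f in LINK_TOO]
    plan.append([cxx, *link_flags, *objects, "-o", out])
    return plan


if __name__ == "__main__":
    for cmd in build_plan(["main.cc", "util.cc"], "asan",
                          include_dirs=["include"]):
        print(" ".join(cmd))
```

Because each compile command is independent, a shell script or build system can run them in parallel and skip the ones whose inputs have not changed.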

---

edit after skimming the whole thing: This is really excellent, should be titled "Compilers: The Missing Manual".

I have actually looked at the manuals, e.g. https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Invoking-GCC.h... but they seem to be missing the high level conceptual overview.

They also seem to be missing the name "driver", which is important, even though I have encountered a page about that before. It seems to be in a separate "GCC Internals" doc:

https://gcc.gnu.org/onlinedocs/gcc-4.3.2/gccint/Driver.html#...

Other notes: I should have been using the -v flag to the driver all along! That's a little embarrassing.

Also it's good to realize that g++ and gcc are both drivers, and the former sets the -I path to the location of the C++ stdlib and so forth.

Looks like lots of great examples of 'readelf' as well, which I'll go over again.

It is kind of crazy how people generally pick this up piecemeal over so many years ... A big problem in my mind is that it's usually wrapped in GNU make or CMake or IDE configs, which add their own line noise on top of the raw driver invocations. Which in turn have a ton of logic before the actual tools are invoked.

izacus · on May 4, 2023

> - the order of objects and -l flags to the linker matters! I remember being very surprised / frustrated by this.

This took so long for me to actually find out... and then beat most build systems into submission because they have no ability to configure priority of linkage.
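Why the order matters can be shown with a simplified model of traditional left-to-right symbol resolution. This is a sketch under stated assumptions, not any real linker's code: object files are always pulled in, while a static archive contributes only the members that define a symbol that is undefined at the moment the archive is reached. Real linkers have more rules, and some are more forgiving.

```python
def link(inputs):
    """Simulate single-pass symbol resolution (hypothetical model).

    `inputs` holds ("object", defs, refs) or
    ("archive", [(defs, refs), ...]) entries in command-line order.
    Returns the set of symbols still undefined at the end.
    """
    defined, undefined = set(), set()

    def add(defs, refs):
        defined.update(defs)
        undefined.difference_update(defs)
        undefined.update(r for r in refs if r not in defined)

    for entry in inputs:
        if entry[0] == "object":
            add(entry[1], entry[2])
            continue
        # Archive: rescan its members until none of them helps any more.
        members = list(entry[1])
        progress = True
        while progress:
            progress = False
            for member in list(members):
                defs, refs = member
                if undefined & set(defs):
                    add(defs, refs)
                    members.remove(member)
                    progress = True
    return undefined


if __name__ == "__main__":
    main_o = ("object", {"main"}, {"helper"})
    lib_a = ("archive", [({"helper"}, set())])
    print(link([main_o, lib_a]))  # library after its user
    print(link([lib_a, main_o]))  # library before its user
```

In this model, placing the library before the object that needs it leaves `helper` unresolved, because nothing was undefined yet when the archive was scanned.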

sawyna · on May 5, 2023

An excellent overview! I always dreaded learning this and your post seems to be a nice way to digest the content! This is probably going to get me to dive deep into some hardcore c/c++ work :)

One small copy correction: In the linker (4/5), you mentioned "Let's compile hello.c and peek inside hello.o.". But you are actually peeking inside a.out.

fabiensanglard · on May 7, 2023

Thank you for pointing out the mistake. Fixed it.

Sirenos · on May 4, 2023

Great content! I wish I had this as a beginner. It would have saved me so much pain and time lost in the depths of google search results.

bluedino · on May 4, 2023

I remember having used QuickC and Turbo C in the IDE...downloading DJGPP. Whole different world. And I thought the people that had been using GCC on UNIX systems for years were absolute wizards with what they knew.

Brightwise · on May 4, 2023

Are these SVG graphics with the little boxes and arrows generated? Is this gnuplot?

fabiensanglard · on May 4, 2023

Proudly made with Inkscape and a mouse.

wandering-nomad · on May 4, 2023

Is the choice of C books mentioned in the beginning of this article still a good recommendation? I find all the books to be a little outdated now.

cxr · on May 4, 2023

Despite expressions of a lot of mimetic discontent online about how bad K&R is, the second edition (1988) remains an excellent example of technical writing. It's a good intro to (ANSI) C. It's not a good guide to e.g. build systems or other industry-standard tools, for the reasons described at the beginning of this article—but then again it doesn't pretend to be. (The name of the book is "The C Programming Language", after all.)