Glad that the concept of a compiler driver is getting more exposure. "The `gcc` program itself is actually not a compiler...but a compiler driver" [1] is one of my favorite things to tell newbie compiler engineers.
[1]: This is not the case for Clang (at least modern versions). The `clang` program is a compiler driver, a compiler, AND an assembler on the majority of platforms.
> [1]: This is not the case for Clang (at least modern versions). The `clang` program is a compiler driver, a compiler, AND an assembler on the majority of platforms.
This framing is more likely to cause confusion than not, I think. `clang` is a compiler driver. `clang -cc1` is the compiler, and `clang -cc1as` is the assembler. They may all be the same executable, but choosing which tool to act as is basically the first thing that happens. (Also note that the -cc1/-cc1as option has to be the very first option; it's not recognized in any other position.)
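A toy model of that "first option decides the tool" behavior, in Python. This is only a sketch of the idea, not Clang's actual code; the function name `pick_tool` and the returned labels are made up for illustration.

```python
import sys

def pick_tool(argv):
    """Return which role a clang-like multi-call binary would take.

    Toy model: the role is chosen from argv[1] and nothing else, which is
    why the option is not recognized in any later position.
    """
    first = argv[1] if len(argv) >= 2 else None
    if first == "-cc1":
        return "compiler"
    if first == "-cc1as":
        return "assembler"
    # Anything else is handled by the driver, which later spawns
    # (or calls into) the tools above.
    return "driver"

if __name__ == "__main__":
    print(pick_tool(sys.argv))
```

With this model, `pick_tool(["clang", "hello.c", "-cc1"])` still yields the driver, matching the note about option position.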
I could easily google it, but what's LLVM? (Me googling it would not automatically translate into me writing up an answer for everybody. However, LLVM seems to be the name of a project that includes clang.)
LLVM is a series of projects, of which the core project is an assembly-like language (more specifically a compiler IR (or five)), optimization passes for said language, code generators to convert it to several major assembly languages (or their machine code representation), and a few associated tools that are similar to things provided by GNU binutils.
Other projects are included in LLVM. Clang is the C/C++ compiler. There is a C standard library, an OpenCL standard library, a C++ standard library (all inventively named lib<language>). There is also a library to support compiler builtins called compiler-rt, and a library to support OpenMP called openmp, and an implementation of C++ parallel executors in pstl. There is also a linker (lld) and a debugger (lldb). There is also a machine code optimizer called bolt, and yet another compiler framework called MLIR. Finally, there are several different Fortran compilers all called flang (don't ask).
LLVM is also notably used as the compiler backend for Rust, Swift and (I think) Zig and more. Rust is as fast as C because LLVM includes all the optimizations used to make C programs fast, and Rust gets to reuse all of those optimizations. LLVM can compile working binaries for all modern platforms (Windows, Mac, Linux, FreeBSD, iOS, Android, etc.) and basically all modern architectures (x86, ARM, RISC-V, wasm, etc.). It's also the official compiler toolchain used by Apple on all their platforms (and Apple funds a lot of LLVM development). I think Google also uses it for their C/C++ code, and some Linux distributions use LLVM instead of gcc as a compiler toolchain.
LLVM is everywhere. There's a good chance LLVM compiled some or all of the software you're using to read this comment.
Um, no. `gcc` is a driver that runs cc1 (a C compiler from the GNU compiler collection), cc1plus (a C++ compiler from the GNU compiler collection), (g)as (an assembler from GNU binutils) and ld (a static linker from GNU binutils) and sometimes other front-ends for other languages, and sometimes other back-ends for other reasons, and sometimes some special middle-ends.
All `gcc` does is parse some command-line options and then invoke other programs to do the actual job of preprocessing, compiling, assembling, and linking.
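To make the four jobs concrete, here is a rough Python sketch that builds one driver command line per stage. `-E`, `-S` and `-c` are the standard flags that stop after preprocessing, compiling and assembling; the function name and the intermediate file names are assumptions for this example. These are driver invocations that stop at each stage, not the exact cc1/as/ld command lines (running `gcc -v` or `gcc -###` prints those).

```python
def stage_commands(source, output="a.out", cc="gcc"):
    """Return four argv lists that build one C file step by step."""
    stem = source.rsplit(".", 1)[0]
    pre, asm, obj = stem + ".i", stem + ".s", stem + ".o"
    return [
        [cc, "-E", source, "-o", pre],  # preprocess: .c -> .i
        [cc, "-S", pre, "-o", asm],     # compile:    .i -> .s
        [cc, "-c", asm, "-o", obj],     # assemble:   .s -> .o
        [cc, obj, "-o", output],        # link:       .o -> executable
    ]

if __name__ == "__main__":
    # Only prints the commands; pass each list to subprocess.run to execute.
    for argv in stage_commands("hello.c"):
        print(" ".join(argv))
```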
That's right, it's all just stuff that happens when you hit F5 in your IDE. It's like not needing to know how an internal combustion engine works since all you have to do is tell the driver where you want to go.
I'm not doubting that it would be helpful to have a deeper knowledge of the tools you use. I'm just not sure there is a distinction for users between gcc and clang in this regard.
In the gcc/gas distinction, the fact that gas is part of binutils, a completely separate project from gcc, means that there is another package you might have to update to use a new feature.
I do. There isn't a viable competitor to productivity tools like Office 365 (and no, LibreOffice and Thunderbird, while decent, do not cut it), ease of gaming (I'd rather play games natively, than fiddle with Wine), HiDPI support, PowerShell, Visual Studio (sue me: it's better than gcc + painful array of binutils + perf), etc.
Maybe it's Stockholm syndrome, but I have Arch installed too (on its own dedicated drive to boot), and I use it a lot less than I do Platform™.
Between Stockholm syndrome, sunk-cost fallacy, and the psychological inertia of arguing repeatedly on the Internet in the 2000's that it was more 1337 than the Macintosh, it's hard to tell what "willing" even means in that context. May as well try to figure out if people who drink alcohol do so "willingly".
> The compiler ingests one .c and outputs one .o. It has a low memory footprint. The linker on the other side, must use all the .o files at once to generate the executable. Keeping all these .o in memory would stress the system too much on big projects.
This is a really weak argument that does not make a lot of sense.
The major advantage of separate object files is avoiding recompilation of modules that haven’t changed.
In fact, compilers like Jonathan Blow’s Jai seem to get massive performance improvements by treating everything as a single compilation unit and avoiding writing a bunch of object files only to call the linker on all of them.
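For what it's worth, the "avoid recompiling unchanged modules" part boils down to a timestamp comparison in make-style tools. A minimal Python sketch follows; the function names are made up here, and it ignores header dependencies, which real build systems also have to track.

```python
import os

def is_stale(src_mtime, obj_mtime):
    """Rebuild if the object is missing (None) or older than its source."""
    return obj_mtime is None or src_mtime > obj_mtime

def mtime_or_none(path):
    """Modification time of path, or None if it does not exist."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None

def stale_sources(sources):
    """Return the sources whose matching .o file must be regenerated."""
    stale = []
    for src in sources:
        obj = os.path.splitext(src)[0] + ".o"
        if is_stale(os.path.getmtime(src), mtime_or_none(obj)):
            stale.append(src)
    return stale
```

Only the files returned by `stale_sources` need a `-c` compile; the link step then runs over all objects again.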
It's probably the main reason historically that compiling and linking are separate steps. Back in the day, you might not have had enough memory to even load all of the source at the same time. But yeah, that was decades ago.
Glad to see this covered -- I also had the experience of picking it up the hard way, over many years!
My own tips after writing a custom build system for C++
- the order of objects and -l flags to the linker matters! I remember being very surprised / frustrated by this.
- Sanitizers are built into compilers and trivial to use. Learn to use AddressSanitizer simply with -fsanitize=address! Ironically I think many people don't use it because their build system doesn't have good build variants (dbg, opt, asan), or they don't know how to configure the build system. Plain make generally isn't good enough.
- Some flags have to be passed to both the compiler and link steps, and others don't. I mostly figured this out by trial and error, and the error messages aren't great.
- You can compile and link in one driver invocation (c++ -o), or you can build each object separately and link (c++ -c).
I thought the former might be faster, and it seems simpler, but there doesn't seem to be any real advantage (edit: this post explains why -- it's literally subprocessing, which you can do better from a shell or build system). The latter is more common because it supports parallel and incremental builds.
Some options, I think -ftime-trace for Clang, which outputs JSON compile time traces, don't even respect the first style of building.
- Spending some time with a plain shell script and the compiler isn't a bad way to learn. Now I can finally read all those crappy long error commands from big build systems. The most common and useful flags are -I to add to the #include path and -D to define a preprocessor symbol.
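A small Python sketch of the tips above, under some assumptions: the variant names (dbg, opt, asan) are from the list, but the exact flags chosen per variant, the function name and the object naming scheme are just for illustration. It only builds the command lines for "compile each object with -c, then link", and passes `-fsanitize=address` to both steps.

```python
# Flags per build variant. The flag choices are an assumption for this sketch.
VARIANT_FLAGS = {
    "dbg": {"compile": ["-O0", "-g"], "link": []},
    "opt": {"compile": ["-O2"], "link": []},
    # -fsanitize=address has to reach both the compile and the link step.
    "asan": {"compile": ["-O1", "-g", "-fsanitize=address"],
             "link": ["-fsanitize=address"]},
}

def build_commands(sources, output, variant,
                   include_dirs=(), defines=(), libs=(), cxx="c++"):
    """Return one compile command per source plus a final link command."""
    flags = VARIANT_FLAGS[variant]
    # -I adds to the #include path, -D defines a preprocessor symbol.
    cpp_flags = ["-I" + d for d in include_dirs] + ["-D" + m for m in defines]
    objects, commands = [], []
    for src in sources:
        # Per-variant object names so dbg/opt/asan builds do not collide.
        obj = src.rsplit(".", 1)[0] + "." + variant + ".o"
        objects.append(obj)
        commands.append([cxx, *flags["compile"], *cpp_flags,
                         "-c", src, "-o", obj])
    # Objects first, then -l flags: the order matters to the linker.
    commands.append([cxx, *flags["link"], *objects,
                     *["-l" + name for name in libs], "-o", output])
    return commands

if __name__ == "__main__":
    for argv in build_commands(["main.cc", "util.cc"], "app", "asan",
                               include_dirs=["include"], defines=["DEBUG=1"]):
        print(" ".join(argv))
```

The compile commands are independent of each other, which is what makes the `-c` style friendly to parallel and incremental builds.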
---
edit after skimming the whole thing: This is really excellent, should be titled "Compilers: The Missing Manual".
They also seem to be missing the name "driver", which is important, even though I have encountered a page about that before. It seems to be in a separate "GCC Internals" doc:
Other notes: I should have been using the -v flag to the driver all along! That's a little embarrassing.
Also it's good to realize that g++ and gcc are both drivers, and the former links against the C++ standard library by default (and treats .c files as C++) and so forth.
Looks like lots of great examples of 'readelf' as well, which I'll go over again.
It is kind of crazy how people generally pick this up piecemeal over so many years ... A big problem in my mind is that it's usually wrapped in GNU make or CMake or IDE configs, which add their own line noise on top of the raw driver invocations. Which in turn have a ton of logic before the actual tools are invoked.
> - the order of objects and -l flags to the linker matters! I remember being very surprised / frustrated by this.
This took so long for me to actually find out... and then beat most build systems into submission because they have no ability to configure priority of linkage.
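For anyone wondering why the order matters, here is a simplified Python model of how a traditional Unix linker walks its inputs left to right. It is a sketch under assumptions (single pass, no `--start-group`, no shared libraries), and all names in it are invented for the example. Objects are always pulled in; an archive member is pulled in only if it defines a symbol that is undefined at the moment the archive is visited.

```python
def link(inputs):
    """Simulate left-to-right symbol resolution; return unresolved symbols.

    Each input is (kind, members): kind is "obj" or "lib", members is a
    list of (defined, needed) pairs of symbol-name sets. An object has
    exactly one member.
    """
    defined, undefined = set(), set()

    def pull_in(defs, needs):
        defined.update(defs)
        undefined.update(needs)
        undefined.difference_update(defined)

    for kind, members in inputs:
        if kind == "obj":
            pull_in(*members[0])
            continue
        # Archive: take only members that satisfy a symbol that is undefined
        # right now, rescanning until nothing more gets pulled in.
        remaining = list(members)
        progress = True
        while progress:
            progress = False
            for member in list(remaining):
                if member[0] & undefined:
                    pull_in(*member)
                    remaining.remove(member)
                    progress = True
    return undefined

# Hypothetical inputs: main.o calls helper(), which lives in libutil.a.
MAIN_O = ("obj", [({"main"}, {"helper"})])
LIBUTIL_A = ("lib", [({"helper"}, set())])

if __name__ == "__main__":
    print(link([MAIN_O, LIBUTIL_A]))  # library after its user
    print(link([LIBUTIL_A, MAIN_O]))  # library before its user
```

In the second ordering nothing is undefined yet when the archive is visited, so its member is skipped and `helper` stays unresolved.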
An excellent overview! I always dreaded learning this and your post seems to be a nice way to digest the content! This is probably going to get me to dive deep into some hardcore c/c++ work :)
One small copy correction: In the linker part (4/5), you wrote "Let's compile hello.c and peek inside hello.o.", but you are actually peeking inside a.out.
I remember having used QuickC and Turbo C in the IDE...downloading DJGPP. Whole different world. And I thought the people that had been using GCC on UNIX systems for years were absolute wizards with what they knew.
Despite expressions of a lot of mimetic discontent online about how bad K&R is, the second edition (1988) remains an excellent example of technical writing. It's a good intro to (ANSI) C. It's not a good guide to e.g. build systems or other industry-standard tools, for the reasons described at the beginning of this article—but then again it doesn't pretend to be. (The name of the book is "The C Programming Language", after all.)