C Style: My favorite C programming practices (2014) (github.com/mcinglis)



I was surprised to see this on the HN front page, after so many years. Thanks for sharing it!

Suffice to say: my opinions on this topic have shifted significantly. A decade+ more of programming-in-the-large, and I no longer pay much heed to written-in-prose style guides. Instead, I've found mechanistic "style" enforcement and close-to-live-feedback much more effective for maintaining code quality over time.

A subtext is that I wrote this during a period of work - solo programmer, small company - on a green-field power system microcontroller project; MODBUS comms, CSV data wrangling. I'd opted for C primarily for the appeal of having a codebase I could keep in my head (dependencies included!). There was much in-the-field development, debugging and redeployments, so it was really valuable to have a thin stack, and an easy build process.

So, other than one vendored third-party package, I had total control over that codebase's style. And so, I had the space to consider and evolve my C programming style, reflecting on what I considered was working best for that code.

My personal C code style has since shifted significantly, as well - much more towards older, more-conventional styles.

Still, opinionated, idiosyncratic documents like this - if nothing else - can serve as fun discussion prompts. I'm appreciating all the discussion here!


Update the text! I would love to read the diff.


> Write correct, readable, simple and maintainable software, and tune it when you're done, with benchmarks to identify the choke points

If speed is a primary concern, you can't tack it on at the end; it needs to be built in architecturally. Benchmarks applied after meeting goals of readability/maintainability are only benchmarking the limits of that approach and focus.

They can't capture the results of trying and benchmarking several different fundamental approaches made at the outset in order to best choose the initial direction. In this case "optimisation" is almost happening first.

Sometimes the fastest approach may not be particularly maintainable, and that may be just fine if that component is not expected to require maintaining, e.g., pure C on bare metal in a bespoke, one-off embedded environment.


Well, yes. Architect for performance, try not to do anything "dumb", but save micro-optimizations for after performance measurement.


The problem with all of these rules of thumb is that they're vague to the point of being vacuously true. Of course we all agree that "premature optimization is the root of all evil" as Knuth once said, but the saying itself is basically a tautology: if something is "premature", that already means it's wrong to do it.

I'll be more impressed when I see specific advice about what kinds of "optimizations" are premature. Or, to address your reply specifically, what counts as "doing something dumb" vs. what is a "micro-optimization". And, the truth is, you can't really answer those questions without a specific project and programming language in mind.

But, what I do end up seeing across domains and programming languages is that people sacrifice efficiency (which is objective and measurable, even if "micro") for a vague idea of what they consider to be "readable" (today--ask them again in six months).

What I'm specifically thinking of is people writing in programming languages with eager collection types that have `map`, `filter`, etc methods, and they'll chain four or five of them together because it's "more readable" than a for-loop. The difference in readability is absolutely negligible to any programmer, but they choose to make four extra heap-allocated, temporary, arrays/lists and iterate over the N elements four or five times instead of once because it looks slightly more elegant (and I agree that it does).

Is it a "micro-optimization" to just opt for the for-loop so that I don't have to benchmark how shitty the performance is in the future when we're iterating over more elements than we thought we'd ever need to? Or is it not doing something dumb? To me, it seems ridiculous to intentionally choose a sub-optimal solution when the optimal one is just as easy to write and 99% (or more) as easy to read/understand.


Ok, a bit more detail then. :)

Architecting for performance means picking your data structures, data flow, and algorithms with some thought towards efficiency for the application you have in mind. Details will vary a lot depending on context. But as many folks have said, this sort of thing can't be done after the fact.

As for "doing something dumb", I've often seem fellow engineers do things like repeatedly insert into sorted data structures in a loop instead of just inserting into an unsorted structure and then sorting after the inserts. If you think about it for just a minute, it should be obvious why that's not smart (for most cases.) Stuff like that.

What do I mean by "micro-optimizations"? Taking a clearly written function and spending a lot of time making it as efficient _as_possible_ (possibly at the expense of clarity) without first doing some performance analysis to see if it matters.

Nobody's saying to pick suboptimal solutions at all.


> As for "doing something dumb", I've often seen fellow engineers do things like repeatedly insert into sorted data structures in a loop instead of just inserting into an unsorted structure and then sorting after the inserts. If you think about it for just a minute, it should be obvious why that's not smart (for most cases). Stuff like that.

That's a great example that I've seen in the wild as well!

> Nobody's saying to pick suboptimal solutions at all.

No, I realize that. And most of my comment wasn't intended as some kind of direct disagreement to yours. It was mostly just some observations. One of which is that advice about writing efficient code is usually too vague to be useful, and the other is that people take the "don't optimize without measuring" advice to mean something ridiculous in the opposite extreme that reads more like "just write whatever garbage looks pretty to you because any forethought about what makes sense to the computer is premature optimization". I wasn't trying to say that's what you were advocating for, though.


I don't know if this kind of embedded development is still alive. I'm writing firmware for an nRF BLE chip which is supposed to run from a battery, and their SDK uses an operating system. Absolutely monstrous chips with enormous RAM and Flash. Makes zero sense to optimize for anything, as long as the device sleeps well.


A little over 10 years ago I was doing some very resource-constrained embedded programming. We had been using a custom chip with an 8051-compatible instruction set (plus some special-purpose analogue circuitry) with a few hundred bytes of RAM. For a new project we used an ARM Cortex M0, plus some external circuitry for the analogue parts.

The difference was ridiculous - we were actually porting a prototype algorithm from a powerful TI device with hardware floating point. It turned out viable to simply compile the same algorithm with software emulation of floating point - the Cortex M0 could keep up.

Having said all that though: the 8051 solution was so much physically smaller that the ARM just wouldn't have been viable in some products (this was more significant because having the analogue circuitry on-chip limited how small the feature size for the digital part of the silicon could be).

Obviously that was quite a while ago! But even at the time, I was amazed how much difference the simpler chip actually made to the size of the solution. The ARM would have been a total deal breaker for that first project; it would just have been too big. I could certainly believe people are still programming for applications like that where a modern CPU doesn't get a look in.


Probably right in the broader sense, but there are still niches. E.g., for one: space deployments, where sufficiently hardened parts may lag decades behind SOTA and the environment can require a careful balance of energy/heat against run-time.


It's still alive, but pushed down the layers. The OS kernel on top of which you sit still cares about things like interrupt entry latency, which means that stack usage analysis and inlining management have a home, etc... The bluetooth radio and network stacks you're using likely have performance paths that force people to look at disassembly to understand.

But it's true that outside the top-level "don't make dumb design decisions" decision points, application code in the embedded world is reasonably insulated from this kind of nonsense. But that's because the folks you're standing on did the work for you.


i just learned the other day that you can get a computer for 1.58¢ in quantity 20000: https://jlcpcb.com/partdetail/NyquestTech-NY8A051H/C5143390

if we can believe the datasheet, it's basically a pic12f clone (with 55 'powerful' instructions, most single-cycle) with 512 instructions of memory, a 4-level hardware stack, and 32 bytes of ram, with an internal 20 megahertz clock, 20 milliamps per pin at 5 volts, burning half a microamp in halt mode and 700 microamps at full speed at 3 volts

and it costs less than most discrete transistors. in fact, although that page is the sop-8 version, you can get it in a sot23-6 package too

there are definitely a lot of things you can do with this chip if you're willing to optimize your code. but you aren't going to start with a 30-kilobyte firmware image and optimize it until it fits

yeah it's not an nrf52840 and you probably can't do ble on it. but the ny8a051h costs 1.58¢, and an nrf52840 costs 245¢, 154 times as much, and only runs three times as fast on the kinds of things you'd mostly use the ny8a051h for. it does have a lot more than 154 times as much ram tho

for 11.83¢ you can get a ch32v003 https://www.lcsc.com/product-detail/Microcontroller-Units-MC... which is a 48 megahertz risc-v processor with 2 kilobytes of ram, 16 kilobytes of flash, a 10-bit 1.7 megahertz adc, and an on-chip op-amp. so for 5% of the cost of the nrf52840 you get 50% of the cpu speed, 1.6% of the ram, and 0% of the bluetooth

for 70¢, less than a third the price of the nrf52840, you can get an ice40ul-640 https://www.lcsc.com/product-detail/Programmable-Logic-Devic... which i'm pretty sure can do bluetooth. though it might be saner to hook it up to one of the microcontrollers mentioned above (or maybe something with a few more pins), you can probably fit olof kindgren's serv implementation of risc-v https://github.com/olofk/serv into about a third of it and probably get over a mips out of it. but the total amount of block ram is 7 kilobytes. the compensating virtue is that you have another 400 or so luts and flip-flops to do certain kinds of data processing a lot faster and more predictably than a cpu can. 19 billion bit operations per second and pin-to-pin latency of 9 nanoseconds

so my summary is that there's a lot of that kind of embedded work going on, maybe more than ever, and you can do things today that were impossible only a few years ago


Just to be pedantic, if it's a clone of the 8 bit PICs, then one instruction takes 4 clock cycles, so a 20MHz clock should be considered 5MHz if you're trying to compare operations per second.


that's a good point! i wondered about that, but i don't have the chip yet, so i checked the datasheet. the datasheet lists a cycle count for each instruction, and as i said, most instructions are 1 cycle

on the other hand, something like a 32-bit multiplication or a floating-point subtraction is going to cost a lot of instructions, if you can afford it at all


That was my way of thinking when I was a junior programmer.


And now...?


After being burned waaay too many times by one of: 1) write-only code (for the sake of “speed”), 2) optimization of the wrong piece of code

I do think it is much better to prioritize readability; then measure where the code has to be sped up, and then make changes, but try HARD to first find a better algorithm. If that does not work, and more processing power or equipment is not viable or still does not help, go for less readable code, which is micro-optimized.


They're a manager and send out daily emails reminding the coders of arbitrary deadlines.


Who they?!


I feel like I probably agree with about 80% of this. It also seems like this would apply fairly well to C++ as well.

One thing that I'll strongly quibble with: "Use double rather than float, unless you have a specific reason otherwise".

As a graphics programmer, I've found that single precision will do just fine in the vast majority of cases. I've also found that it's often better to try to make my code work well in the single precision while keeping an eye out for precision loss. Then I can either rewrite my math to try to avoid the precision loss, or selectively use double precision just in the parts where its needed. I think that using double precision from the start is a big hammer that's often unneeded. And using single precision buys you double the number of floats moving through your cache and memory bandwidth compared to using double precision.
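As a sketch of what I mean by selectively using double precision (a toy example, not from the guide): keep the data and the result in float for bandwidth, and spend the wider type only where error actually accumulates.

  #include <stddef.h>

  float average(const float *samples, size_t n)
  {
      double sum = 0.0;                /* wider accumulator only here */
      for (size_t i = 0; i < n; i++)
          sum += samples[i];
      return (float)(sum / (double)n);
  }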


I'm torn both ways on the double issue. On the one hand, doubles are much more widely supported these days, and will save you from some common scenarios. Timestamps are a particular one, where a float will often degrade on a time scale that you care about, and a double will not. A double will also hold any int value without loss (on mainstream platforms), and has enough precision to allow staying in world coordinates for 3D geometry without introducing depth buffer problems.

OTOH, double precision is often just a panacea. If you don't know the precision requirements of your algorithm, how do you know that double precision will work either? Some types of errors will compound without anti-drifting protection in ways that are exponential, where the extra mantissa bits from a double will only get you a constant factor of additional time.

There are also current platforms where double will land you in very significant performance problems, not just a minor hit. GPUs are a particularly fun one -- there are currently popular GPUs where double precision math runs at 1/32 the rate of single precision.


I think you used the word “panacea” incorrectly. Judging by context, I would guess that the word “band-aid” would better convey your intended meaning.


A "panacea" is something that cures.every illness. 64-bit floats could do just that, in the cases listed. The cost of it may be higher than one cares to pay though.

And when the cure fails to be adequate, well, it becomes a band-aid, a temporary measure in search of a real solution.


IIUC, they're using the word itself correctly, but they mean "double precision is often used with the intent of it being a panacea".


Perhaps "placebo" was intended?


Are there C++ libs that use floating points for timestamps? I was under the impression that most stacks have accepted int64 epoch microseconds as the most reasonable format.


I think Apple is still doing that: https://developer.apple.com/documentation/foundation/nstimei...

A couple decades ago Microsoft did that too, with VT_DATE in old OLE Automation keeping an FP64 value inside. Luckily, their newer APIs and frameworks are using uint64 with 100-nanosecond ticks.


It's very common in games.

Integers are always an option, of course, but in this context it's hard to beat the convenience of just storing seconds in a floating point number.

Related: https://randomascii.wordpress.com/2012/02/13/dont-store-that...


Don't have a publicly visible reference to give at the moment, but it's still sometimes seen where relative timestamps are being tracked, such as in an audio library tracking time elapsed since start. It's probably less used for absolute time where the precision problems are more obvious.


> int64 epoch microseconds

surely nanoseconds is the truth.


LONG_MAX nanoseconds is just Friday, April 11, 2262 11:47:16.854 PM, not exactly a future-proof approach. I guess having Tuesday, September 21, 1677 12:12:43.145 AM as the earliest expressible timestamp neatly sidesteps the problem of proleptic Gregorian vs Julian calendars.
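The arithmetic, as a quick sketch (assumes a 64-bit time_t):

  #include <stdint.h>
  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      /* INT64_MAX nanoseconds, truncated to whole seconds since the epoch */
      time_t max_sec = (time_t)(INT64_MAX / 1000000000);  /* ~9.2e9 s, ~292 years */
      char buf[64];
      strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", gmtime(&max_sec));
      puts(buf);  /* prints 2262-04-11 23:47:16 UTC */
      return 0;
  }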


The one about not using 'switch' and instead using combined logical comparisons is terrible ... quite opinionated, but that is usually the case with this type of style guide.


As the author 10 years later, I agree. A hard-and-fast rule to ban switch, as that rule seems to advocate, is silly and terrible.

Switch has many valid uses. However, I also often see switch used in places where functional decomposition would've been much better (maintainable / testable / extensible). So I think there's still value in advocating for those switch alternatives, such as that rule's text covers. Not that I agree with everything there either. But, useful for discussion!


it's like they purposely add some controversial rule just for engagement


He even uses 'switch' in his code.


I think the fact that graphics cares a lot more about efficiency than about marginal accuracy qualifies as a specific reason. Aside from that and a few select areas like ML, almost any reason to use `float` by default vanishes.


There are popular embedded platforms like STM32 that don't have hardware double support, but do have hardware float support. Using double will cause software double support to be linked and slow down your firmware significantly.


OK, but if you're writing for that kind of platform, you know it. Don't use double there? Sure. "Don't use double on non-embedded code just because such platforms exist" doesn't make sense to me.

Sure, my code could maybe run on an embedded platform someday. But the person importing it probably has an editor that can do a search and replace...


> I feel like I probably agree with about 80% of this.

What I was thinking too. There's something in here to offend everyone, and that's probably a good thing.


I think your case comes under the "specific reason to use `float`"? If I am writing some code and I need floating point numbers, then without any more context, I will choose `double`. If I have context and the context makes it so `float`s are vastly better, then I will use `float`s.


The issue for me is that unlabeled constants are doubles and they can cause promotion where you don't expect it, leading to double arithmetic and rounding instead of single arithmetic. Minor issue, but hidden behavior.
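For example (a made-up function, just to illustrate the hidden promotion):

  float to_fahrenheit(float celsius)
  {
      return celsius * 1.8 + 32;      /* 1.8 is a double constant: celsius is
                                         promoted, the math runs in double, and
                                         the result is converted back to float */
  }

  float to_fahrenheit_f(float celsius)
  {
      return celsius * 1.8f + 32.0f;  /* stays in single precision throughout */
  }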


What's even more annoying is that the *printf functions take in double which forces you to cast all of the floats you pass in when using -Wdouble-promotion


funnily, the example used, i.e. `printf` of single-precision values, is very special. Under the hood, variadic arguments that are `float` are actually converted to `double`. See the `cvtss2sd` in [1].

[1]: https://godbolt.org/z/Yr7Kn4vqr


Yeah, sometimes as a graphics programmer you don't even want the precision provided by built-in functions! As it has been pointed out though, be careful about error propagation


Treat 79 characters as a hard limit

Try pasting a long URL into a comment describing a method/problem/solution and you’ll see immediately that it doesn’t fit 77 chars and you cannot wrap it. Then due to your hard limit you’ll invent something like “// see explained.txt:123 for explanation” or maybe “https://shrt.url/f0ob4r” it.

There’s nothing wrong with breaking limits if you do that reasonably, cause most limits have edge cases. It’s (Rule -> Goal X) most of the time, but sometimes it’s (Rule -> Issue). Make it (Solution (breaks Rule) -> Goal X), not (Solution (obeys Rule) -> not (Goal X)).


Agree. This 80 character limit stems from a time when terminals could only display comparatively few characters in a line, a limit we haven't had in decades as screen resolutions grew.

Another argument for shorter lines is that it is much harder for us to read any text when lines get too long. There's a reason why we read and write documents in portrait mode, not landscape.

But in sum, I don't think there's a need for creating a hard limit at the 80 character mark. Most code is not indented more than three or four times anyways, and most if not all languages allow you to insert newlines to make long expressions wrap. However, if you occasionally do need to go longer, I think that's completely fine and certainly better than having to bend around an arcane character limit.


> This 80 character limit stems from a time where terminals could only display comparatively few characters in a line, a limit we haven't had in decades as screen resolutions grew.

The 80 char rule has little to do with old monitors. It has to do with ergonomics, and is why any well-edited and typeset book will have between 60 and 80 characters per line.


It is a fair point, but a book is 60 - 80 characters of dense prose line after line in thick paragraphs; it is not clear how this translates to lines of code.


In my personal experience (and of course very subjective) it helps me a lot to have lines that fit the monitor, and I can have 2 parallel windows. Using 120 chars is for me just too much. I do think the golden rule of typography is “all rules can be broken if you know what you are doing”. For me it is a soft limit. If splitting a line makes the code less readable, I do allow more. But frankly, that is the case for maybe one line in 50k LOC.


As part of the ergonomics audience, I really prefer a 100-115 column soft limit for code editors, log viewing and console, because that’s how my single-display dev setup works best. Otoh if I’m using IDEs with big sidebars like {X,VS}Code, then I need two displays and/or a full-width IDE anyway.

While I understand that this is an anecdotal preference, to me it doesn’t feel like the 80 column standard fits any modern dev workspace perfectly, tbh. (By modern I don’t mean “shiny”, just what we have now in hw/sw.)


Exactly this. Open a novel and count the characters on a line; around 80 is readable as 500 years of typographic practice has determined. Two or three levels of indentation and that bumps the page width up a bit, still less than 100.


Monitors are too new. Punch cards have 80 columns. I think this even pre-dates the use of teletypes with electronic computers.


Then let it be 80 characters from any whitespace on the left-hand side? I find it artificially awkward to have to wrap at a hard margin on the right-hand side. Surely you need to accommodate any indenting you're doing?


Take man pages, for example: I find them very comfy to read. And they have generous margins on both sides. About indenting, I try to avoid deep nesting; usually 3 is a maximum, and it's very rare to need more if the code has to be easy to read.


80 column punched cards were a very strong influence


It has to do with both and that's why my comment mentions both.

But then again, of course there is a reason why terminals (or punchcards) were made that way - presumably because of reading / writing ergonomics (besides technical reasons).


This rule's readme isn't even less than 80 characters per line. You're saying we all had trouble reading it?


And at the very least, "80-characters-per-line is a de-facto standard for viewing code" has long been wrong. As the post even mentions, 100 and 120 columns have been other popular choices, and thus we don't really have any de-facto standard at all!


My opinion is that line width depends on identifier naming style.

For example Java often prefers long explicitly verbose names for class, fields, methods, variables.

Another approach is to use short names as much as possible. `mkdir` instead of `create_directory`, `i` instead of `person_index` and so on.

I think that max line length greatly depends on the chosen identifier naming style. So it makes sense to use 100 or 120 for Java and it makes sense to use 72 for Golang.

C code often uses a short naming style, so 72 or 80 should be fine.


C is the worst at naming length since everything is in the global namespace unless you're working on a teeny tiny project. So everything gets clunky prefixes to use as pseudo-namespaces, or overly descriptive names to avoid conflicts.

Local variables sure, be short and terse. But that's common in most languages.


And you risk a collision for global identifiers, which cannot be reorganized in C. If some library had the same thought and defined `mkdir` first, your code can't define `mkdir` even as a private symbol. So you have to prefix everything and that contributes to the identifier length. In comparison, Go has a much better (but personally not yet satisfactory) module system and can have a lower limit.


Also on age, e.g. I began to prefer grandma font size from time to time for better focus, and that changes my workspace geometry.

For mkdir all-in, meh. It’s okay for mkdirp or rimraf, cause these basically became new words. But once you add more libs or code to a project it becomes a cryptic mess of six-letter nonsense. English code reads best, Java just overdoes it by adding patterns into the mix.


I doubt that rule applies to pasting URLs in comments. It's about code.


No rule is absolute. So you may have some lines longer. Anyway I’m VERY skeptical that hardcoding URLs is a good idea at all.


> Anyway I’m VERY skeptical that hardcoding URLs is a good idea at all.

They are talking about URLs in comments.


Oh. For comments I think it would be ok, if it is a comment by itself and not needed to particularly understand a piece of code, like in a header. Still, links are much more volatile than the codebases I work with. But I understand that may not hold in many, many other setups.


Only hardcore cool URLs!


I agree with most, and most of the others I might quibble with, but accept.

However, the item to not use unsigned types is vastly stupid! Signed types have far more instances of UB, and in the face of 00UB [1], that is untenable.

It is correct that mixing signed and unsigned is really bad; don't do this.

Instead, use unsigned types for everything, including signed math. Yes, you can simulate two's complement with unsigned types, and you can do it without UB.

On my part, all of my stuff uses unsigned, and when I get a signed type from the outside, the first thing I do is convert it safely, so I don't mix the two.

This does mean you have to be careful in some ways. For example, when casting a "signed" type to a larger "signed" type, you need to explicitly check the sign bit and fill the extension with that bit.

And yes, you need to use functions for math, which can be ugly. But you can make them static inline in a header so that they will be inlined.
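A rough sketch of the kind of helpers I mean (names are mine; only the 32-bit cases are shown):

  #include <stdint.h>

  /* "Signed" addition done on uint32_t: unsigned arithmetic wraps mod 2^32,
     which matches two's complement, so overflow is never UB. */
  static inline uint32_t s32_add(uint32_t a, uint32_t b)
  {
      return a + b;
  }

  /* Recover the signed value from the bit pattern without relying on
     implementation-defined narrowing conversions. */
  static inline int32_t s32_value(uint32_t a)
  {
      if (a <= (uint32_t)INT32_MAX)
          return (int32_t)a;
      return (int32_t)(a - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
  }

  /* Widening a "signed" value: fill the upper bits with the sign bit by hand. */
  static inline uint64_t s64_from_s32(uint32_t a)
  {
      uint64_t r = a;
      if (a & UINT32_C(0x80000000))
          r |= UINT64_C(0xFFFFFFFF00000000);
      return r;
  }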

The result is that my code isn't subject to 00UB nearly as much.

[1]: https://gavinhoward.com/2023/08/the-scourge-of-00ub/


Author here, 10 years later -- I agree. I'd remove that rule wholesale in an update of this guide. Unsigned integer types can and should be used, especially for memory sizes.

I would still advocate for large signed types over unsigned types for most domain-level measurements. Even if you think you "can't" have a negative balance or distance field, use a signed integer type so that underflows are more correct.
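A toy illustration of what I mean (all names invented):

  #include <assert.h>
  #include <stdint.h>

  void example(void)
  {
      uint32_t u_balance = 10;
      u_balance -= 25;            /* silently wraps to 4294967281 */
      (void)u_balance;

      int64_t s_balance = 10;
      s_balance -= 25;            /* -15: obviously invalid */
      assert(s_balance >= 0);     /* trips loudly instead of "succeeding" */
  }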

Although validating bounds would be strictly better, in many large contexts you can't tie validation to the representation, such as across most isolation boundaries (IPC, network, ...). For example, you see signed integer types much more often in service APIs and IDLs, and I think that's usually the right call.


I think with those changes, my disagreement would become a mere quibble.

> I would still advocate for large signed types over unsigned types for most domain-level measurements. Even if you think you "can't" have a negative balance or distance field, use a signed integer type so that underflows are more correct.

I agree with this, but I think I would personally still use unsigned types simulating two's complement that gives the correct underflow semantics. Yeah, I'm a hard egg.


In the vast majority of cases, integer overflow or truncation when casting is a bug, regardless whether it is undefined, implementation-defined or well-defined behavior. Avoiding undefined behavior doesn't buy you anything.

If you start to fuzz test with UBSan and -fsanitize=integer, you will realize that the choice of integer types doesn't matter much. Unsigned types have the benefit that overflowing the left end of the allowed range (zero) has a much better chance of being detected.


> Avoiding undefined behavior doesn't buy you anything.

This is absolutely false.

Say you want to check if a mathematical operation will overflow. How do you do it with signed types?

Answer: you can't. The compiler will delete any form of check you make because it's UB.

(There might be really clever forms that avoid UB, but I haven't found them.)

The problem with UB isn't UB, it's the compiler. If the compilers didn't take advantage of UB, then you would be right, but they do, so you're wrong.

However, what if you did that same check with unsigned types? The compiler has to allow it.
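For instance, checks like these are well-defined, and the compiler has to keep them (helper names are mine):

  #include <stdbool.h>
  #include <stdint.h>

  static inline bool u32_add_overflows(uint32_t a, uint32_t b)
  {
      return a > UINT32_MAX - b;        /* wraparound is defined, so this is safe */
  }

  static inline bool u32_mul_overflows(uint32_t a, uint32_t b)
  {
      return b != 0 && a > UINT32_MAX / b;
  }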

Even more importantly, you can implement crashes on overflow if you wish, to find those bugs, and I have done so. You can also implement it so the operation returns a bit saying whether it overflowed or not.

You can't do that with signed types.

> If you start to fuzz test with UBSan and -fsanitize=integer, you will realize that the choice of integer types doesn't matter much.

I do this, and this is exactly why I think it matters. Every time they report UB is a chance for the compiler to maliciously destroy your hard work.


> You can't do that with signed types.

What? Of course you can. If you want to add ints a and b:

    if (b >= 0 ? a > INT_MAX - b : a < INT_MIN - b)
        printf("overflow\n");


Okay, that's addition.

Now do it for multiplication.


> In the vast majority of cases, integer overflow or truncation when casting is a bug, regardless whether it is undefined, implementation-defined or well-defined behavior. Avoiding undefined behavior doesn't buy you anything.

With respect, this is nonsense. With UB, the compiler might remove the line of code entirely. With overflow/underflow/truncation, the results are well-defined and the compiler is not allowed to simply remove the offending line.


>Prefer compound literals to superfluous variables

I used to agree with this but I have moved away from compound literals entirely except for global statics/const definitions.

Having a variable and explicit:

  foo.x = whatever;
  foo.y = something_else;
Leads to a better debugging experience imo: you can set breakpoints and single-step each assignment, and you have a name to put a watch on.


Hmm, what's the point of single-stepping over a simple data assignment though? And when the initialization involves function calls, the debugger will step into those anyway.

One advantage of initialization via compound literals is that you can make the target immutable, and you won't accidentally get any uninitialized junk in unlisted struct members, e.g.:

const vec3 vec = { .x = 1.0, .y = 2.0 };

...vec.z will be default-initialized to zero, and vec doesn't need to be mutable.


> 80-characters-per-line is a de-facto standard for viewing code. Readers of your code who rely on that standard, and have their terminal or editor sized to 80 characters wide, can fit more on the screen by placing windows side-by-side.

This is one of the silliest practices to still be enforced or even considered in 2024. “Readers” should get a modern IDE/text editor and/or modern hardware.


On my 4K monitor, I use 4-5 vertical splits and 2-3 horizontal splits. The 80 column rule makes each of these splits readable, and allows me to see the full context of a chunk of kernel code or firmware at once. It has nothing to do with "modern" hardware or "modern" IDEs. It has everything to do with fitting the most amount of relevant information that I can on the screen at once, in context, and properly formatted for reading.

The 80 column rule may seem arbitrary, but it really helps analysis. I avoid open source code that ignores it, and I'll ding code that violates it during code review.

If I had code marching off the screen, or rudely wrapped around so it violated spacing, I'd have to reduce the number of splits I used to see it, and that directly impacts my ability to see code in context. Modern IDEs don't reduce the need to see things in context. It's not a matter of organizing things in drop-down menus, smart tabs, font changes, or magic "refactor" commands.

Verifying function contracts in most extant software -- which lacks modern tooling like model checking -- requires verifying these things by hand until these contracts can be codified by static assertions. This, in turn, requires examining function calls often 5-6 calls deep to ensure that the de facto specifications being built up don't miss assumptions made in code deep in the bowels of under-documented libraries.

I'd be terribly upset if I had to try to do this in code that not only missed modern tooling but that was written by a developer who mistakenly believed that "80 columns is for geezers." I freely admit that, at 43, I probably count as a "geezer" to many young developers. But, that doesn't change the utility of this rule.

Violations of contracts in software account for a large percentage of errors in software AND security vulnerabilities. Most of these violations are subtle and easy to miss unless you can see the call stack in context. No developer can keep hundreds of details from code that they did not write in their head with perfect clarity. It's incredibly nice to have uniform style and uniform maximum line lengths. By convention, 80 columns has shown itself to be the most stable of these limits.

Even FAANG companies like Google follow this rule.


> Even FAANG companies like Google follow this rule.

Google also uses 100


Most of their monorepo code I came across used 80 columns. But, I can't speak for all of it. Google has a LOT of code.

Either way, if Google and other companies can do what they do in 80 columns, I think it's a fair constraint. What we get out of this constraint is the ability to put a lot of context on the screen.


I'm using modern IDE and 32" 4K display yet I still support this rule. One example where it's particularly convenient is 3-way merge. Also if we're talking about IDE's, they often use horizontal space for things like files tree (project explorer) and other tool windows.


And on a wide display it's very convenient to use the width to put useful ancillary content on there (e.g. docs, company chat, ...). I shouldn't waste half my display on nothing because you won't add line breaks to your code.

Annoyingly, lots of modern websites have very wonky breakpoints / detection and will serve nonsense mobile UIs at what I think are reasonable window widths, e.g. if you consider Bootstrap's "xl" to be desktop, then a UWQHD display (3440x1440) won't get a desktop layout in 3-column (to say nothing of 4-column) layouts, nor may smaller laptops (especially if they're zoomed somewhat).


The part you quoted has the one argument against yours right at the end. It's not about hardware or IDEs or text editors, it's about workspace layout.


au contraire! considering programming involves a lot of reading, it overlaps with (or even comes from) best practices from ye olde tradition of typesetting https://en.m.wikipedia.org/wiki/Line_length#:~:text=Traditio.... Aside from books and print magazines and newspapers, we still respect that on web sites when reading is involved, so why should programming be exempt from ergonomics?


programming involves a lot of reading

Is that true for an average developer, really? Yes, we read lots of manuals, snippets, stackoverflows. But code? One does mostly write code.

And when we do read code, it may lack good naming, structure, comments, clarity, and may be unnecessarily complex or hacky. Where it wraps is something one would care about only in perfect code, if at all. Most editors can smart-wrap and clearly indicate it anyway.


> Is that true for an average developer, really? Yes, we read lots of manuals, snippets, stackoverflows. But code? One does mostly write code.

No, every developer almost certainly reads a lot more code than they write. You can't modify code to add a feature without reading and understanding the code first. The code you add is often very short compared to the code you need to read to understand what to modify.


As soon as you collaborate with more people, 80 characters becomes a valid target for line width. Eventually you'll have someone reading your code in a manner that is hardly pleasant with lengths of 200 characters or more:

  - Someone using a Braille display
  - Someone with a vision impairment (i.e. high scaling factor; common occurence during ageing)
  - A group of people that doesn't sit close to the display
  - Someone with a low-DPI (or small) display due to the normal workplace being unavailable
While you could, of course, disregard all these scenarios, the sheer number of people benefiting from or requiring a character limit on lines is usually grounds for a restrictive policy regarding this topic. You might consider it silly, but as long as there is no reliable way to convert between these "styles of presentation" you will find that many people prefer to err on the safe side.


There is a reason why books have only between 45 to 75 characters per line. It greatly enhances readability.


IMHO if the 80-column limit bothers you in C, you're writing bad C. Quoting the kernel docs, it is "warning you when you’re nesting your functions too deep. Heed that warning".

I remember reading this for the first time as a teenager: "if you need more than 3 levels of indentation, you’re screwed anyway, and should fix your program". Twenty years later, it seems like solid advice to me.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...


The rule is a bit silly sure, but OTOH I typically have multiple editor tabs open side by side (I don't restrict myself to a hard 80 char line width though, but I have vertical rulers set at 80 and 120 characters in the editor as visual guidance).


A proportional font really helps ergonomics too.


This is probably a snarky reply, but here is the serious answer: proportional fonts, with appropriate kerning, are a lot more legible than a monospaced font. There is a reason why the press moved in that direction once it was technically feasible. But the same people that bring up books as an example of why 80 character line length should be enforced would gag at the notion of using proportional fonts for development. It just goes to show that none of these things actually matter; it’s just legacy patterns that remain in place from sheer inertia, with really very little relevance today.


<snark> I'm glad you cleared this all up for us. </snark>

Other people disagree with you and it's best to not assume they are idiots.


So far, you are the only one making a fool of themselves.


Code uses much more punctuation than prose, and punctuation is hard to discern in a proportional font.


I agree. I've tried coding in C-like languages with proportional fonts a few times, and punctuation ends up feeling cramped, hurting legibility. We need more proportional fonts for programming where punctuation gets the same size and spacing as in monospaced fonts.


Depends on which language you are writing in. Historically Smalltalk UIs use proportional fonts, and they work just fine.


"We can't get tabs right, so use spaces everywhere"

I'm more like: Always use tabs, never use space. Code doesn't need to be "aligned" it's not some ASCIIart masterpiece...

One tab means one indentation level, and if your taste is to have tabs of pi chars wide, nice! But it won't mess up my code


Declare all variables/qualifiers right-to-left.

Read the type for all the below right-to-left, substituting the word "pointer" for "*".

  int long long unsigned wibble; // unsigned long long int
  double const *long_number; // pointer to a const double
  double volatile * const immutable_pointer; // immutable pointer to a volatile double
They all read correctly now, when read right-to-left. It's not just "const" you do this for, as per the advice. Do it for all qualifiers.


* doesn't (have to) mean "pointer"!

It works in simple cases, but I find the consistent thing to do is read it as "dereference".

    double volatile *(const immutable_pointer); 
    // immutable_pointer is immutable, when you dereference it you'll get a volatile double


Yes, that's the philosophy around the declaration syntax.

The declaration of the pointer ip,

  int *ip;
is intended as a mnemonic; it says that the expression *ip is an int. The syntax of the declaration for a variable mimics the syntax of expressions in which the variable might appear. This reasoning applies to function declarations as well.

K&R C


What's the author's justification? What's your justification?

> They all read correctly now, when read right-to-left.

... suppose I'm someone who reads from left-to-right, should I flip the order to make it correct for me?


> What's the author's justification? What's your justification?

I’m neither of them, but chances are that’s because you can’t make them left-to-right all the time.

  double const *foo; // foo is a pointer to a const double
  double *const foo; // foo is a const pointer to a double
compile and do what the comment says; these do not compile:

  * const double foo; // a pointer to a const double named “foo”
  foo * const double; // foo is a pointer to a const double


I use the "right-to-left" style myself. To me, the qualifier (in this case, const), applies to the item to the right. This could be confusing:

    const char *const ptr;
The first const applies to the char, but the second one to the pointer itself. Being consistent:

    char const *const ptr;
The first const applies to the item to its left---char. The second const applies to the item to its left---the pointer. To recap:

    char *ptr1; // modifiable pointer to modifiable data
    char const *ptr2; // modifiable pointer to const data
    char *const ptr3; // const pointer to modifiable data
    char const *const ptr4; // const pointer to const data


Readability.

C declarations can become unfriendly by being too complex and disordered.


Nothing seems wrong with “volatile double pointer as a constant” or “constant character pointer” either, tbh. The way you presented it is equivalent, but non-idiomatic; people would stumble over it often. To become more readable universally, this would have had to be adopted 50 years ago.


i was sort of hoping for something like https://nullprogram.com/blog/2023/10/08/, which shows a bunch of inventions that simplify your programming. some of them may be more trouble than they're worth, but they're at least novel and interesting

by contrast, this document is largely motherhood and apple pie — and where it isn't (e.g., when it advocates titlecasing struct types or never typedeffing primitive types), i often think it's wrong. 'Write assertions to meaningfully crash your program before it does something stupid, ... to prevent a security vulnerability' is especially wrong; one of the major features of the standard assert() macro is that it's turned off in release builds!

the named-arguments macro hack is an example of the kind of thing i was most hoping to find in here

if you write something like this, don't dilute whatever value it may have with your opinions about tabs vs. spaces, line length, what natural language to write your comments and identifiers in, include guards, how many blank lines to put between functions, etc. these have been debated to death, and you're unlikely to have any brilliant insights about them that other people will be happy to have read


> Never have more than 79 characters per line
> Never write lines longer than 79 characters.

I'm sorry, I just cannot do this. I start to feel somewhat guilty after 300 characters, but 80 feels like an Atari 800.


Tabs vs Spaces

Tabs are always correct, IF spaces are never used instead. One tab, for one level of indent. Adjust to preference.

Alas, I don't think there's a standard way of specifying...

// kate: space-indent off; indent-width 8; tab-width 8; mixedindent off; indent-mode tab;

Similarly, // comments should be preferred, but /* comments */ are acceptable at the top of large function blocks for large blobs of comments. Judicious / sparing use is the key idea that makes the exceptions worth it, e.g. when commenting out large blocks during tests or refactors.


There is always .editorconfig [1] to set up indentation if you have a directory of files. In places where it really matters (Python) I'll always comment with what I've used.

[1] https://editorconfig.org/
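For example, a minimal .editorconfig along those lines might look like this (the values are just one possible choice):

  # hypothetical .editorconfig for a C project
  root = true

  [*.{c,h}]
  indent_style = tab
  indent_size = 8
  trim_trailing_whitespace = true
  insert_final_newline = true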


Very interesting.

Just one (personal) thing: Stick to 80 columns... Sorry, no! :)


> Always write to a standard, as in -std=c11. Don't write to a dialect, like gnu11. Try to make do without non-standard language extensions: you'll thank yourself later.

Why not? Do people really care about porting their toy project to another compiler? If portability is a goal, avoid extensions, but not all projects need to be portable.

I'm writing an Operating System, and I do not care it if compiles on Clang or MSVC. GCC has been around for decades, it is a safe bet.


Even when I'm writing a toy operating system, or other toy projects, I personally always try to use the standard options like -std=c11 (though don't have anything against people who use the dialect option).

I'm happy to use compiler extensions, but I'll use __asm__ instead of asm, and use __extension__ as needed. I've done some neat but truly upsetting things with compiler extensions in my hobby code, especially once I combine them with macros. I'm particularly "proud" of the mutex macros in a toy OS of mine, which wrap a statement inside for loops and switch statements for automatic release of the mutex, unless it's requested that it stay locked. There, I originally used compiler extensions to release the mutex on scope exit, but switched to non-compiler-extension code for the actual functions, and just using the extensions for checking that the code using the macros didn't break the "contracts" on what is allowed in those statements, and how they can be exited.
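Roughly the shape of that mutex macro (the lock/unlock names here are placeholders, and the real version adds the extension-based contract checks I described):

  /* Runs one statement with the mutex held; the for loop executes exactly
     once and releases the mutex in its increment expression. */
  #define WITH_MUTEX(m)                                    \
      for (int once_ = (mutex_lock(m), 1); once_;          \
           once_ = (mutex_unlock(m), 0))

  /* usage:
         WITH_MUTEX(&lock) {
             shared_counter++;   // unlocked again when the block finishes
         }
     A `break` or `goto` out of the block would skip the unlock, which is
     exactly the kind of misuse the compile-time checks are there to catch. */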

It's the same reason that I'm always explicit about the size of integers, using stdint, even if I know that an int is 32-bit on a particular platform.


Personally I do, the Windows builds for my game use a Windows VM with MSVC, the Linux builds use GCC for certain optimizations, and for the dev builds I use Clang for sanitizers. Sure, you mention "a toy project" but he's talking about trying to avoid non-standard extensions in general, which is a fair point.


The recommendation of using the most recent standard is very slightly inconsistent with this, because in rare cases your code could be reused on weird embedded targets that only have an unmaintained proprietary C compiler. That was already rare in 2014, I think. Still, the situation might arise, just like you can still find AS/400 machines or Cobol in production.


Funny, this is to a great extent opposite to how I write C.


Ah, seeing people posting coding styles, and other people debating them, I know my life is too short to join this conversation.


> developers have a hope of being able to determine which #includes can be removed and which can't

Can’t a modern compiler do that already? I didn’t google it, but it seems an obvious compiler feature, at the very least behind a warning flag.


Clang-tidy (linter) can do that. IMO it's a good idea to integrate this tool to any C project. I'm using gcc for embedded projects and clang-tidy works just fine as a separate tool.

https://clang.llvm.org/extra/clang-tidy/checks/misc/include-...


I'm using https://github.com/include-what-you-use/include-what-you-use in preference over clang-tidy.


How many of these are applicable to other languages?


Probably not many. Most of the rules are workarounds for C’s design flaws.


While I don't agree with every single point (see below), one great thing about this document is that the author really tried to elaborate their opinions. That makes it a good starting point for discussion regardless of my own opinion. Thus I'll contribute back by giving my own judgement on every single item here:

Absolute agreement

  * Always develop and compile with all warnings (and more) on
  * #include the definition of everything you use
  * Provide include guards for all headers to prevent double inclusion
  * Always comment `#endif`s of large conditional sections
  * Declare variables as late as possible
  * Be consistent in your variable names across functions
  * Minimize the scope of variables
  * Use `assert` everywhere your program would fail otherwise
  * Repeat `assert` calls; don't `&&` them together
  * C isn't object-oriented, and you shouldn't pretend it is
Strong agreement with some obvious exceptions

  * Use `//` comments everywhere, never `/* ... */`
  * Comment non-standard-library `#include`s to say what symbols you use from them
  * No global or static variables if you can help it (you probably can)
  * Minimize what you expose; declare top-level names static where you can
  * Use `double` rather than `float`, unless you have a specific reason otherwise
  * Avoid non-pure or non-trivial function calls in expressions
  * Simple constant expressions can be easier to read than variables
  * Initialize strings as arrays, and use sizeof for byte size
  * Where possible, use `sizeof` on the variable; not the type
  * Document your struct invariants, and provide invariant checkers
  * Avoid `void *` because it harms type safety
  * If you have a `void *`, assign it to a typed variable as soon as possible
  * Only use pointers in structs for nullity, dynamic arrays or incomplete types
  * Avoid getters and setters
Agreed but you need a few more words

  * Don't be afraid of short variable names [if the scope fits on a screen]
  * Explicitly compare values; don't rely on truthiness
    [unless values themselves are boolean]
  * Use parentheses for expressions where the operator precedence isn't obvious
    [but `&foo->bar` *is* obvious]
  * Separate functions and struct definitions with two lines
    [can use comments instead]
  * If a macro is specific to a function, `#define` it in the body [and `#undef` ASAP]
  * Only typedef structs; never basic types or pointers
    [or make them distinct enough, but ISO C stole a `_t` suffix]
I do so or I see why but that's really a problem of C and its ecosystem instead

  * Use GCC's and Clang's `-M` to automatically generate object file dependencies
  * Avoid unified headers
  * Immutability saves lives: use `const` everywhere you can
  * Use `bool` from `stdbool.h` whenever you have a boolean value
  * Avoid unsigned types because the integer conversion rules are complicated
  * Prefer compound literals to superfluous variables
  * Never use array syntax for function arguments definitions
  * Don't use variable-length arrays
  * Use C11's anonymous structs and unions rather than mutually-exclusive fields
  * Give structs TitleCase names, and typedef them
  * Never begin names with `_` or end them with `_t`: they're reserved for standards
  * Only use pointer arguments for nullity, arrays or modifications
  * Prefer to return a value rather than modifying pointers
  * Always use designated initializers in struct literals
I do so but am not sure

  * Write to the most modern standard you can [we have no choice for many cases]
  * Program in American English [only applicable for native speakers]
I see why but I think you are misled

  * Don't write argument names in function prototypes if they just repeat the type
    [such case is very, very rare]
  * Use `+= 1` and `-= 1` over `++` and `--`
    [`++`/`--` should be read as succ/pred and should be exclusively used for pointers]
  * Don't use `switch`, and avoid complicated conditionals
    [switch is okay once you have enabled enough warnings]
  * Only upper-case a macro if it will act differently than a function call
    [agreed in principle, but should define "differently" more broadly]
  * Always prefer array indexing over pointer arithmetic
    [and then you will be bitten by index variable types, remember `ptrdiff_t`]
That's really just a personal preference

  * We can't get tabs right, so use spaces everywhere
    [as long as mechanically enforceable, the choice itself is irrelevant]
  * Always put `const` on the right and read types right-to-left [too eyesore]
  * Use one line per variable definition; don't bunch same types together
    [will agree with some significant exceptions though]
  * Never change state within an expression (e.g. with assignments or `++`)
    [absolutely avoid functions, but `++` has its uses]
  * Always use brackets, even for single-statement block
    [rather a read-write trade-off; this may make some code harder to read]
  * Never use or provide macros that wrap control structures like `for`
    [the example is very tame in comparison to actually problematic macros]
  * Don't typecast unless you have to (you probably don't)
    [while many typecasts can be easily removed, excess doesn't do actual harm]
  * Give enums `UPPERCASE_SNAKE` names, and lowercase their values
    [I would rather avoid enums for various reasons]
  * Use structs to name functions' optional arguments
    [maybe the author tried to say "avoid too many arguments" instead?]
  * If you're providing allocation and free functions only for a struct member,
    allocate memory for the whole struct
    [that complicates using struct as a value]
Just no.

  * Never have more than 79 characters per line
    [100 or 120 do work equally well, you do need some limit though]
  * Define a constant for the size of every enum
    [would imply that all enum values are sequential, and that's not true!]


> * No global or static variables if you can help it (you probably can)

I can't imagine doing this on embedded systems.


I think it's better understood as, "global and static state should only be declared by the client". So all of your functions should take pointers and all of your data should preferably be declared as types, and the points of your program that consume your code can declare their lifetime.

So if you're inclined to declare 2 global variables, instead define a struct with those two values and your functions take a pointer to that struct, and whatever code calls into yours can then decide whether they want to define that struct as having global scope. It just makes for more modular code.
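A sketch of that shape (all names invented):

  typedef struct {
      unsigned tick_count;
      int      last_error;
  } AppState;

  void app_tick(AppState *state)
  {
      state->tick_count += 1;
  }

  /* The caller picks the lifetime; one translation unit might do
         static AppState g_state;   // "global", but owned by the client
         app_tick(&g_state);
     while a unit test can just use a local AppState on the stack. */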


That's what I meant by "some obvious exceptions" :-)


> * Program in American English [only applicable for native speakers]

I don't really care that much about American vs. British English, except that it should be consistent within the code base. But I do think programming in English is generally a best practice that applies even if you aren't a native speaker.

I agree with most of your (dis)agreements, though.


Yeah, that was what I meant (mea culpa). I often appreciate the use of native languages when all members are expected to understand that, but a public code base would have to be written in English. The exact dialect of English is not as important.


> * We can't get tabs right, so use spaces everywhere
> [as long as mechanically enforceable, the choice itself is irrelevant]

It's not mechanically enforceable (in practice), that's the point. Forbidding tabs altogether is the most practical and actionable path.


I'm not sure what you have in mind, but in my mind the mechanical enforcement really means something like clang-format [1] and that surely works.

[1] https://clang.llvm.org/docs/ClangFormatStyleOptions.html#use...


Have you actually tried? Personally I have evaluated and then abandoned clang-format (for reasons including but not limited to the interplay between macros and indentation), and most tools' support for tab-indentation + space-alignments is flaky to non-existent. I wouldn't want to constrain my setup to one that integrates clang-format just for a needlessly complicated requirement when I could just abandon tabs altogether.


If you want to keep space alignments that's honestly to be expected. Exclusively using tabs would work much better, partly for the reason you have said. On the requirement of clang-format itself though... Yeah, that is more like a problem of C and its ecosystem indeed.


Nice summary of the article! Thank you.





