The Hardest Program I've Ever Written

greggman · on Sept 10, 2015

Great blog post and super interesting.

My feeling though is the problem is they have a line limit. Maybe they should rethink their style. I'm serious.

Before I worked at Google, in 30 years of programming I never worked at a company that had a line limit. Adding a line limit at Google did not make me more productive. At first I thought "hey, I guess 80 chars makes side by side comparison easier" but then I thought back, hmm. I never had problems comparing code before when I didn't have a line limit.

Instead what I found was that 80 character limit was a giant waste of time. The article just pointed out a year of wasted time. I can point to searching and replacing an identifier and then having to go manually reformat hundreds of lines of code all because of some arbitrary style guide. I also had code generators at google that had to generate code that followed the line limit. I too wasted days futsing with the generator to break the lines at the correct places all because of some arbitrary line limit.

That should be the real takeaway here. Make sure each rule of your style guide actually serves a purpose, or at least that its supposed benefits outweigh its costs.

munificent · on Sept 10, 2015

> My feeling though is the problem is they have a line limit.

There are a few reasons why we do 80 columns:

1. Human eyes have an effective line limit. The longer a line gets, the harder it is to scan back to the beginning of the next line. This is why paperbacks are taller than they are wide and why newspapers use several short columns instead of one wide one.

2. Being too narrow hurts readability, sure, but being too wide does as well. Also, even though many developers have giant monitors now, we also spend a lot of time on laptops, doing side-by-side code reviews, looking at code on blogs, etc. 80 columns is pretty friendly towards all of the various and sundry places where a user may be looking at some code.

3. We found it encourages better code. Dart is syntactically kind of a superset of Java, which means you can write Dart code that looks like Java. In particular, you can revisit some of the egregiously verbose naming practices that infected that community in the 90s. I see a lot of code like:

    LoggedInUserPreferenceManager preferences = new LoggedInUserPreferenceManager();

A shorter column limit has worked as an effective nudge to get people to do:

    var preferences = new Preferences();

> The article just pointed out a year of wasted time.

We'll get the time back. It's amortized over the time saved each time it runs, multiplied by the number of engineers using it.

> I can point to searching and replacing an identifier and then having to go manually reformat hundreds of lines of code all because of some arbitrary style guide.

The problem here is that you had to manually reformat it! Refactoring is a key use case for automated formatting. You can make a sweeping change to the length of an identifier and automatically fix the formatting everywhere it appears.

> I also had code generators at google that had to generate code that followed the line limit.

Code generators are another explicit use case. We have a lot of code generators now that produce completely unformatted code and then run dartfmt on it.

> I too wasted days futsing with the generator to break the lines at the correct places all because of some arbitrary line limit.

Should have used an automated formatter.

skybrian · on Sept 10, 2015

Did you know that Go doesn't have a line limit and gofmt doesn't wrap lines? It only fixes whitespace within a line. It's up to programmers to split lines manually. It seems to work for them, so clearly there's more than one way to do this.

munificent · on Sept 10, 2015

> Did you know that Go doesn't have a line limit and gofmt doesn't wrap lines?

Yup.

My feeling is that this keeps gofmt much simpler, but it kind of punts the problem onto users. I wanted a more complete solution, even though the result is a lot more complex.

greggman · on Sept 10, 2015

The idea of an auto-formatter has come up many times on the Chrome team. It is always shot down because formatting often imparts meaning. Meaning that is unclear to an auto-formatter. Of course you're free to run an auto-formatter over your own code.

Often I want to format things that are more readable for me. Example (yea, not Google style guide example. Too lazy to dig one up)

    uint32_t rgba8888 = ((red   & 0xFF) << 24) | 
                        ((green & 0xFF) << 16) |
                        ((blue  & 0xFF) <<  8) |
                        ((alpha & 0xFF) <<  0);

vs

    uint32_t rgba8888 = ((red & 0xFF) << 24) | 
                        ((green & 0xFF) << 16) |
                        ((blue & 0xFF) << 8) |
                        (alpha & 0xFF);

I think the first is objectively more readable than the second. An auto-formatter is unlikely to ever format that in a "readable" way.

Another simple example. If I have a many argument function

    ctx.arc(xPosition, yPosition, radius, startAngle, endAngle, clockwise);

If I break that line I'm not going to break it between xPosition and yPosition, nor am I going to break it between startAngle and endAngle. An auto-formatter will never know that semantically those things are more readable when they are on the same line.

Similarly you claim the short length encourages shortening names but you still run into plenty of situations where the code is far far less readable because of the line limit.

Example, assume 40 char limit

    int w = desired_width *
        scale_factor + padding;
    int h = desired_height *
        scale_factor + padding;

    glBindTexture(
        GL_TEXTURE_2D, someTexture);
    glTexImage2D(
        GL_TEXTURE_2D, level, format, 
        w, h, border, format, type, data);
    glTexParameter(
        GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, 
        GL_REPEAT);

vs

    int w = desired_width  * scale_factor + padding;
    int h = desired_height * scale_factor + padding;

    glBindTexture(GL_TEXTURE_2D, someTexture);
    glTexImage2D(GL_TEXTURE_2D, level, format, w, h, border, format, type, data);
    glTexParameter(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);

Chrome in particular is full of code line wrapped into what is effectively obfuscated code. So no, I don't agree that a line limit has any point.

You claim human eyes have a line limit. I don't disagree per se, but in my years before Google I never found anyone seriously abusing line length. Then again I never had to use Java but that's a separate issue. I hope Dart is not targeting Java's verboseness. I could make the same claim that unbroken lines, up to a point, are more readable and understandable, and I will go on to claim that the 80 char limit at Google breaks that rule and ends up causing 20% of Google's code or more to effectively be obfuscated.

mzs · on Sept 10, 2015

My eyes (much) prefer the 40-char version to that last code at the end, though I might do it like so:

    int w, h;
    
    w = desired_width;
    w *= scale_factor;
    w += padding;
    
    h = desired_height;
    h *= scale_factor;
    h += padding;

    glBindTexture(
        GL_TEXTURE_2D, someTexture
    );
    glTexImage2D(
        GL_TEXTURE_2D, level, format, 
        w, h, border, format, type, data
    );
    glTexParameter(
        GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, 
        GL_REPEAT
    );

That's assuming C, for C++ I might start it off like so:

    int w = desired_width;
    w *= scale_factor;
    w += padding;
    
    int h = desired_height;
    h *= scale_factor;
    h += padding;

I might be tempted to use a macro but likely pass and just use shorter names all around.

michaelt · on Sept 10, 2015

Nowadays on the web we've seen the benefits of separating content and presentation. Apart from inertia and weak existing tools, I don't see why we programmers mix presentation (indenting, newlines and line breaks) with our content.

If it was up to me, programs on disk and in version control would basically be an abstract syntax tree, and any time a user viewed them they would be formatted using whatever style rules and screen size the user liked.

The way we do things at the moment seems like the equivalent of those marketing e-mails where all the text is made into a big image to "make it look right".

skybrian · on Sept 10, 2015

That doesn't solve anything because you still have to write a parser and formatter. Once you have one it's easier to run the formatter when writing to disk and the parser when loading. A sufficiently smart editor could reformat the code in an alternate style for editing and format back again to the standard style on save.
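The "parse on load, format on save" idea can be sketched in a few lines. This is only an illustration using Python's standard `ast` module (Python 3.9+ for `ast.unparse`), not anything from dartfmt; the helper name `normalize` is made up for the example. Note that this particular round trip drops comments, which is one reason real formatters need more than a bare syntax tree.

```python
import ast


def normalize(source: str) -> str:
    """Parse source into a syntax tree, then render it in one canonical layout."""
    tree = ast.parse(source)   # the "load" step: text -> tree
    return ast.unparse(tree)   # the "save" step: tree -> canonical text


# Two differently formatted spellings of the same program.
mine = "x = ( 1+2 )\nprint( x )"
yours = "x   =   1 + 2\nprint(x)"

# Both collapse to the same text, so version control would only ever
# see the canonical form, whatever layout each person typed.
print(normalize(mine) == normalize(yours))
```

An editor could apply a personal style on load and run something like `normalize` on save, which is the workflow described above.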

philh · on Sept 10, 2015

It looks like this formatter isn't just for internal use, at a company that can buy each of their developers multiple 25-inch monitors. Not everyone has screen space to burn making their editors and terminals arbitrarily wide. If the majority of Dart code was formatted with a line length limit above 80, that would significantly deter me from exploring Dart.

greggman · on Sept 10, 2015

That would seem to rule out using around 99% of all open source projects then? I know of very few that have an 80 character limit.

munificent · on Sept 10, 2015

Dart's official style guide has always suggested 80 columns, so open source projects that follow that will be happy to have dartfmt do it for them.

If they want a different limit, it's configurable.

philh · on Sept 10, 2015

I don't contribute to many open-source projects. But I'd imagine Python would have a good track record: PEP 8 specifies a 79-character limit (72 for comments and docstrings).

That said, long lines in an existing project aren't a dealbreaker, if I have other reasons for contributing. (The one I'm working on now has them.) But if there's no reasonable way for me to write code in a language without breaking 80 chars, I'm not likely to choose that language for my own projects.

marssaxman · on Sept 11, 2015

The specific number of columns may be arbitrary, but you have to specify some number or your codebase will turn into a mess.

I've seen it happen - you'll get that one guy who likes to zoom a single editor to the full size of his monitor and then just type, type, type all the way to the far edge, making it impossible for anyone else to figure out what the hell he's doing without switching to his idiosyncratic editor setup. He'll see nothing wrong with nesting control structures fifteen or twenty levels deep, because it will all look completely reasonable on his screen, and he'll happily glom up absurdly complicated 30-40 character long identifiers because he has no taste and autocompletes all his identifiers anyway. An organization like Google can't tolerate that kind of crap and a strict code formatting guideline provides a simple first line of defense toward keeping it under control.

Limiting line length makes it easier for developers to work on each other's code, because they won't have to go resize all their terminals or change the length marker on a per-file basis. It's also a mechanical way to push developers toward shorter parameter lists and shorter, less deeply nested functions, which are good practice anyway.

(For what it's worth, I too have been programming for around 30 years, and Google is also the only place I've worked that had an official style guide with line limits. My reaction is the opposite of yours: I loved it, and I've tried to lobby for the practice everywhere I've worked since. I don't care what the details of the style guide are so long as they are enforced consistently - it's a really great feeling to drop into some far-distant file written by people you'll never meet and still find that the code is clear and consistent with the code you work on every day. I like 80 columns because it lets me fit three full size editors on each monitor, but I would be just as happy with 96 or 110 or 132 or whatever as long as there is some consistent limit I can count on.)

ryandvm · on Sept 10, 2015

I can get behind most code style guidelines, but people who harp on 80 character line limits drive me nuts. It's such a funny anachronism. Is there really someone out there using an editor that can't soft-wrap?

The width of a developer's editor window is so fundamentally a presentation issue - I have trouble imagining anything more so. Having line length limits is like mandating editor color schemes. How about I set my soft-wrap preferences the way I like and you can do the same?

dragonwriter · on Sept 10, 2015

> I can get behind most code style guidelines, but people who harp on 80 character line limits drive me nuts. It's such a funny anachronism. Is there really someone out there using an editor that can't soft-wrap?

Personally, I'd never know. I assume my editors can soft-wrap, but, IME, in terms of being able to easily work on code, lines that fit the window are better than long lines that aren't soft-wrapped, and long lines that aren't soft-wrapped are better than long-lines that are soft-wrapped, so I don't ever use soft-wrapping features.

I'm personally not that tied to 80 characters as a perfect line limit, but it's a not unreasonable general guideline for most code in most languages. Like most guidelines, there are times when it's inconvenient as a hard limit.

marssaxman · on Sept 11, 2015

I never use soft wrap. Code layout communicates meaning, and ought to be created on purpose. Having line length limits is like expecting developers to write comments. The compiler doesn't care, but other human beings do.

chrismcb · on Sept 12, 2015

80 chars is too narrow. But it isn't about the editor, or word wrapping. It is about the plethora of other tools: diffs, commit histories, compiler errors and so on.

chrismcb · on Sept 12, 2015

I'm not a huge fan of strict line limits. But in general you are better off keeping your lines shorter. There are many reasons: shorter lines are easier to read. Some programming lines are naturally short. Switching between long and short lines can be difficult. But the biggest reason is that many tools work better with shorter lines: diffs, commit histories and so on. I don't know how many times the delta was in code that was off the screen and the tool didn't have a slider (or maybe it did, but sliding back and forth was painful). Generally shorter lines are better than longer ones.

tantalor · on Sept 10, 2015

> I also had code generators at google that had to generate code that followed the line limit.

That's an unusual requirement for a code generator. What is the purpose?

Generated code generally is not read by humans or checked into source control. I suppose in the rare event where you want to read the code you could rely on your editor's line wrapping feature.

rch · on Sept 10, 2015

Just to choose an open source example: JET generated code from the Eclipse Modeling Framework is generally expected to be checked in, read, and edited by humans.

sytelus · on Sept 10, 2015

I have tried a few code formatters and almost always regretted it. The big problem is that code formatting contains a significant portion of intent and explanation. Sometimes I want to put two assignments on the same line because it emphasizes relationship and atomicity but other times it's better to keep them on separate lines for sparsity. There are actually quite a few times I wanted a line to go well beyond 80 chars because I wanted to de-emphasize an unimportant monotonous part taking away all attention and have far more important steps immediately stand out to the reader. I take code formatting very seriously and consider it an integral part of my expression. Style guides are good but they shouldn't be followed like a robot, let alone enforced by a robot. In fact code formatting tells a lot about the culture and philosophy of an author. For example K&R C starts braces on the same line to emphasize compactness as elegance, C# doesn't to emphasize sparse code as elegance. In SQL sometimes it's great to put a subquery on the same line and sometimes it isn't - it really depends on what you want to emphasize and convey rather than hard and fast rules on number of tokens and syntax analysis. Code formatting is not just a set of fixed rules, it's a communication mechanism that guides the reader on what to focus on, what is unimportant, where there is a gasoline spill and where wild fires may burn. This is not to say everyone takes their formatting seriously, which is where an automated formatter would probably add value (and the case where you are importing/copy pasting from somewhere else). I think K&R C is likely the gold standard for code formatting. You should try out your formatter on those snippets ;).

rwallace · on Sept 10, 2015

It's funny, I hold exactly the opposite view for exactly the same reason! I regard a code formatter nowadays as an essential tool for programming productivity, the fourth most important tool after an editor, compiler and web browser. The reason is that the limit on how much I can get done is not so much wall clock time as mental energy. The thing that costs mental energy is making design decisions. Without a code formatter, every few lines provides another invitation to make a design decision about layout.

CardenB · on Sept 11, 2015

What formatting tools do you use?

rwallace · on Sept 11, 2015

These days, clang-format

weavie · on Sept 10, 2015

That's all fine if you are the only person who will be touching that file. The problem is that most of the time a piece of code will have multiple developers working on it over time. Each dev will have a different perspective of what that intent and explanation should be.

I have wasted far too much of my life arguing with people about how code should be formatted. Ideally I would just have a rule set up in source control to format the code on checkin and be done with it. Then if I want a different style when I come to edit a file I can run whatever formatter I want on it and it won't affect anyone else.

dmytrish · on Sept 10, 2015

I wish your comment was formatted with some paragraphs :(

ridiculous_fish · on Sept 10, 2015

Thanks for writing this, it was a great read!

I maintain a source beautifier too [1], and it's not as nice as I would like. One of the issues I run into is that the correct indent on a broken line is context dependent. For example:

    while (someReallyReallyReallyReallyLongFunction() &&
           anotherLongFunction()) {
       loopBody();
    }

is a nicer indenting than:

    while (someReallyReallyReallyReallyLongFunction() &&
        anotherLongFunction()) {
        loopBody();
    }

In the first case, the two conditions are aligned which makes the code clearer. Does dartfmt handle this? If not, do you have ideas on how it might?

Also, how does it handle invalid input? I may want to reindent my code before it's correct.

Also, did you explore constraint solvers instead of a graph traversal? It seems like they would be a natural fit.

[1]: fish_indent, https://github.com/fish-shell/fish-shell/blob/master/src/fis...

vog · on Sept 10, 2015

In that specific case, isn't it better style to write it differently anyway?

Either:

    conditionWithReadableName: function() {
        return
            someReallyReallyReallyReallyLongFunction()
            && anotherLongFunction();
    }
    ...
    while (conditionWithReadableName()) {
        loopBody();
    }

Or if you don't want to make up a good name for your conditional:

    while (true) {
        if (!someReallyReallyReallyReallyLongFunction()) break;
        if (!anotherLongFunction()) break;
        loopBody();
    }

Or, in a more rules-based fashion:

    conditionWithReadableName: function() {
        if (!someReallyReallyReallyReallyLongFunction()) return false;
        if (!anotherLongFunction()) return false;
        return true;
    }
    ...
    while (conditionWithReadableName()) {
        loopBody();
    }

I find it troubling to make up "good" indentation rules when the code to indent isn't well-written in the first place. Multi-line conditionals are an anti-pattern in itself (no matter if they appear in "while", "if" or "for").

philh · on Sept 10, 2015

I'm not sure I like any of those as much as the original. When you introduce a new name, I don't know if you're going to use it anywhere else. (But having a name is good, so I'm on the fence.)

When all your while conditions are inside the block, I have to actually look at it to figure out that it's really just a standard while loop.

ikurei · on Sept 10, 2015

I agree with you that multi-line conditionals should go, and no beautifier is going to make up for bad code, but it's still great that they try. You can't always devote as much time as you'd like to making the code perfect, and that is when a beautifier comes in most handy. Also, having a good beautifier when you have to deal with other not-so-perfectionist programmers' code is great.

vog · on Sept 10, 2015

> having a good beautifier when you have to deal with other [...] programmer's code is great.

Good point! I didn't think of that.

munificent · on Sept 10, 2015

> Does dartfmt handle this? If not, do you have ideas on how it might?

Yes, it does! Correct indentation for nested expressions is a vital feature for helping the reader understand the structure of the code.

The basic idea is fairly simple. Any place a line break may appear, you mark it with a number representing how deeply nested in the expression it is. So in code like:

    function(outer(inner(first, second), third));

You would get chunks whose expression nesting level is like:

    function(   1
    outer(      2
    inner(      3
    first,      3
    second),    2
    third));

When you line break, you ensure that its nesting level gets assigned a correct indentation level. Deeper nesting means deeper indentation:

    function(
        outer(
            inner(
                first,
                second),
            third));

There are a bunch of interesting edge cases, though. Sometimes a nesting level doesn't happen on a line break, so it doesn't need to get an indentation level associated with it:

    function(outer(inner(
        first,
        second), third));

Here, levels 1 and 2 don't appear in line breaks, so we give the first indentation level to nesting level 3. There are stranger cases where you may assign a deeper level before a shallower one like:

    function(outer(inner(
            first,
            second),
        third));

Shaking out all of the bugs requires a lot of work and a really big test suite.
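The three layouts above can be reproduced with a toy model. This is a simplified sketch under stated assumptions, not dartfmt's actual code: the `CHUNKS` table, the `render` helper and the 4-space step are all made up for illustration. Each chunk carries the nesting level of the split that may follow it, and only the levels that actually occur at a chosen split are given an indentation.

```python
INDENT = 4  # spaces per indentation step; an assumption for this sketch

# (text, nesting level of the possible split after this chunk)
CHUNKS = [
    ("function(", 1),
    ("outer(", 2),
    ("inner(", 3),
    ("first,", 3),
    ("second),", 2),
    ("third));", None),  # nothing can split after the last chunk
]


def render(chunks, split_after):
    """Join chunks, breaking the line after each chunk index in split_after."""
    # Only nesting levels that occur at a chosen split get an indentation,
    # ranked from shallowest to deepest.
    used = sorted({chunks[i][1] for i in split_after})
    indent_of = {level: INDENT * (rank + 1) for rank, level in enumerate(used)}

    lines, current = [], ""
    for i, (text, level) in enumerate(chunks):
        current += text
        if i in split_after:
            lines.append(current)
            current = " " * indent_of[level]
        elif text.endswith(","):
            current += " "  # an unsplit comma keeps a single space after it
    lines.append(current)
    return "\n".join(lines)


print(render(CHUNKS, {0, 1, 2, 3, 4}))  # every split taken
print(render(CHUNKS, {2, 3}))           # levels 1 and 2 never split
print(render(CHUNKS, {2, 3, 4}))        # deeper level assigned before shallower
```

In the last call, level 3 is encountered first but still ends up indented deeper than level 2, because indentation is ranked over the set of used levels rather than handed out in order of appearance.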

> Also, how does it handle invalid input? I may want to reindent my code before it's correct.

If the code doesn't parse, it just exits with an error. Being able to run it on incomplete input would be useful, but I decided it was out of scope.

> Also, did you explore constraint solvers instead of a graph traversal?

Good question! I have a little experience with them. I think the line between the two is sort of fuzzy. I didn't approach it directly like a constraint solving problem, but it does have some features in common.

ridiculous_fish · on Sept 10, 2015

I'm not sure you understood my example. Let's say we have:

    while (first && second) {
        third
    }

We might assign nesting levels like this:

    while(    1
    first &&  2
    second)   2
    third     2

which could lead to this line breaking:

    while (first &&
        second) {
        third
    }

This is bad because 'second' and 'third' are visually aligned, and so one might think that 'second' is in the loop body instead of the condition. We want to indent 'second' more than 'third', even though it has the same nesting level:

   while (first &&
          second) {
       third
   }

This is what I meant by "context dependent:" here we have two chunks at the same indentation level but that want different numbers of spaces. Does dartfmt attempt to handle this?

munificent · on Sept 10, 2015

Ah, the style rules address this case. Bodies are indented +2 and wrapped expressions are +4. That means you will always get:

    while (first &&
        second) {
      third // <--
    }

So the body is indented less than the wrapped condition. Expression nesting is considered different from block nesting.
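As a rough illustration of that +2 / +4 rule (a sketch only; `format_while` and its constants are hypothetical names, not part of dartfmt), block bodies and wrapped expressions simply draw on two different indentation amounts:

```python
BLOCK_INDENT = 2         # body of a block
CONTINUATION_INDENT = 4  # wrapped part of an expression


def format_while(condition_parts, body_lines, base=0):
    """Lay out a while loop whose condition is split at each '&&'."""
    pad = " " * base
    continuation = " " * (base + CONTINUATION_INDENT)
    body = " " * (base + BLOCK_INDENT)

    lines = [f"{pad}while ({condition_parts[0]}"]
    for part in condition_parts[1:]:
        lines[-1] += " &&"                 # operator stays on the earlier line
        lines.append(continuation + part)  # wrapped operand gets +4
    lines[-1] += ") {"
    lines += [body + line for line in body_lines]  # body gets +2
    lines.append(pad + "}")
    return "\n".join(lines)


print(format_while(["first", "second"], ["third"]))
```

Because the two amounts differ, a wrapped condition can never line up with the first statement of the body.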

sklogic · on Sept 10, 2015

For this sort of thing I'm using an intermediate language (guess it's similar to what OP is talking about) with stack semantics: there are instructions like "PUSHINDENT", "POPINDENT" and "INCINDENT", with PUSHINDENT inserted after '(' in ifs, function calls, etc., ensuring that if a line breaks before the POP instruction, the indentation reflects the context.

So, the biggest problem is in assigning weights to the break candidate instructions in this stream. This can only be done by a lot of experimenting; I could not find any formal method or a passable heuristic.

detrino · on Sept 10, 2015

Have you used clang-format at all? It's the first formatting tool that I have come across that I have no problem letting format 100% of my code.

qznc · on Sept 10, 2015

> do you have ideas on how it might?

On each opening parens put the current line length (=indentation) on a stack. On closing parens pop from the stack. On newline use the top of the stack as indentation.
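A minimal sketch of that stack idea, with a few assumptions spelled out: the function name `reindent` is made up, the column pushed is the one just after the opening paren (so continuation lines align with the first argument), and parens inside strings or comments are not handled. Braces are ignored too, so block bodies are left at the enclosing indentation.

```python
def reindent(source: str) -> str:
    """Re-indent each line to the column of the innermost open paren."""
    stack = [0]  # indentation contexts; 0 is the top level
    out = []
    for raw in source.split("\n"):
        # On a newline, the top of the stack is the indentation.
        line = " " * stack[-1]
        for ch in raw.strip():
            line += ch
            if ch == "(":
                stack.append(len(line))  # column just past the paren
            elif ch == ")" and len(stack) > 1:
                stack.pop()
        out.append(line.rstrip())
    return "\n".join(out)


print(reindent("total = add(first,\nmul(second,\nthird),\nfourth)"))
print(reindent("while (first &&\nsecond) {"))
```

On the `while` example this produces the aligned condition asked about earlier in the thread, with `second` sitting directly under `first`.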

sudo_bang_bang · on Sept 10, 2015

"you’d expect it to do something pretty deep right?... Nope. It reads in a string and writes out a string."

We're all just doing the same thing in one way or another :) Good work and nice article.

aiiane · on Sept 10, 2015

Along these lines, we have a joke at Google that all of our problems are just transforming one protocol buffer into another.

feelix · on Sept 10, 2015

And programming anything, or taking over the universe for that matter, is just a matter of hitting the right keys on the keyboard in the right order.

ant6n · on Sept 10, 2015

https://xkcd.com/722/

jonahx · on Sept 10, 2015

And all human activity since the dawn of time is just using our bodies to move molecules from one place in space to another....

QuercusMax · on Sept 10, 2015

It's so true... I've only been at Google since the end of June and already have written a few proto converters. And teaching the super-newbies (i.e. fresh out of school) the difference between the frontend and backend protobufs...

ridiculous_fish · on Sept 10, 2015

For what it's worth, I never encountered protobufs during my time at Google.

sudo_bang_bang · on Sept 10, 2015

"Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data." [1]

Interesting, I figured Google was handling data serialization in their own way, and now I know[2]

[1] https://developers.google.com/protocol-buffers/docs/overview

[2] https://github.com/google/protobuf

Dylan16807 · on Sept 10, 2015

Right, cute, but you're cutting out the context that it's outputting the same thing as the input.

gsnedders · on Sept 10, 2015

This is my mostly terrible argument when people start claiming compilers are hard. :)

jcizzle · on Sept 10, 2015

Huh, I thought the 'Reformat to Dart Style' option in IntelliJ was using dartfmt and I was so disappointed at its output I stopped using it. Just went and tried dartfmt from the command line after reading this - dartfmt is significantly better. Fun to hear the approach that went into it.

For anyone that hasn't tried it, grab the Dart SDK and the IntelliJ Dart plugin. Takes less than 5 minutes to set up. It's been a great platform for building server side stuff - I haven't tried it for front end web stuff. It took about 3 reads of the language tour (https://www.dartlang.org/docs/dart-up-and-running/ch02.html) and about a week and I already felt very comfortable with the entire platform.

munificent · on Sept 10, 2015

> Just went and tried dartfmt from the command line after reading this - dartfmt is significantly better.

\o/ It improved a lot in the past two months. That's when rules and the new splitter landed.

kitd · on Sept 10, 2015

Good article!

The most complex single piece of code I ever wrote was a scheduler. The user could specify a pattern of when events should be raised (eg on this date, at this time, every other hour on the last day of every month, at midnight for me in this TZ on a server in another TZ, etc), and the scheduler would raise the events at the prescribed instant(s).

That took about 9 months, and my biggest takeaway was that how humans measure time is completely f*ed up!

justinator · on Sept 10, 2015

This made me curious as to what the Perl Tidy formatter was like, as I use it often. You know, it's Perl, so maybe a few regexes here and there, and much wizardry.

The Tidy.pm module is 1.1M in size, and over 30,000 lines long. I have much respect for formatters now, I thought the job they do was an easy one.

Fantastic looking sourcecode, btw,

https://metacpan.org/source/SHANCOCK/Perl-Tidy-20150815/lib/...

bigger_cheese · on Sept 10, 2015

In the intro to programming course I took at university, the final assignment was something similar (Format Text): we had to write a program that, given three arguments (a text file to read from, an int specifying the characters per line and an int specifying the lines per page), would output the text file formatted correctly, breaking on words, etc.

From memory there were other requirements, such as indenting the first word of each paragraph.
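The core of an assignment like that can be sketched as a greedy word wrap plus a pagination step. This is only a guess at the shape of the task in Python rather than C; the names `wrap_words` and `paginate` are invented, paragraph indentation is left out, and a word longer than the width is simply left on its own overlong line.

```python
def wrap_words(text, width):
    """Greedily pack whitespace-separated words into lines of at most width chars."""
    lines, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)  # the word does not fit, start a new line
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        lines.append(current)
    return lines


def paginate(lines, lines_per_page):
    """Group lines into pages of at most lines_per_page lines each."""
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


lines = wrap_words("the quick brown fox jumps over the lazy dog", 10)
for number, page in enumerate(paginate(lines, 2), start=1):
    print(f"-- page {number} --")
    print("\n".join(page))
```

Greedy filling is the easy part; the difficulty the article describes comes from choosing among many possible break points, which a single left-to-right pass like this never has to do.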

As the article alludes to it was a surprisingly complex problem - we also had to worry about memory allocation as we were using C. I remember I was quite proud when I got the sample text (which was a few paragraphs from "The Hobbit") to render correctly.

I've never thought about writing a code formatter I just trust emacs to format my code for me. I'd be interested in digging up my old code and seeing how easily I could modify it to operate on source code.

chriswarbo · on Sept 10, 2015

I think code formatters are a great idea, but they're not quite clever enough for me yet.

For example, if there's a common pattern among a set of lines, I'll often line them up vertically to make the repetition clear and focus attention on the differences rather than the commonalities; for example:

    if (foo  ||
        quux ||
        baz) {
      ....
    }

    let foo  = 10
        quux = foo  * 2
        baz  = quux + 1
     in baz * 2

    fields = ['name', 'address', 'country',
               'dob',  'status',  'salary']

To me, those few extra spaces make it easier to glance over the code than without:

    if (foo ||
        quux ||
        baz) {
      ....
    }

    let foo = 10
        quux = foo * 2
        baz = quux + 1
     in baz * 2

    fields = ['name', 'address', 'country',
              'dob', 'status', 'salary']

gchpaco · on Sept 10, 2015

Pretty printing is surprisingly tricky. Last time I had to do it, I ended up with this: https://github.com/rethinkdb/rethinkdb/blob/next/src/pprint/... which has become my favorite algorithm for it as it's quite tweakable for specific needs. Fun little algorithm.

AceJohnny2 · on Sept 10, 2015

"Amazingly, surprisingly, counterintuitively, the indentation problem is almost totally orthogonal to parsing and syntax validation. I'd never have guessed it. But for indentation you care about totally different things that don't matter at all to parsers. Say you have a JavaScript argument list: it's just (blah, blah, blah): a paren-delimited, comma-separated, possibly empty list of identifiers. Parsing that is pretty easy. But for indentation purposes, that list is rife with possibility!" -- Steve Yegge, 2008 [1]

That really struck me back then, and I've kept it in mind whenever I hear about code beautifying/indenting.

[1] http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascr...

sklogic · on Sept 10, 2015

And yet, the best place to add your pretty-printing and indentation hints is the parser. Hints are attached to the grammar, so it makes sense to merge the two things, and then generate two different tools out of the single source. Three, actually - an AST pretty-printer, a textual code formatter and, finally, the parser itself.

eltaco · on Sept 10, 2015

There's an issue for a CST (AST with whitespace, comments, etc) in the estree repo [1]. JSCS is planning on using https://github.com/mdevils/cst for future autofixing rules.

[1] https://github.com/estree/estree/issues/41

sklogic · on Sept 10, 2015

The beauty of this pretty-printing solution (merging it with the parser) is that you don't even need to construct a parse tree. The parser simply walks the stream and annotates it with pretty-printing instructions (pushing and popping the indentation context, adding the weighted break candidates, etc.).
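
A rough sketch of that idea in Python, under stated assumptions: the token set, the `Text` and `Break` classes, and the rule that a break's weight equals the nesting depth are all hypothetical, chosen only to show a single pass that keeps an indentation counter instead of building a tree. A grammar-driven version would attach these hints to grammar rules rather than hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Break:
    weight: int  # assumed convention: lower weight = preferred place to split
    indent: int  # indentation depth in effect if the line is split here


def annotate(tokens):
    """Walk a token stream once, yielding tokens interleaved with break hints."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            yield Text(tok)
            depth += 1  # push an indentation context
            yield Break(weight=depth, indent=depth)
        elif tok == ")":
            depth -= 1  # pop the indentation context
            yield Text(tok)
        elif tok == ",":
            yield Text(tok)
            yield Break(weight=depth, indent=depth)
        else:
            yield Text(tok)


hints = list(annotate(["f", "(", "a", ",", "g", "(", "b", ")", ")"]))
for hint in hints:
    print(hint)
```

A later stage could then decide which `Break` candidates to turn into actual newlines, trying the lower-weight (outer) ones first.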

al2o3cr · on Sept 10, 2015

" Even if the output of the formatter isn’t great, it ends those interminable soul-crushing arguments on code reviews about formatting."

Similarly, covering all your food in Doritos dust doesn't always taste great but it ends the interminable soul-crushing arguments about what flavor things should have.

rfrey · on Sept 10, 2015

Flavor (as well as texture and aroma) is the whole point of cooking (otherwise, Soylent). Formatting is not the point of programming.

tremon · on Sept 10, 2015

I'd say nourishment is the whole point of cooking. The advent of cooking made more foods digestible to our intestines and other foods less risky.

Flavour, texture and aroma are learned appreciations, and are not universal. So I think the comparison with code style is quite apt.

jdbernard · on Sept 10, 2015

No, nourishment is just the bare essentials of cooking. In fact, we have a very large problem, literally and figuratively, because we often don't care about the nourishment. Consider junk foods and others that we call "empty calories" because they have no real nutritional value.

My point being that I don't really see the persuasive value of the analogy. It's a false equivalence. The point was that yes, many people care deeply about the formatting of code (myself included), but discussions around formatting are almost always a form of bike-shedding. A better analogy would be "randomly selecting the restaurant doesn't always lead you to your favorite place, but at least it prevents the interminable discussions about where to go."

munificent · on Sept 10, 2015

I think a better analogy is that having a free cafe ends those interminable discussions about where to go to lunch each day.

davvolun · on Sept 10, 2015

So you're arguing we should continue arguing about code formatting, esp. during a code review?

I disagree with the comparison, but go ahead with it, what's your point?

draw_down · on Sept 10, 2015

Well that's not a very good point.

pjtr · on Sept 10, 2015

I've never worked with a fmt tool, but I run C# StyleCop[1] on each build to warn about style violations. Naively, that seems to me to give the same benefit, but it is probably significantly easier to write, is easily configurable and extensible, and leaves me in control.

Isn't it annoying when a globally optimizing tool switches back and forth between "all arguments on one line" and "all arguments on separate lines"? E.g. producing overly complex whitespace changes in diffs for small "triggering" changes?

[1] https://stylecop.codeplex.com/
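
To make the flip between "all arguments on one line" and "all arguments on separate lines" concrete, here is a toy Python sketch. `format_call` and its `width` parameter are hypothetical; it is a deliberately naive fit-or-explode rule, not how dartfmt or any other real formatter decides:

```python
def format_call(name, args, width=40):
    """Keep the call on one line if it fits in `width`, else one argument per line."""
    one_line = f"{name}({', '.join(args)})"
    if len(one_line) <= width:
        return one_line
    # Too wide: put every argument on its own indented line.
    body = "".join(f"    {arg},\n" for arg in args)
    return f"{name}(\n{body})"


# Renaming a single argument pushes the call past the limit...
before = format_call("connect", ["host", "port", "timeout"], width=30)
after = format_call("connect", ["host", "port", "timeout_ms"], width=30)

print(before)
print(after)
```

The rename changes one identifier, yet the diff between `before` and `after` touches every line of the call, which is the kind of noisy whitespace change the question is about.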

Kenji · on Sept 10, 2015

Every surviving line has about three fallen comrades.

My first thought was CSS.

lolptdr · on Sept 10, 2015

Has anyone done any comparisons to other code formatters of other languages? Or even other code formatters within Dart?

I wish I had more context on how big the arena for these types of programs is. I hadn't appreciated how important code formatters and beautifiers are until reading Mr. Nystrom's account of how difficult it is to write one.

eltaco · on Sept 10, 2015

For JavaScript, there's been jsbeautifier [1], jsfmt [2], and uglify.

JSCS [3] added autofixing a while back for most whitespace rules, and ESLint has just begun autofixing as well [4]

[1] http://jsbeautifier.org/
[2] https://github.com/rdio/jsfmt
[3] http://jscs.info/
[4] https://github.com/eslint/eslint/pull/3635

qznc · on Sept 10, 2015

This whole "Pruning redundant branches" stuff essentially reduces to "do A* search".

It is fascinating that we don't have a definitive method for formatting, yet.

http://beza1e1.tuxen.de/articles/formatting_code.html

munificent · on Sept 10, 2015

A* implies that you have a heuristic function pointing towards a known destination. It's about knowing where you want to go, but not how to get there.

In this case, we don't know where in the solution space the best solution will be. It's not even that easy to tell if we've found it. So that rules out simple pathfinding algorithms like A*.

eliben · on Sept 10, 2015

Yup, all of this is fairly tricky for YAPF as well (https://github.com/google/yapf). We ended up reusing clang-format's algorithm.

amelius · on Sept 10, 2015

Now try to write a formatter that runs incrementally (i.e., keeps formatting while the user types).

_ZeD_ · on Sept 10, 2015

I suggest you guys play around a little with Eclipse and its configurable source formatter.

jdbernard · on Sept 10, 2015

I work in a team that uses the Eclipse formatter to enforce our style guide, so I get the value. Having said that, being configurable defeats the point of a community-wide formatting tool. From the article:

> Even if the output of the formatter isn’t great, it ends those interminable soul-crushing arguments on code reviews about formatting.

With a configurable formatter you just move those arguments into the style-guide discussions and still have disagreements between people and teams using different configured values.

adultSwim · on Sept 10, 2015

Good article; poor title.