My feeling though is the problem is they have a line limit. Maybe they should rethink their style. I'm serious.
Before I worked at Google, in 30 years of programming I never worked at a company that had a line limit. Adding a line limit at Google did not make me more productive. At first I thought "hey, I guess 80 chars makes side by side comparison easier" but then I thought back, hmm. I never had problems comparing code before when I didn't have a line limit.
Instead, what I found was that the 80-character limit was a giant waste of time. The article just pointed out a year of wasted time. I can point to searching and replacing an identifier and then having to go manually reformat hundreds of lines of code all because of some arbitrary style guide. I also had code generators at Google that had to generate code that followed the line limit. I too wasted days futzing with the generator to break the lines at the correct places all because of some arbitrary line limit.
That should be the real takeaway here. Make sure each rule of your style guide actually serves a purpose or that its supposed benefits outweigh its costs.
> My feeling though is the problem is they have a line limit.
There are a few reasons why we do 80 columns:
1. Human eyes have an effective line limit. The longer a line gets, the harder it is to scan back to the beginning of the next line. This is why paperbacks are taller than they are wide and why newspapers use several short columns instead of one wide one.
2. Being too narrow hurts readability, sure, but being too wide does as well. Also, even though many developers have giant monitors now, we also spend a lot of time on laptops, doing side-by-side code reviews, looking at code on blogs, etc. 80 columns is pretty friendly towards all of the various and sundry places where a user may be looking at some code.
3. We found it encourages better code. Dart is syntactically kind of a superset of Java, which means you can write Dart code that looks like Java. In particular, you can revisit some of the egregiously verbose naming practices that infected that community in the 90s. I see a lot of code like:
LoggedInUserPreferenceManager preferences = new LoggedInUserPreferenceManager();
A shorter column limit has worked as an effective nudge to get people to do:
var preferences = new Preferences();
> The article just pointed out a year of wasted time.
We'll get the time back. It's amortized over the time saved each time it runs, multiplied by the number of engineers using it.
> I can point to searching and replacing an identifier and then having to go manually reformat hundreds of lines of code all because of some arbitrary style guide.
The problem here is that you had to manually reformat it! Refactoring is a key goal of automated formatting. You can make a sweeping change to the length of an identifier and automatically fix the formatting everywhere it appears.
> I also had code generators at Google that had to generate code that followed the line limit.
Code generators are another explicit use case. We have a lot of code generators now that produce completely unformatted code and then run dartfmt on it.
> I too wasted days futzing with the generator to break the lines at the correct places all because of some arbitrary line limit.
Did you know that Go doesn't have a line limit and gofmt doesn't wrap lines? It only fixes whitespace within a line. It's up to programmers to split lines manually. It seems to work for them, so clearly there's more than one way to do this.
> Did you know that Go doesn't have a line limit and gofmt doesn't wrap lines?
Yup.
My feeling is that this keeps gofmt much simpler, but it kind of punts the problem onto users. I wanted a more complete solution, even though the result is a lot more complex.
The idea of an auto-formatter has come up many times on the Chrome team. It is always shot down because formatting often imparts meaning. Meaning that is unclear to an auto-formatter. Of course you're free to run an auto-formatter over your own code.
Often I want to format things that are more readable for me. Example (yea, not Google style guide example. Too lazy to dig one up)
If I break that line I'm not going to break it between xPosition and yPosition, nor am I going to break it between startAngle and endAngle. An auto-formatter will never know that semantically those things are more readable when they are on the same line.
Similarly, you claim the short length encourages shortening names, but you still run into plenty of situations where the code is far, far less readable because of the line limit.
Example, assume 40 char limit
int w = desired_width *
    scale_factor + padding;
int h = desired_height *
    scale_factor + padding;
glBindTexture(
    GL_TEXTURE_2D, someTexture);
glTexImage2D(
    GL_TEXTURE_2D, level, format,
    w, h, border, format, type, data);
glTexParameter(
    GL_TEXTURE_2D, GL_TEXTURE_WRAP_S,
    GL_REPEAT);
vs
int w = desired_width * scale_factor + padding;
int h = desired_height * scale_factor + padding;
glBindTexture(GL_TEXTURE_2D, someTexture);
glTexImage2D(GL_TEXTURE_2D, level, format, w, h, border, format, type, data);
glTexParameter(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
Chrome in particular is full of code line-wrapped into what is effectively obfuscated code. So no, I don't agree that a line limit has any point.
You claim human eyes have a line limit. I don't disagree per se, but in my years before Google I never found anyone seriously abusing line length. Then again, I never had to use Java, but that's a separate issue. I hope Dart is not targeting Java's verboseness. I could make the same claim that unbroken lines, up to a point, are more readable and understandable, and I will go on to claim that the 80-char limit at Google breaks that rule and ends up causing 20% or more of Google's code to be effectively obfuscated.
My eyes (much) prefer the 40-char version to that last code at the end, though I might do it like so:
int w, h;
w = desired_width;
w *= scale_factor;
w += padding;
h = desired_height;
h *= scale_factor;
h += padding;
glBindTexture(
    GL_TEXTURE_2D, someTexture
);
glTexImage2D(
    GL_TEXTURE_2D, level, format,
    w, h, border, format, type, data
);
glTexParameter(
    GL_TEXTURE_2D, GL_TEXTURE_WRAP_S,
    GL_REPEAT
);
That's assuming C; for C++ I might start it off like so:
int w = desired_width;
w *= scale_factor;
w += padding;
int h = desired_height;
h *= scale_factor;
h += padding;
I might be tempted to use a macro but likely pass and just use shorter names all around.
Nowadays on the web we've seen the benefits of separating content and presentation. Apart from inertia and weak existing tools, I don't see why we programmers mix presentation (indenting, newlines and line breaks) with our content.
If it was up to me, programs on disk and in version control would basically be an abstract syntax tree, and any time a user viewed them they would be formatted using whatever style rules and screen size the user liked.
The way we do things at the moment seems like the equivalent of those marketing e-mails where all the text is made into a big image to "make it look right".
That doesn't solve anything because you still have to write a parser and formatter. Once you have one it's easier to run the formatter when writing to disk and the parser when loading. A sufficiently smart editor could reformat the code in an alternate style for editing and format back again to the standard style on save.
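To make the "store the tree, format on view" idea concrete, here is a minimal sketch using Python's standard `ast` module. Python is my choice for illustration only (the thread is about Dart), and this is not how dartfmt works. Two differently laid-out sources parse to the same tree, and `ast.unparse` (Python 3.9+) renders either one in a single canonical style.

```python
import ast

# Two layouts of the same statement; only whitespace and line breaks differ.
compact = "w=desired_width*scale_factor+padding"
spread = "w = (desired_width\n     * scale_factor\n     + padding)"

tree_a = ast.parse(compact)
tree_b = ast.parse(spread)

# ast.dump leaves out line/column attributes by default, so equal dumps mean
# the two sources carry the same content once presentation is stripped away.
same_tree = ast.dump(tree_a) == ast.dump(tree_b)

# Rendering from the tree gives one layout regardless of how it was typed.
rendered_a = ast.unparse(tree_a)
rendered_b = ast.unparse(tree_b)

print(same_tree)
print(rendered_a)
```

One caveat that supports the reply above: Python's `ast` drops comments entirely, so a real tool along these lines would need a richer tree that keeps them.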
It looks like this formatter isn't just for internal use, at a company that can buy each of their developers multiple 25-inch monitors. Not everyone has screen space to burn making their editors and terminals arbitrarily wide. If the majority of Dart code was formatted with a line length limit above 80, that would significantly deter me from exploring Dart.
I don't contribute to many open-source projects. But I'd imagine Python would have a good track record: PEP 8 limits lines to 79 characters (72 for comments and docstrings).
That said, long lines in an existing project aren't a dealbreaker, if I have other reasons for contributing. (The one I'm working on now has them.) But if there's no reasonable way for me to write code in a language without breaking 80 chars, I'm not likely to choose that language for my own projects.
The specific number of columns may be arbitrary, but you have to specify some number or your codebase will turn into a mess.
I've seen it happen - you'll get that one guy who likes to zoom a single editor to the full size of his monitor and then just type, type, type all the way to the far edge, making it impossible for anyone else to figure out what the hell he's doing without switching to his idiosyncratic editor setup. He'll see nothing wrong with nesting control structures fifteen or twenty levels deep, because it will all look completely reasonable on his screen, and he'll happily glom up absurdly complicated 30-40 character long identifiers because he has no taste and autocompletes all his identifiers anyway. An organization like Google can't tolerate that kind of crap and a strict code formatting guideline provides a simple first line of defense toward keeping it under control.
Limiting line length makes it easier for developers to work on each other's code, because they won't have to go resize all their terminals or change the length marker on a per-file basis. It's also a mechanical way to push developers toward shorter parameter lists and shorter, less deeply nested functions, which are good practice anyway.
(For what it's worth, I too have been programming for around 30 years, and Google is also the only place I've worked that had an official style guide with line limits. My reaction is the opposite of yours: I loved it, and I've tried to lobby for the practice everywhere I've worked since. I don't care what the details of the style guide are so long as they are enforced consistently - it's a really great feeling to drop into some far-distant file written by people you'll never meet and still find that the code is clear and consistent with the code you work on every day. I like 80 columns because it lets me fit three full size editors on each monitor, but I would be just as happy with 96 or 110 or 132 or whatever as long as there is some consistent limit I can count on.)
I can get behind most code style guidelines, but people who harp on 80 character line limits drive me nuts. It's such a funny anachronism. Is there really someone out there using an editor that can't soft-wrap?
The width of a developer's editor window is so fundamentally a presentation issue - I have trouble imagining anything more so. Having line length limits is like mandating editor color schemes. How about I set my soft-wrap preferences the way I like and you can do the same?
> I can get behind most code style guidelines, but people who harp on 80 character line limits drive me nuts. It's such a funny anachronism. Is there really someone out there using an editor that can't soft-wrap?
Personally, I'd never know. I assume my editors can soft-wrap, but, IME, in terms of being able to easily work on code, lines that fit the window are better than long lines that aren't soft-wrapped, and long lines that aren't soft-wrapped are better than long lines that are soft-wrapped, so I don't ever use soft-wrapping features.
I'm personally not that tied to 80 characters as a perfect line limit, but it's a not unreasonable general guideline for most code in most languages. Like most guidelines, there are times when it's inconvenient as a hard limit.
I never use soft wrap. Code layout communicates meaning, and ought to be created on purpose. Having line length limits is like expecting developers to write comments. The compiler doesn't care, but other human beings do.
80 chars is too narrow. But it isn't about the editor, or word wrapping. It is about the plethora of other tools: diffs, commit histories, compiler errors and so on.
I'm not a huge fan of strict line limits. But in general you are better off keeping your lines shorter. There are many reasons: shorter lines are easier to read, some programming lines are naturally short, and switching between long and short lines can be difficult.
But the biggest reason, many tools work better with shorter lines. Diffs, commit histories and so on. I don't know how many times the delta was in code that was off the screen and the tool didn't have a slider (or maybe it did, but sliding back and forth was painful)
Generally shorter lines are better than longer ones.
> I also had code generators at Google that had to generate code that followed the line limit.
That's an unusual requirement for a code generator. What is the purpose?
Generated code generally is not read by humans or checked into source control. I suppose in the rare event where you want to read the code you could rely on your editor's line wrapping feature.
Just to choose an open source example: JET generated code from the Eclipse Modeling Framework is generally expected to be checked in, read, and edited by humans.
I have tried a few code formatters and almost always regretted it. The big problem is that code formatting carries a significant portion of intent and explanation. Sometimes I want to put two assignments on the same line because it emphasizes their relationship and atomicity, but other times it's better to keep them on separate lines for sparsity. There are actually quite a few times I wanted a line to go well beyond 80 chars because I wanted to de-emphasize an unimportant, monotonous part that would otherwise take away all the attention, and have the far more important steps immediately stand out to the reader. I take code formatting very seriously and consider it an integral part of my expression. Style guides are good, but they shouldn't be followed like a robot, let alone enforced by a robot. In fact, code formatting tells a lot about the culture and philosophy of an author. For example, K&R C starts braces on the same line to emphasize compactness as elegance; C# doesn't, to emphasize sparse code as elegance. In SQL, sometimes it's great to put a subquery on the same line and sometimes it isn't: it really depends on what you want to emphasize and convey rather than on hard and fast rules about number of tokens and syntax analysis. Code formatting is not just a set of fixed rules; it's a communication mechanism that guides the reader on what to focus on, what is unimportant, where there is a gasoline spill and where wildfires may burn. This is not to say everyone takes their formatting seriously, which is where an automated formatter would probably add value (as in the case where you are importing or copy-pasting from somewhere else). I think K&R C is likely the gold standard for code formatting. You should try out your formatter on those snippets ;).
It's funny, I hold exactly the opposite view for exactly the same reason! I regard a code formatter nowadays as an essential tool for programming productivity, the fourth most important tool after an editor, compiler and web browser. The reason is that the limit on how much I can get done is not so much wall clock time as mental energy. The thing that costs mental energy is making design decisions. Without a code formatter, every few lines provides another invitation to make a design decision about layout.
That's all fine if you are the only person who will be touching that file. The problem is that most of the time a piece of code will have multiple developers working on it over time. Each dev will have a different perspective of what that intent and explanation should be.
I have wasted far too much of my life arguing with people about how code should be formatted. Ideally I would just have a rule set up in source control to format the code on checkin and be done with it. Then if I want a different style when I come to edit a file, I can run whatever formatter I want on it and it won't affect anyone else.
I maintain a source beautifier too [1], and it's not as nice as I would like. One of the issues I run into is that the correct indent on a broken line is context dependent. For example:
while (someReallyReallyReallyReallyLongFunction() &&
       anotherLongFunction()) {
    loopBody();
}
is a nicer indenting than:
while (someReallyReallyReallyReallyLongFunction() &&
    anotherLongFunction()) {
    loopBody();
}
In the first case, the two conditions are aligned which makes the code clearer. Does dartfmt handle this? If not, do you have ideas on how it might?
Also, how does it handle invalid input? I may want to reindent my code before it's correct.
Also, did you explore constraint solvers instead of a graph traversal? It seems like they would be a natural fit.
Or if you don't want to make up a good name for your conditional:
while (true) {
    if (!someReallyReallyReallyReallyLongFunction()) break;
    if (!anotherLongFunction()) break;
    loopBody();
}
Or, in a more rules-based fashion:
conditionWithReadableName: function() {
    if (!someReallyReallyReallyReallyLongFunction()) return false;
    if (!anotherLongFunction()) return false;
    return true;
}
...
while (conditionWithReadableName()) {
    loopBody();
}
I find it troubling to make up "good" indentation rules when the code to indent isn't well-written in the first place. Multi-line conditionals are an anti-pattern in themselves (no matter whether they appear in "while", "if" or "for").
I'm not sure I like any of those as much as the original. When you introduce a new name, I don't know if you're going to use it anywhere else. (But having a name is good, so I'm on the fence.)
When all your while conditions are inside the block, I have to actually look at it to figure out that it's really just a standard while loop.
I agree with you that multi-line conditionals should go, and no beautifier is going to make up for bad code, but it's still great that they try. You can't always devote as much time as you'd like to making the code perfect, and that is when a beautifier comes in most handy. Also, having a good beautifier is great when you have to deal with other, not-so-perfectionist programmers' code.
> Does dartfmt handle this? If not, do you have ideas on how it might?
Yes, it does! Correct indentation for nested expressions is a vital feature for helping the reader understand the structure of the code.
The basic idea is fairly simple. Any place a line break may appear, you mark it with a number representing how deeply nested in the expression it is. So in code like:
function(outer(inner(first, second), third));
You would get chunks whose expression nesting level is like:
When you line break, you ensure that its nesting level gets assigned a correct indentation level. Deeper nesting means deeper indentation:
function(
    outer(
        inner(
            first,
            second),
        third));
There are a bunch of interesting edge cases, though. Sometimes a nesting level doesn't happen on a line break, so it doesn't need to get an indentation level associated with it:
function(outer(inner(
    first,
    second), third));
Here, levels 1 and 2 don't appear in line breaks, so we give the first indentation level to nesting level 3. There are stranger cases where you may assign a deeper level before a shallower one like:
function(outer(inner(
        first,
        second),
    third));
Shaking out all of the bugs requires a lot of work and a really big test suite.
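For anyone trying to picture the nesting-to-indentation mapping described above, here is a small Python sketch. It is a simplification written for this thread, not dartfmt's actual code: the chunk list, the nesting numbers (outer at 1, inner and third at 2, first and second at 3), and the `render` function are all my assumptions. The one rule it implements is the one stated above: only nesting levels that actually show up at a line break get an indentation level.

```python
# (text, nesting level) for: function(outer(inner(first, second), third));
# The level numbers are assumed for illustration.
CHUNKS = [
    ("function(", 0),
    ("outer(", 1),
    ("inner(", 2),
    ("first,", 3),
    ("second),", 3),
    ("third));", 2),
]


def render(chunks, breaks, indent_width=4):
    """Lay out chunks, starting a new line before each index in `breaks`."""
    # Only nesting levels that occur at a chosen line break consume an
    # indentation level; levels that never start a line are skipped.
    used = sorted({chunks[i][1] for i in breaks})
    indent_of = {nest: depth for depth, nest in enumerate(used, start=1)}

    lines = []
    for i, (text, nest) in enumerate(chunks):
        if i in breaks:
            lines.append(" " * (indent_width * indent_of[nest]) + text)
        elif not lines:
            lines.append(text)
        else:
            # Stay on the current line; no space right after an open paren.
            sep = "" if lines[-1].endswith("(") else " "
            lines[-1] += sep + text
    return "\n".join(lines)


print(render(CHUNKS, breaks={1, 2, 3, 4, 5}))  # every level breaks
print(render(CHUNKS, breaks={3, 4}))           # only level 3 breaks
print(render(CHUNKS, breaks={3, 4, 5}))        # level 3 breaks before level 2
```

Ranking the used levels rather than indenting by the raw nesting number is what keeps `first` at a single indent when the outer levels never break.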
> Also, how does it handle invalid input? I may want to reindent my code before it's correct.
If the code doesn't parse, it just exits with an error. Being able to run it on incomplete input would be useful, but I decided it was out of scope.
> Also, did you explore constraint solvers instead of a graph traversal?
Good question! I have a little experience with them. I think the line between the two is sort of fuzzy. I didn't approach it directly like a constraint solving problem, but it does have some features in common.
I'm not sure you understood my example. Let's say we have:
while (first && second) {
    third
}
We might assign nesting levels like this:
while(      1
first &&    2
second)     2
third       2
which could lead to this line breaking:
while (first &&
    second) {
    third
}
This is bad because 'second' and 'third' are visually aligned, and so one might think that 'second' is in the loop body instead of the condition. We want to indent 'second' more than 'third', even though it has the same nesting level:
while (first &&
       second) {
    third
}
This is what I meant by "context dependent": here we have two chunks at the same nesting level that want different numbers of spaces. Does dartfmt attempt to handle this?
For this sort of thing I'm using an intermediate language (I guess it's similar to what the OP is talking about) with stack semantics: there are instructions like "PUSHINDENT", "POPINDENT" and "INCINDENT", with PUSHINDENT inserted after '(' in ifs, function calls, etc., ensuring that if a line breaks before the POP instruction, the indentation reflects the context.
So the biggest problem is assigning weights to the break candidate instructions in this stream. This can only be done through a lot of experimenting; I could not find any formal method or a passable heuristic.
On each opening parens put the current line length (=indentation) on a stack. On closing parens pop from the stack. On newline use the top of the stack as indentation.
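That three-sentence recipe can be sketched in a few lines of Python. This is my own reading of it, with assumptions stated: the line breaks are already chosen, "current line length" means the column just past the paren, and only parentheses are tracked (no braces, strings, or comments). The name `reindent` is made up.

```python
def reindent(source):
    """Re-indent already-broken code so each continuation line starts at
    the column just past the innermost unclosed opening parenthesis."""
    stack = []  # columns just past each "(" that is still open
    out = []
    for raw in source.splitlines():
        text = raw.strip()
        # On newline: the top of the stack is the indentation.
        indent = stack[-1] if stack else 0
        for offset, ch in enumerate(text):
            if ch == "(":
                stack.append(indent + offset + 1)
            elif ch == ")" and stack:
                stack.pop()
        out.append(" " * indent + text if text else "")
    return "\n".join(out)


flat = "result = outer(first,\ninner(second,\nthird),\nfourth)"
print(reindent(flat))
```

Because the body of a block sits inside braces rather than parens, this sketch leaves it at column zero, so a real tool would need a separate rule for block indentation.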
And programming anything, or taking over the universe for that matter, is just a matter of hitting the right keys on the keyboard in the right order.
It's so true... I've only been at Google since the end of June and already have written a few proto converters. And teaching the super-newbies (i.e. fresh out of school) the difference between the frontend and backend protobufs...
Huh, I thought the 'Reformat to Dart Style' option in IntelliJ was using dartfmt and I was so disappointed at its output I stopped using it. Just went and tried dartfmt from the command line after reading this - dartfmt is significantly better. Fun to hear the approach that went into it.
For anyone who hasn't tried it, grab the Dart SDK and the IntelliJ Dart plugin. It takes less than 5 minutes to set up. It's been a great platform for building server-side stuff; I haven't tried it for front-end web stuff. It took about 3 reads of the language tour (https://www.dartlang.org/docs/dart-up-and-running/ch02.html) and about a week, and I already felt very comfortable with the entire platform.
The most complex single piece of code I ever wrote was a scheduler. The user could specify a pattern of when events should be raised (eg on this date, at this time, every other hour on the last day of every month, at midnight for me in this TZ on a server in another TZ, etc), and the scheduler would raise the events at the prescribed instant(s).
That took about 9 months, and my biggest takeaway was that how humans measure time is completely f*ed up!
This made me curious as to what the Perl Tidy formatter was like, as I use it often. You know, it's Perl, so maybe a few regexes here and there, and much wizardry.
The Tidy.pm module is 1.1M in size, and over 30,000 lines long. I have much respect for formatters now, I thought the job they do was an easy one.
In the intro to programming course I took at university, the final assignment was something similar (Format Text): we had to write a program that, given three arguments (a text file to read from, an int specifying the characters per line, and an int specifying the lines per page), would output the text file formatted correctly, breaking on words, etc.
From memory there were other requirements, such as indenting the first word of each paragraph.
As the article alludes to it was a surprisingly complex problem - we also had to worry about memory allocation as we were using C. I remember I was quite proud when I got the sample text (which was a few paragraphs from "The Hobbit") to render correctly.
I've never thought about writing a code formatter; I just trust Emacs to format my code for me. I'd be interested in digging up my old code and seeing how easily I could modify it to operate on source code.
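For reference, the assignment described above comes down to roughly the following greedy wrap. This is a Python sketch rather than the original C, and the function name, the blank-line paragraph separator, and the `first_indent` parameter are my assumptions about the spec.

```python
def format_text(text, width, lines_per_page, first_indent=2):
    """Greedy word wrap into lines of at most `width` characters, grouped
    into pages of at most `lines_per_page` lines. Paragraphs are assumed to
    be separated by blank lines; a word longer than `width` overflows."""
    lines = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # The first line of each paragraph starts with an indent.
        current = " " * first_indent + words[0]
        for word in words[1:]:
            if len(current) + 1 + len(word) <= width:
                current += " " + word
            else:
                lines.append(current)
                current = word
        lines.append(current)
    # Slice the flat list of lines into pages.
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


pages = format_text("the quick brown fox jumps over the lazy dog", 16, 2)
for number, page in enumerate(pages, start=1):
    print(f"-- page {number} --")
    print("\n".join(page))
```

Plain text has one kind of break and no nesting, which is the main reason this fits in twenty lines while a source code formatter does not.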
I think code formatters are a great idea, but they're not quite clever enough for me yet.
For example, if there's a common pattern among a set of lines, I'll often line them up vertically to make the repetition clear and focus attention on the differences rather than the commonalities; for example:
Pretty printing is surprisingly tricky. Last time I had to do it, I ended up with this: https://github.com/rethinkdb/rethinkdb/blob/next/src/pprint/... which has become my favorite algorithm for it as it's quite tweakable for specific needs. Fun little algorithm.
"Amazingly, surprisingly, counterintuitively, the indentation problem is almost totally orthogonal to parsing and syntax validation. I'd never have guessed it. But for indentation you care about totally different things that don't matter at all to parsers. Say you have a JavaScript argument list: it's just (blah, blah, blah): a paren-delimited, comma-separated, possibly empty list of identifiers. Parsing that is pretty easy. But for indentation purposes, that list is rife with possibility!" -- Steve Yegge, 2008 [1]
That really struck me back then, and I've kept it in mind whenever I hear about code beautifying/indenting.
And yet, the best place to add your pretty-printing and indentation hints is the parser. Hints are attached to the grammar, so it makes sense to merge the two things, and then generate two different tools out of the single source. Three, actually: an AST pretty-printer, a textual code formatter and, finally, the parser itself.
There's an issue for a CST (AST with whitespace, comments, etc) in the estree repo [1]. JSCS is planning on using https://github.com/mdevils/cst for future autofixing rules.
The beauty of this pretty-printing solution (merging it with the parser) is that you don't even need any parse tree to be constructed. The parser will simply walk the stream and annotate it with the pretty-printing instructions (pushing and popping the indentation context, adding the weighted break candidates, etc.).
" Even if the output of the formatter isn’t great, it ends those interminable soul-crushing arguments on code reviews about formatting."
Similarly, covering all your food in Doritos dust doesn't always taste great but it ends the interminable soul-crushing arguments about what flavor things should have.
No, nourishment is just the bare essentials of cooking. In fact, we have a very large problem, literally and figuratively, because we often don't care about the nourishment. Consider junk foods and others that we call "empty calories" because they have no real nutritional value.
My point being that I don't really see the persuasive value of the analogy. It's a false equivalence. The point was that yes, many people often care deeply about the formatting of the code (myself included), but discussions around formatting are almost always a form of bike-shedding. A better analogy would be "randomly selecting the restaurant doesn't always lead you to your favorite place, but at least it prevents the interminable discussions about where to go."
I've never worked with a fmt tool, but run C# StyleCop[1] on each build to warn about style violations. Naively to me that seems to give the same benefit, but is probably significantly easier to write, is easily configurable and extensible, and leaves me in control.
Isn't it annoying when a globally optimizing tool switches back and forth between "all arguments on one line" and "all arguments on separate lines"? E.g. producing overly complex whitespace changes in diffs for small "triggering" changes?
Has anyone done any comparisons to other code formatters of other languages? Or even other code formatters within Dart?
I wish I could get more context on how big the arena for these types of programs is. I had little sense of how important code formatters and beautifiers were until I read Mr. Nystrom's account of how difficult it is to write one.
A* implies that you have a heuristic function pointing towards a known destination. It's about knowing where you want to go, but not how to get there.
In this case, we don't know where in the solution space the best solution will be. It's not even that easy to tell if we've found it. So that rules out simple pathfinding algorithms like A*.
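One way to read this: the formatter is minimizing a cost over possible layouts, not steering toward a goal state it can name in advance. Here is a toy Python sketch of that framing. The chunk list, the cost weights, and the fixed 4-space continuation indent are all made up for illustration, and brute-force enumeration stands in for whatever search a real formatter uses.

```python
from itertools import combinations


def layout(chunks, breaks, indent=4):
    """Join chunks into lines, starting a new line before each break index."""
    lines = []
    for i, text in enumerate(chunks):
        if i in breaks:
            lines.append(" " * indent + text)
        elif not lines:
            lines.append(text)
        else:
            sep = "" if lines[-1].endswith("(") else " "
            lines[-1] += sep + text
    return lines


def cost(lines, width, overflow_weight=100, line_weight=1):
    """Assumed scoring: overflow is very bad, each extra line is mildly bad."""
    overflow = sum(max(0, len(line) - width) for line in lines)
    return overflow_weight * overflow + line_weight * (len(lines) - 1)


def best_layout(chunks, width):
    """Try every subset of break points and keep the cheapest layout."""
    best = None
    for k in range(len(chunks)):
        for breaks in combinations(range(1, len(chunks)), k):
            lines = layout(chunks, set(breaks))
            score = cost(lines, width)
            if best is None or score < best[0]:
                best = (score, lines)
    return best[1]


CALL = ["glTexImage2D(", "GL_TEXTURE_2D,", "level,", "format,", "w,", "h,",
        "border,", "format,", "type,", "data);"]
print("\n".join(best_layout(CALL, width=40)))
```

Nothing in the loop knows what the winning layout looks like until every candidate has been scored, which is the point made above. Enumerating all subsets is also exponential in the number of break points, so this only works on toy inputs.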
I work in a team that uses the Eclipse formatter to enforce our style guide, so I get the value. Having said that, configurability defeats the point of a community-wide formatting tool. From the article:
> Even if the output of the formatter isn’t great, it ends those interminable soul-crushing arguments on code reviews about formatting.
With a configurable formatter you just move those arguments into the style-guide discussions and still have disagreements between people and teams using different configured values.