Tutorial on Good Lisp Programming Style (1993) [pdf] (umd.edu)
121 points by fanf2 on July 24, 2019 | 32 comments



I wish I had internalized this earlier in my career — to write code for human readers first, and compilers second.

I hope someone proves me wrong, but I think you really need a Lisp to do this right. Things always break down with other languages because they don't fully decouple from a computer architecture, and architectures change. When expressing ideas, human brains do not operate as Harvard machines (even though certain programming languages will try to convince you they do).

Lisp does provide the means to limitlessly abstract and combine knowledge. There is no predefined architecture to mold an idea into, so concepts and their connections can get expressed in a natural way.

For example, in C-style languages the means of combination aren't general at all; e.g. the way to combine a function with a type is very different from the way to combine a function with a number. You're forced to structure an idea so that a specific compiler version can compile it to machine code. These brittle constructs don't lend themselves very well to separating ideas from their current implementation. Implementations change, and knowledge gets lost.
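
To illustrate the contrast (a hedged sketch of my own; FROB is a made-up name), Common Lisp offers one uniform means of combination - the s-expression - whether the things being combined are numbers, functions, or types:

  ;; One uniform shape: an operator applied to operands.
  (+ 1 2)                          ; a function combined with numbers
  (mapcar #'1+ '(1 2 3))           ; a function combined with another function
  (typep 3 '(integer 0 10))        ; a function combined with a type specifier
  (defmethod frob ((x integer)) x) ; behavior combined with a type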

Lisp code always seems to survive though.

Thanks for posting this tutorial, it’s been a joy to read!

PS. “Almost all Bad Code examples are taken from published books and articles (whose authors should know better, but will remain anonymous)”


> I wish I had internalized this earlier in my career — to write code for human readers first, and compilers second.

Going through Turbo Basic and Turbo Pascal before coming to C helped me with that.

I learned to express algorithms cleanly first, and to rewrite for the compiler/hardware only when the code wasn't fast enough yet - and then only the specific hotspot.

C-style culture seems to push people into writing for compilers first and people second, including meaningless PR discussions about optimizations where performance is already good enough for the desired goal.


Have you tried Haskell? I have my criticisms of the language, but "too close to the hardware" would be a very surprising way to frame any of them.


  λ length [1..1000] ^ 3
  1000000000 :: Int
  λ length [1..1000] ^ 9
  -6930898827444486144 :: Int
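
(The culprit: length returns a fixed-width Int, so 1000^9 overflows 64 bits and silently wraps around to a negative number.)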


Fair enough. Base has some warts in general.


> to write code for human readers first, and compilers second.

I find this very ironic in this context. After all, Lisp is a language where the human does half of the job the compiler is doing, because you need to write directly in an AST instead of a higher-level language.


Except that requirement makes working on code as data trivial, which enables proper macros, which lets you raise the level of abstraction you're working on to arbitrary levels, and hide all the machinery supporting it.

So in the end, you end up writing code much closer to the domain (and thus, more human-readable) than you would in a non-Lisp language.
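
For example (a minimal sketch; WITH-TIMING and EXPENSIVE-COMPUTATION are made-up names), because code is just a list, a macro can rewrite it before compilation and hide the machinery:

  ;; Wrap any body of code in timing instrumentation.
  (defmacro with-timing (&body body)
    `(let ((start (get-internal-real-time)))
       (prog1 (progn ,@body)
         (format t "took ~a ticks~%"
                 (- (get-internal-real-time) start)))))

  ;; Usage reads like a built-in construct of the language:
  (with-timing
    (expensive-computation))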


Two points. First: it's easy to learn to think in ASTs; it just takes practice. No more practice than it takes to get good at any other programming language.

Second: ASTs contain no ambiguity. English contains ambiguity. Javascript contains ambiguity. The full meanings of most programs cannot be discerned from just looking at them; you also need external documentation that describes the conventions in play (e.g. operator precedence).
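
A quick sketch of the precedence point: an infix expression like a + b * c needs an external rule to disambiguate, while the s-expression carries its own grouping:

  (+ a (* b c)) ; a + b * c under conventional precedence
  (* (+ a b) c) ; the other reading, equally explicit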

This is less true with Lisp than with any other language. Lisp novices complain about Lisp being write-only, but a well-written Lisp program written today will still be self-evidently unambiguous (and likely even runnable) in 1000 years. Try that with C++.

Lisp is the Latin of programming languages.


Latin also contains ambiguities, at various levels (lexical, syntactic, &c.), just like any other natural language.

Lisp is, however, rather similar to Montague Grammar-style analyses of natural language, assuming a modern Chomskian binary-branching tree structure.


Then Haskell is the Greek.


> Lisp is the Latin of programming languages.

Fuck I hope not


It's more of a compromise between the humans dealing with the language itself (compiler, editor, and DSL authors) and the humans who exclusively want to write application code.

Common Lisp is far from the only option, but it's arguably the best compromise of popularity / feature set. There are a variety of s-expression, stack-based or simple syntax languages that could work.

As a counter-example in a more popular language, look at how much effort went into "function builders" for Swift. They're solving a real pain point, but it took five years to get there, the feature isn't backward compatible with old versions (unlike a Lisp macro), and the standards process was somewhat subverted to add them.


The idea that there is a man/machine trade off is a fallacy. In fact whatever is hard for the machine is also hard for the human and vice versa. Syntax requiring hard-core parsing techniques is also harder for humans to understand and to parse.

If you're already thinking in abstract syntax, encoding it in concrete syntax is extra work. The more convoluted the concrete syntax, the more work it is - and then, consequently, more work to decode it again.

Analogy:

"When you write plain text, you're doing the job that the machine should be doing for you because you're writing in the direct decoded output! You should type your data directly in compressed format; then you could take advantage of GNU gzip to decode for you! The gzip format has useful syntactic sugars for reducing verbosity, like indicating that some fragment of a few bytes is to be repeated. And you can further express yourself in fewer bits using convenient Huffman codes and such."

The fallacy is rooted in taking it for granted that the convoluted representation is the more convenient one in which programmers actually think and, furthermore, that the only operation whose cost matters is the conversion of that representation to the abstract one, not vice versa.


One does not write 'directly in AST' in Lisp.

Lisp is written in nested lists of prefix expressions.

Instead of

  foo(bar(1,2));

one writes

  (foo (bar 1 2))

Instead of

  function foo(bar) {
    bar.baz = "hello";
  }

one writes

  (defun foo (bar)
    (setf (slot-value bar 'baz) "hello"))

Instead of

  function(a) {
    return a + 1;
  }

one writes

  (lambda (a)
    (+ a 1))

Basically it's writing hierarchical lists of tokens in an externalized data structure: symbolic expressions.

An AST would represent more information. For example, an s-expression representation of an AST might look like:

  (anonymous-function-definition
    :argument-list ((variable :name a))
    :body          (function-application (function :name +)
                      :arguments ((variable :name a)
                                  (constant :value 1))))

It would actually represent, as the result of parsing, what the tokens mean (function, variable, constant, ...) and what the syntactic constructs are (function definition, function application, argument list, ...): an AST is the result of a syntax analysis.

see: https://en.wikipedia.org/wiki/Abstract_syntax_tree

The Lisp source representation lacks this information and needs to be parsed/interpreted to construct this information. The Lisp reader does not do a syntactic analysis of a program - it does not even know the difference between program and data. On the Lisp source level a program is just data, and (+ a 1), (a + 1) and (a 1 +) are all valid s-expressions - but only the first is a valid Lisp expression.
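
A quick illustration of that distinction (a sketch): the reader happily returns data that the evaluator then rejects.

  (read-from-string "(a 1 +)")        ; => (A 1 +), just data, no syntax check
  (eval (read-from-string "(a 1 +)")) ; error: A does not name a function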


> The Lisp source representation lacks this information and needs to be parsed/interpreted to construct this information.

That information doesn't have to go anywhere into the tree though; a compiler can generate code directly from a macro-expanded top-level expression without constructing any class-based concoctions involving inheritance.

The parsed Lisp data structure meets the textbook definition of AST in every regard, particularly after macro-expansion. All the nodes that are compound forms are clearly labeled and classified by symbols which denote the special operators. Recursive walks of the tree can readily branch to cases according to these symbols.

It's just a smarter version of the diagram you see in that Wikipedia page's top diagram.
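
Something like this minimal walker (a sketch of the idea, not anyone's actual compiler) shows what branching on those symbols looks like:

  ;; Dispatch on the operator symbol in the car of each compound form.
  (defun walk (form)
    (cond ((atom form) (list :constant-or-variable form))
          (t (case (first form)
               (quote  (list :literal (second form)))
               (lambda (list :abstraction (second form) (walk (third form))))
               (t      (list* :application (first form)
                              (mapcar #'walk (rest form))))))))

  ;; (walk '(lambda (a) (+ a 1)))
  ;; => (:ABSTRACTION (A) (:APPLICATION + (:CONSTANT-OR-VARIABLE A)
  ;;                                      (:CONSTANT-OR-VARIABLE 1)))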


> That information doesn't have to go anywhere into the tree though; a compiler can generate code directly from a macro-expanded top-level expression without constructing any class-based concoctions involving inheritance.

Sure, but then it does not use an Abstract Syntax Tree.

> Recursive walks of the tree

That's the parsing part. You can only macroexpand and walk the tree with knowledge of the Lisp language: special forms, calling forms, arg lists, environments, ... etc.

> All the nodes that are compound forms are clearly labeled and classified by symbols which denote the special operators.

  (let ((a a))
    (flet ((a (a) a))
      (a a)))
Nothing is labelled: without walking the tree based on the Lisp syntax you won't know what the 'a' symbols are - there is zero syntactic information represented. In the s-expression, they are symbols and not elements in a syntax tree.


Note that "nothing is labeled" in the node of an AST that is written in C or Pascal; fields of a record or structure are accessed by position. The fields have compile-time names only. I don't see anything in the Wikipedia article about things having to be labeled.

If we obtain a pointer into the middle of an AST node without any context, we also don't know anything about it; we have to work with a correctly aimed pointer to the base of the object.

There is no reason anything in the compiler or interpreter would obtain a pointer to just the (a a), and treat it as a form, other than as some sort of severe bug. The (let ...) thing is an object, and has to be respected as such.

Basically your argument boils down to "lists are not proper polymorphic tree node data structures, even with discriminating type symbol in the car; and an AST must be made out of structured data with named fields."


> Basically your argument boils down to

No, my argument boils down to that an Abstract Syntax Tree must represent Abstract Syntax. A stream/list/tree of tokens isn't a tree of syntax elements.

In Lisp (+ a b) and (a b +) are valid token lists and we have no idea what these tokens are other than symbols. In an abstract syntax tree we would represent syntactic categories like operation, variable, constant, function call, etc. and it would be the result of a parse of a particular syntax.

The Lisp reader is not a Lisp parser. It doesn't know if (+ a b), ((a) (+) (b)), (a (b) +) or (a b +) are valid Lisp constructs.

The Lisp reader is an s-expression parser. Its input is an s-expression, and the only syntax it knows is that of s-expressions. Thus the result of its parse can't be an abstract syntax tree for a Lisp program - it simply can't assign any syntactic information, neither implicit nor explicit.

Definitions (stolen):

Abstract Syntax is a model consisting of the language constructs from which a language is composed.

Abstract Syntax Tree is a directed acyclic graph (DAG) consisting of instances of Abstract Syntax language constructs.


Because Lisp uses a flexible representation, the representation has to be validated to see that it conforms to a given syntax (such as that of Lisp source). This can happen naturally in a code-expanding tree walk. The subsequent structure can then be processed without additional validation; every form has the right kinds of pieces in the right positions. If (1 a) or ((a) b) are not valid forms, their presence can be diagnosed at that stage. Subsequent processing stages can trust that the pieces are where they are supposed to be. The invalid forms like (1 a) and ((a) b) are assumed not to occur, just like any bad inputs to a module that have been checked away.

You're arguing for a definition of AST that basically coincides with "decorated AST": AST plus some semantic attributes, and that a generalized tree form that encodes supersets of a syntax cannot be an AST.


> This can happen naturally in a code-expanding tree walk

Just because something can be interpreted as a tree, it doesn't suddenly become an Abstract Syntax Tree. It's essentially nothing more than a hierarchical tokenizer format.

The flexibility of the Lisp source representation comes precisely because it is NOT an AST and thus not bound to a particular, correctly represented syntax. A macro, for example, can turn any wild combination of tokens into an even wilder combination of tokens - regardless of what the symbols/numbers/lists/strings, ... actually might 'mean'.


I just took the time to read through the entire style guide. I have been using CL since 1982 and still learned useful tips to keep in mind. CL really is an ageless language and is excellent for one person projects because it facilitates both getting things working quickly and a style of building small libraries of reusable code so future projects go faster.


Those are still good tips, irrespective of language and even the document's age.


I agree with you, though I think a few things maybe don't hold anymore:

> Sign and date your comments!

> (Should be an editor command to do this)

Now that everyone uses a VCS (well, almost everyone), this is no longer necessary.


Yea. I know VCS are supposed to be a better option and handle all of that for you.

Our software vendor uses VCS, but has a long history of signing and dating code sections with paragraph explanations. They've been developing the same codebase since the 70's and release the uncompiled source along with the precompiled binaries. Their documentation is horrendous, but the source and comments help out quite a bit.

Some of the prolific coders have written lots of research papers and textbooks as well, so it is really cool to read one of their textbooks and see the code firsthand.


It's still a good idea. Hell, I recently configured my editor to expand "todo" into "TODO | -- $name, $date" (where | is where the caret ends up after expansion).

Code often outlives its VCS history. I've seen it in every company (except one) I worked for. Signing and dating comments lets you evaluate whether the remark is still applicable, and who to bug about it, regardless of how deep the VCS history goes - and it doesn't require you to jump through multiple blame sessions to filter out commits that only touched or moved the comment but otherwise didn't change it.


TODOs are probably the only place where signing and dating the code is not horrible. That way you can filter out all the random todos by intern Jim last summer.

But I have seen that people who put their name in comments are too lazy/ignorant to know how to use blame. It's super easy and keeps your code clean. If the author is older than the VCS, they are no longer relevant.


> TODOs are probably the only place where signing and dating the code is not horrible.

Not just todos. Also notes bringing in external context to explain why some code looks the way it does, because that external context can become invalid over time, and a date is then useful for evaluating whether the code needs changes.

> If the author is older than the VCS, they are no longer relevant.

You'd be surprised. Loss of VCS history happens for various reasons. In my career, I've seen one case where the company switched to a different VCS. I've seen another where someone forked the project by copying out parts of the older project and committing them as the first commit of a new repo. I also worked with a codebase that's older than I am, predating Git, SVN and CVS, that someone at some point open-sourced on GitHub.


Might still be a good idea, actually. "I've checked that this comment is up-to-date as of ..." is not something that a VCS surfaces.


Does anyone know the Lisp macro tutorial by Allan Wechsler that is referenced on page 81?


This is from (1993)


and then


The initial submission lacked the tag in the title.



