Literate programming: presenting code in human order (johndcook.com)
95 points by cab1729 on July 7, 2016 | 73 comments



Peter Norvig's comment hits the nail on the head for me:

" Peter Norvig 6 July 2016 at 11:47 I think the problem with Literate Programming is that assumes there is a single best order of presentation of the explanation. I agree that the order imposed by the compiler is not always best, but different readers have different purposes. You don’t read documentation like a novel, cover to cover. You read the parts that you need for the task(s) you want to do now. What would be ideal is a tool to help construct such paths for each reader, just-in-time; not a tool that makes the author choose a single pth for all readers."

Has anyone attempted something like this?

I've heard you can do transclusion in org-mode, which might be a starting point.

Edit: Some initial ideas:

- Code can be deconstructed into blocks, and where the code is not self-documenting, prose can be added. Or even visualisations etc if a block is conceptually tricky to grok. You could even have MOOC-style validations to verify reader understanding for each block.

- Some kind of topology of the blocks should be generated based on how they interact and how they are conceptually related

- The author creates a few 'starting points' for different audiences, e.g. 'if you've used X before, start at Y'

- From there, next blocks to read are auto-suggested to the reader. A map or network diagram of all blocks is also provided so the reader can chart their progress and see where the 'big ideas' lie. (A rough sketch of this follows below.)

Edit 2: This also really reminds me of Bret Victor's 'Humane Representation of Thought' lecture (https://vimeo.com/115154289, 45:40) where he says there is a conflict between code being an engineering specification and an authored work meant to be read by humans.
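
Edit 3: A very rough Python sketch of the block-graph idea (all names here are hypothetical, just to make the shape concrete):

    # Each block carries prose and/or code; edges record conceptual prerequisites.
    blocks = {
        "parse":   {"prose": "Turn raw text into tokens.",  "needs": []},
        "index":   {"prose": "Build the search index.",     "needs": ["parse"]},
        "query":   {"prose": "Answer searches.",            "needs": ["index"]},
        "ranking": {"prose": "Order results. (Big idea.)",  "needs": ["query"]},
    }

    # Author-defined starting points for different audiences.
    starting_points = {"new to search": "parse", "used Lucene before": "query"}

    def suggest_next(visited):
        # Suggest blocks whose prerequisites the reader has already seen.
        return [b for b, v in blocks.items()
                if b not in visited and all(n in visited for n in v["needs"])]

    print(suggest_next({"parse"}))           # ['index']
    print(suggest_next({"parse", "index"}))  # ['query']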


I don't understand why there not being a single best order of presentation for explanation is grounds to dismiss the whole endeavour. I bet that there are many pretty good orders for humans that are an improvement over the order the compiler wants the stuff in. Besides, the compiler doesn't demand a single best order, most programs could be re-ordered in a number of ways acceptable to the compiler (my point being that this criticism seems to hold literate programming to a standard that the incumbent of compiler-required ordering is not held to.)

Edit: added the parenthetical comment


Yeah, I agree. What I'd imagine a tool like this doing is helping to find and navigate the pretty good order(s), whether or not they were initially hand-crafted by the author.

A comparison would be how desire paths form and are often turned into real paths.

I like your parenthetical observation too!


There was an interview with Guy Steele in the book Coders at Work, and in one bit he had something to say about actually referring to the (literate) source code for TeX [1]:

Sometimes I've got a specific goal because I'm trying to solve a problem. There have been exactly two times, I think, that I was not able to fix a bug in my TeX macros by reading The TeXbook and it was necessary to go ahead and read TeX: The Program to find out exactly how a feature worked. In each case I was able to find my answer in 15 minutes because TeX: The Program is so well documented and cross-referenced. That, in itself, is an eye-opener--the fact that a program can be so organized and so documented, so indexed, that you can find something quickly.

The other thing I learned from it is how a master programmer organizes data structures, how he organizes the code so as to make it easier to read. Knuth carefully laid out TeX: The Program so you could almost read it as a novel or something. You could read it in a linear pass. You'd probably want to do some jumping back and forth as you encountered various things. Of course, it was an enormous amount of work on his part, which is why very few programs have been done that way.

[1] link to the page (337) in Google Books https://books.google.ca/books?id=2kMIqdfyT8kC&pg=PA337


That's a great anecdote and makes me want to read TeX: The Program.

I really think there's something in the 'jumping back and forth' of the reader that could be leveraged, like if there were a way of recording, distributing, aggregating, and analysing it.


It doesn't add that much to the above, but the now defunct Bookpool had a column where they asked various authors for lists of their favourite 10 computer books, and they asked Guy Steele [1]. His first entry was:

Computers & Typesetting, Volumes A-E Boxed Set by Donald E. Knuth -- I'll read anything that Don Knuth writes, and he has written quite a bit, including Surreal Numbers and 3:16 Bible Texts Illuminated as well as the famous Art of Computer Programming series. But this five-volume set is my favorite. The two volumes titled (and sold separately) as The TeXbook and The METAFONT book are well-known, but what I really recommend to you are TeX: The Program and METAFONT: The Program, because these are simply the best-written, best-documented, best-debugged programs of their size ever published. They reward careful study.

I've tried to read the source for TeX: The Program (I actually have the boxed set he mentions), but was impeded by my lack of knowledge of TeX. I keep meaning to get back to it.

[1] Guy's list in archive.org: http://web.archive.org/web/20080422183814/http://www.bookpoo...


I use a layer-based approach described at http://akkartik.name/post/wart-layers in https://github.com/akkartik/mu. It's 20kLoC of literate code, and I was super concerned about optimizing it for non-linear reading, because that's how I need to browse the code for myself. It's not a system to generate multiple views like you want; instead, it's a single view that I find easy to skim and easy to reorganize at will:

a) Each layer puts more important stuff up top. So you can start skimming from the first layer, but you don't have to read all the way down each layer, just get a sense of what it provides and why.

b) There's not much emphasis on lengthy comments (which would get linear). Instead, just the order in which code is presented does a lot of heavy lifting.

c) The tangled code is intended to be readable (unlike Knuth's original tools), so I often jump down into it when I feel the need. Error messages and debugger support do show lines in the original literate sources, however.
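
To give a flavour of the layer idea, here's a rough Python sketch (this is not mu's actual format, which has more machinery for inserting code into earlier layers):

    import glob

    # Hypothetical layer files: 001core.py, 002tests.py, 003cache.py, ...
    # Building "up to" a layer concatenates every layer at or below it,
    # so the project loads and runs at any level of functionality.
    def tangle_layers(upto):
        out = []
        for path in sorted(glob.glob("[0-9][0-9][0-9]*.py")):
            if int(path[:3]) > upto:
                break
            out.append(open(path).read())
        return "\n".join(out)

    open("build.py", "w").write(tangle_layers(upto=2))  # core + tests only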


Fascinating, thanks for sharing.

Tangential, but reading your website ("I'm working on ways to better convey the global structure of programs"), I've been talking a lot with a visually impaired researcher who works in Auditory Display. We've been talking about designing interactive audio displays to offer overviews of things like webpages and code.

I'll definitely keep your work as a reference if that turns into a project!


I'd love that! I couldn't find an email for you but if you want to ever get in touch my address is on my profile.


> Has anyone attempted something like this?

Yes, I write all my personal projects in human order. To me human order is when the flow control of human consumption and computer execution are most closely aligned. I attempt to achieve this with depth and order.

1) Have a giant function that defines your library or application.

2) Inside the giant function, nest child functions for the primary tasks in the order in which they will execute. In the case of the following example, the options evaluation comes first, the lexer/parser is the second child task, followed by various code presentation tasks, and finally by an analysis task. http://prettydiff.com/lib/jspretty.js

3) Break the child tasks down, as necessary, into reusable components.

4) I also believe a reference should always be declared before it is used, which can dictate the order of reference declaration.

My opinion is that when the flow control of the application is unclear, you are wasting time during maintenance. You may not know where a problem is occurring, but if the flow of the application is clear from reading the code, you immediately know where to start, where to step to next, and where to stop. There is minimal guessing and it doesn't require breakpoints to figure it out.
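
A minimal Python sketch of this nesting style (the task names are made up for illustration):

    def beautify(source, options):
        """Giant outer function; child tasks are nested in execution order."""

        def evaluate_options(opts):
            # First task: normalise user options.
            return {"indent": 4, **opts}

        def parse(text):
            # Second task: lex/parse into tokens.
            return text.split()

        def present(tokens, opts):
            # Third: pretty-print the tokens.
            return ("\n" + " " * opts["indent"]).join(tokens)

        def analyse(tokens):
            # Final task: simple analysis report.
            return {"token_count": len(tokens)}

        # The body reads like the program's flow of control, top to bottom.
        opts = evaluate_options(options)
        tokens = parse(source)
        return present(tokens, opts), analyse(tokens)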


My hobby project, with dependency graph visualized:

https://e8vm.io/e8vm

Source code:

https://github.com/e8vm/e8vm


That is so cool, thanks for sharing.

Is it possible to use the vis tool on other codebases?


The layout engine is in this package:

https://godoc.org/e8vm.io/e8vm/dagvis

It is possible to lay out any DAG (directed graph with no cycles).

The dependency graphs of Go programs are generated by this package:

https://godoc.org/e8vm.io/tools/godep

I haven't packaged the drawing part very well (sorry), but the essentials are here:

https://github.com/e8vm/shanhu/blob/master/web/js/dag.js

It uses d3.
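
For the curious, the heart of a DAG layout is roughly this (a plain Python sketch, not the dagvis API): assign each node to a layer by the longest path from its dependencies, then position nodes within layers.

    from functools import lru_cache

    # deps maps each node to the nodes it depends on (must be acyclic).
    deps = {
        "app":    ["parser", "vis"],
        "parser": ["lexer"],
        "vis":    ["dag"],
        "lexer":  [],
        "dag":    [],
    }

    @lru_cache(maxsize=None)
    def layer(node):
        # A node sits one layer past its deepest dependency.
        ds = deps[node]
        return 0 if not ds else 1 + max(layer(d) for d in ds)

    layers = {}
    for n in deps:
        layers.setdefault(layer(n), []).append(n)

    for level in sorted(layers):
        print(level, sorted(layers[level]))
    # 0 ['dag', 'lexer']
    # 1 ['parser', 'vis']
    # 2 ['app']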


Super stuff, look forward to diving in deeper!

Are you planning to extend this aspect of the project? What are your other plans for the project?


Literate programming is a thing that's long piqued my interest, but it usually seems more suited to academic or personal use.

Having said that, a beautiful example of LP in the wild is geom[0], a Clojure/Clojurescript library written entirely in org-mode. Browsing through the source, I have to keep reminding myself that I'm looking at the code and not just the documentation.

[0] https://github.com/thi-ng/geom


I see that this is an honorable goal, but in this particular project the source looks more like source than documentation. An extreme example is this part, where the code even contains function docstrings, rather than having the docstring as part of the surrounding document:

https://github.com/thi-ng/geom/blob/master/geom-core/src/mat...

This is an effect I observed with my own literate programming attempts, too: they quickly end up as large documents that contain lots of code listings and not much explanation.


Author here: Agreed & most parts of this particular project aren't really the best example. With other (newer) projects I embraced the LP style much more, and in fact by now really often first write the prose parts to get my head clear about a certain aspect and only then start writing code, e.g.

https://github.com/thi-ng/fabric/blob/master/fabric-facts/sr... https://github.com/thi-ng/geom/blob/develop/src/viz/core.org

As for the docstrings included in the core, this is largely to address complaints from other users, who are used to having docs available in the REPL, and I often don't have the resources/time to provide both org-mode and docstrings...


I have wanted to try this for so long. Doing my own short articles is one thing, but to see something as complex as 2D/3D voxel libraries in Clj/Cljs is very inspiring.

My only concern for such things is what to do with non-org-loving, nay, Emacs-hating users?

This can be generalized to all literate programming tools: beyond the humble few, I assume it kills adoption.

The maintainer of Axiom did a good talk that got me into my penultimate LP kick, but this is always the thing missing from such impassioned talks for it.

UPDATE: Axiom, dear 616c, not Maxima. Someone even links it in another thread to boot.


It would help if editors stopped treating comments as second-class content. Most Atom themes, for example, render them as I would render unused code (if I were to commit such atrocities): grey on (light|dark) grey.

I wish there were a functioning plugin to render markdown inline. I bet people would start caring about comments just because they looked better.


One thing I've been doing for a few years now is minimizing syntax highlighting for code, and using some of the colors I save to have different kinds of comments with different colors: http://i.imgur.com/vU783Xo.png

Here's the initial idea that led to this: http://akkartik.name/post/2012-11-24-18-10-36-soc. As you can see, it was in turn spawned by.. a comment on HN :)


Natural language, or "human order", is a pretty cumbersome way to express ideas. It is full of double meanings, assumptions and expectations.

Any attempt to express complicated and precise ideas in natural language results in hard-to-understand gibberish, and you will need special training to decipher it anyway. Law is a good example.

I think programming languages as we have them are good for instructing computers.


Programming languages were designed for people; compilers are designed to interpret that language into instructions for the computer.

Natural language is horrible for a programming language, but don't forget that the reason we write code in higher level languages is specifically to make it easier for humans, not for computers.


You'll still use code to talk to your computer. The human readable prose is for the "why?" (and often more important: "why not?"), the code is for the "how?".

See http://www.literateprogramming.com/adventure.pdf as an example from Literate Programming's biggest proponent.


My latest attempt at this problem is a tool I call `cq` [1]. It's a way to query code with a selector, rather than line numbers. I use it for blog posts to pull code chunks into markdown.

It lets you write a (static) document that mixes prose and code in a way that's best for the reader. But 1. the code is runnable on disk, and 2. you don't have to copy and paste.

Generally I find that literate programming solutions don't have good support for hiding portions of the code. This tends to limit the tool's applicability to trivial code examples.

I use `cq` to extract key code blocks from complete projects and then provide the whole project alongside the post.

[1]: https://github.com/fullstackio/cq
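
To illustrate the idea only (this is not cq's actual selector syntax; see the repo for that), here's a minimal Python sketch that pulls a function out of a source file by name rather than by line numbers:

    import ast

    def extract_function(path, name):
        """Return the source of a function, found by name, not line numbers."""
        source = open(path).read()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                return ast.get_source_segment(source, node)
        raise KeyError(f"no function named {name!r} in {path}")

    # Hypothetical file/function names; embed the result in a markdown post,
    # and the post picks up the current code whenever it is regenerated.
    print(extract_function("project/algo.py", "quicksort"))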


It seems to me that well-documented code should follow some sort of power law:

Each line should be documented (best if self-documented).

Each function (of around 10 lines) should be documented.

Each group of functions (of around 100 lines total) - approximately a class - should be documented.

Each module (of around 1000 lines) should be documented.

.. etc. up to the final documentation about the whole program.

Each level of documentation should summarize the purpose, inputs, outputs, assumptions and architecture of the thing being described. So each higher level should be around a 10x smaller share of the total documentation (of course, some other factor could be used).
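
A tiny Python sketch of what the levels look like in practice (all content made up for illustration):

    """orders: match buy and sell orders.            <- module level (~1000 lines)

    Purpose: keep a price-sorted book and pair crossing orders.
    Inputs: order events. Outputs: trade events.
    Assumptions: one instrument, single-threaded.
    """

    class OrderBook:
        """Price-sorted resting orders.              <- class level (~100 lines)"""

        def __init__(self):
            self.bids = []  # best bid kept at index 0 (self-documenting line)
            self.asks = []

        def crosses(self, bid, ask):
            """True when a bid meets or beats an ask.  <- function level (~10 lines)"""
            return bid >= ask  # line level: price priority only; time priority elsewhere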


There was a fantastic set of panel discussions at MIT on dynamic languages in 2001. In the Q&A period of the panel on runtime [1], Guy Steele asked the panel what they thought about literate programming [2], and some good discussion ensued. One point I particularly liked: people generally don't even make the effort to keep comments up to date, much less write them well in the first place, never mind think about a good order for the human to read them in.

[1] https://www.youtube.com/watch?v=4LG-RtcSYUQ

[2] https://www.youtube.com/watch?v=4LG-RtcSYUQ#t=1h21m43s (discussion lasts about five minutes, participants include David Moon, Scott McKay, and Guy Steele)


Org mode is a very good way, if not the best way, to do literate programming. My blog post about it: https://kozikow.com/2016/05/21/very-powerful-data-analysis-e.... I use it at work for data analysis and "researchy" projects.

Many of my blog posts at kozikow.com are written in the literate programming style in org mode, and you can see the org files at https://github.com/kozikow/kozikow-blog.

People have written whole books in it: https://github.com/jkitchin/pycse.


The first comment, written by none other than Peter Norvig, is also worth reading.

EDIT: link - http://www.johndcook.com/blog/2016/07/06/literate-programmin...


His comment sums up exactly my thoughts wrt literate programming. It sounds like a nice idea, but in practice it falls far short.

The worst program I've ever had to maintain was written in funnelweb. It was a large, complicated program, so the funnelweb-generated documentation was tens of thousands of pages long. Nobody was ever going to read that. Trying to find out what you needed to modify to fix a bug was impossible from that document. Most developers I know looked at the tangled C code (after all, that's where the line numbers in the stack trace pointed to), and worked from there.


We have tools that let you examine code from various angles. There is no need to encode some "human order" in the code itself. Code should be organized in the way that harmonizes with the module structure and promotes maintainability, without regard for some "human order" nonsense.

Who decides what is "human order"? Humans want things in different order for different purposes. For example, business software generates all sorts of reports of different kinds from the same data.

This is wrong:

> Traditional source code, no matter how heavily commented, is presented in the order dictated by the compiler.

This is only true of one-pass, strict definition-before-use, single-module-only programming languages, like toy versions of Pascal.

In any decent language, we can take exactly the same program, and permute the order of its elements with a great deal of liberty, and present them in that order to the compiler. We can decide which functions go into which modules, and we can have those functions in different orders regardless of what calls what.

Ah, but some of the more crazy proponents of literate programming are not satisfied with that granularity. It bothers them that the individual statements of a program that are to be executed in sequence have to be presented to the compiler in that sequence: S1 ; S2.

Knuth's outlandish version, in particular, stretches the meaning of "literate" by turning code into a dog's breakfast in which functions are chopped into blocks. For compilation, these blocks are re-assembled by the "literate" processor into functions.

The result is difficult to understand. Yes, the nice explanations and presentation order may present something which makes sense. But here is the rub: I don't just want to follow the presentation of a program, I want to understand it for myself and convince myself that it is correct. For that, I need to ignore all the text, which, for all I know, only expresses the author's wishful belief about the code.


I understood the concept of literate programming as a way to organize _source code_ so that it 'flows' better for a human, i.e. it starts with the methods/functions/modules/blocks where you would naturally start if you were explaining your code, and then moves on to the next method/function/module/block that you would explain. Perhaps some comments here and there to explain where to start and what the special points of interest are. I should have read the Wikipedia article, I suppose, where it seems to be actual prose... :) Now I am not sure I like the idea; I've always found well-written source code to be more maintainable and understandable than prose...


Look at Knuth's Adventure example: http://www.literateprogramming.com/adventure.pdf

Plenty of code with your prose there.


I hero-worship Knuth as much as anyone, but that document is frankly terrible.

Take a look at section 6. There's a flowery sentence that says no more than "Add word to vocabulary" (completely pointless if this were a method called "add_word" on a class named "Vocabulary") and then a dense chunk of completely unexplained code.


I think the issue is treating Knuth's literate programming examples as received wisdom, or the pinnacle of the form, rather than as a pioneering effort by someone very talented, but done without the benefit of a developed culture around the practice. He makes mistakes we now have memes for (e.g. "explain why, not how"; useless comments for imports). I think if literate programming takes off, we'll see the form surpass Knuth quickly.


The problem with most programming languages is that programmers write programs in them by incrementally adding to the code in random places.

I think we need a programming language that allows programmers to build programs by merely adding things to the bottom of the source file.


That's a REPL which can save its state to a file :)


The best way to document a program is to prove its correctness. Short of that, it is usually no fun to deal with other people's programs (or your own old ones) anyway.

As for presenting things in the right order: This order would largely be imposed by a correctness proof. As there could be different proofs, there could be different orders.

Obviously, not all programs can be proven correct (but I think a very large part of all programs could be). Also, finding the right order is not as difficult as finding the right level of abstraction. Once you have the right level of abstraction for presenting something, everything else flows from that.


I wrote this before, a previous time the subject came up. People seemed to find it useful then, so maybe they will now as well:

A previous employer (a subdivision of a global top ten defence company) used literate programming.

The project I worked on was a decade-long piece for a consortium of defence departments from various countries. We wrote in Objective-C, targeting Windows and Linux. All code was written in a noweb-style markup, such that the top level of a code section would look something like this:

    <<Initialise hardware>>
    <<Establish networking>>
and so on, and each of those breaks out in various ways into smaller chunks

    <<Fetch next data packet>>
    <<Decode data packet>>
    <<Store information from data packet>>
    <<Create new message based on new information>>
The layout of the chunks often ended up matching functions in the source code and other such code constructs, but that wasn't by design; the intention of the chunks was to tell a sensible story of design for the human to understand. Some groups of chunks would get commentary, discussing at a high level the design that they were meeting.

Ultimately, the actual code of a bottom-level chunk would be written with accompanying text commentary. Commentary, though, not like the kind of comments you put inside the code. These were sections of proper prose going above each chunk (at the bottom level, chunks were pretty small and modular). They would be more a discussion of the purpose of this section of the code, with some design (and sometimes diagrams) bundled with it. When the text was munged, a beautiful pdf document containing all the code and all the commentary laid out in a sensible order was created for humans to read, and the source code was also created for the compiler to eat. The only time anyone looked directly at the source code was to check that the munging was working properly, and when debugging; there was no point working directly on a source code file, of course, because the next time you munged the literate text the source code would be newly written from that.

It worked. It worked well. But it demanded discipline. Code reviews were essential (and mandatory), but every code review was thus as much a design review as a code review, and the text and diagrams were being reviewed as much as the design; it wasn't enough to just write good code - the text had to make it easy for someone fresh to it to understand the design and layout of the code.

The chunks helped a lot. If you had a chunk you'd called <<Initialise hardware>>, that's all you'd put in it. There was no sneaking not-quite-relevant code in. The top-level design was easy to see in how the chunks were laid out. If you found that you couldn't quite fit what was needed into something, the design needed revisiting.

It forced us to keep things clean, modular and simple. It meant doing everything took longer the first time, but at the point of actually writing the code, the coder had a really good picture of exactly what it had to do and exactly where it fitted in to the grander scheme. There was little revisiting or rewriting, and usually the first version written was the last version written. It also made debugging a lot easier.

Over the four years I was working there, we made a number of deliveries to the customers for testing and integration, and as I recall they never found a single bug (which is not to say it was bug free, but they never did anything with it that we hadn't planned for and tested). The testing was likewise very solid and very thorough (tests were rightly based on the requirements and the interfaces as designed), but I like to think that the literate programming style enforced a high quality of code (and it certainly meant that the code did meet the design, which did meet the requirements).

Of course, we did have the massive advantage that the requirements were set clearly, in advance, and if they changed it was slowly and with plenty of warning. If you've not worked with requirements like that, you might be surprised just how solid you can make the code when you know before touching the keyboard for the first time exactly what the finished product is meant to do.

Why don't I see it elsewhere? I suspect lots of people have simply never considered coding in a literate style - never knew it existed.

It forces a change to how a lot of people code. Big design, up front. Many projects, especially small projects (by which I mean less than a year from initial ideas to having something in the hands of customers) in which the final product simply isn't known in advance (and thus any design is expected to change, a lot, quickly) are probably not suited - the extra drag literate programming would put on it would lengthen the time of iterative periods.

It required a lot of discipline, at lots of levels. It goes against the still-popular narrative of some genius coder banging out something as fast as he can think it. Every change beyond the trivial has to be reviewed, and reviewed properly.

All our reviews were done on the printed PDFs, marked up with pen, with front sheets stapled to them listing code comments, which the coder either dealt with or, in discussion with the reviewer, agreed to withdraw. A really good day's work might be a half-dozen code reviews for some other coders, touching your own keyboard only to print out the PDFs.

Programmers who gathered a reputation for doing really thorough reviews with good comments, and who could critique people's code without offending anyone's precious sensibilities (we've all met them; people who seem to lose their sense of objectivity completely when it comes to their own code), were in demand, and it was a valued and recognised skill. Being an ace at code reviews should be something we all want to put on our CVs, but I suspect a lot of employers basically never see it there.

I have definitely worked in some places in which, if a coder isn't typing, they're seen as not working, so management would have to be properly on board. I don't think literate programming is incompatible with the original agile manifesto, but I think it wouldn't survive in what that seems to have turned into.


This seems a lot like cucumber, which is basically a regex expander until it gets down to real code eventually.

I have found this style very frustrating when doing anything complex that's also reused a lot. Particularly because the flow of information (expressed in normal code as variables and arguments) can become very subtle and tightly coupled when it's buried at the very bottom of an expanded expression. Maybe this is a solved problem in the system you're talking about, but I have a really hard time seeing how.


GP talked about this occurring at a defense contractor. The specifications tend(ed) to be much tighter. System X will receive input from this data bus. It will use this wire protocol. It will use that address to push output to <other component>. Messages will be of the following forms.

It's often a lot of embedded systems, and the good thing about these is that this level of specification is quite feasible. I've written a "simulator" (not full-blown, it was a stop-gap measure) for a piece of code we're supposed to get from another contractor. My code pretty much started off this way (but LP isn't liked here and it got butchered, started off in org-mode). I had to receive certain messages reading from shared memory (old architecture, can't change it without changing everything else), and respond back with messages stored in other address ranges. So I described the messages, I described the addresses, including references to specifications. My code was, essentially, traceable back to the requirements and specifications documents.

In this world we often have an SDD, usually not well developed and unfortunately sometimes made after the fact. Literate programming is a way that you can pair up the design and the code into one place, along with a rationale for the code by way of tying it back to the requirements documents.


"This seems a lot like cucumber, which is basically a regex expander until it gets down to real code eventually."

It's not at all like that. I must have explained it badly. Cucumber is written for a machine. This was written for humans. Not shown: design discussion, diagrams, links to requirements, pieces of history, and everything else that was helpful for humans to understand the software from requirements to design to implementation.


Ah, re-reading I think I see where I went wrong. So code is written separately, but cross linked with these document identifiers?


Broadly speaking, yes. The programmer might have written something like this:

    The principal components (message queue, network connector, message handler) are all necessary. They are initialised together, and if any one fails, the program should be aborted.
    
    <<Initialise message queue>>
    <<Initialise network connector>>
    <<Initialise message handler>>

    Each initialisation signals success or failure independently, but a global "Error state" object exists; any catastrophic error found will trigger an abort of program with logging.

    <<Check catastrophic error state>>

    The message queue feeds itself directly from the network. It needs only be made aware of the connector. This was not done during initialisation for reasons x, y, z.

    << etc etc >>

Somewhere else might be a piece that looks like this:

    The message queue initialisation is very simple; it's using a library standard queue. See meeting minutes 3452/r for further detail.

    <<Initialise message queue>>=
    // ACTUAL CODE


The document contains various markup explaining the order of the code, and the order of the human-readable sections; upon munging, the human gets a beautiful PDF that intersperses code with design discussion in a good order for the human to read and understand (and consequently review and test, so we've got ourselves a virtuous cycle here), and the machine gets just the code. For the machine, each << XXX >> piece gets replaced with other << YYY >> pieces, recursively, until it's just code.

Because it's munged, there can be a chapter discussing everything there is to know about a single class, for example, and a chapter discussing System Initialisation, or whatever else fits the needs at hand. If someone wanted to know about initialising the system, the PDF had the broad top-level discussion, and a breakdown of smaller sections. The reader could then dig into each as deep as they liked, at each stage seeing the design discussion/diagram, and as much of the code detail as they liked.

It made working on code I'd never seen before an absolute dream. By the time I reached the actual code, I already knew what it was meant to do and how it worked.

On the one hand, a lot of extra work, and it does constrain the coding style you can use; it only makes sense with code that can actually be broken into smaller, meaningful pieces like that. On the other hand, zero bugs found by our customer, the prime contractor. QA and testing and everything else was a massive part, but it all helped. I know the prime contractor was definitely doing some serious testing, because we watched them sue one of the other contractors for general incompetence.


This is a very valuable example, thank you very much for sharing.


Your example looks like a sequence of procedure calls that interact implicitly via global data. Yuck.


Tell you what, replace the bits you don't like with something involving objects, or functional programming, or whatever your particular favourite is. It's just an example and the bit you've focussed on is the least meaningful part. Of everything there, it's the most trivial and the part that could be radically changed and make no difference at all.


I used to think that was necessarily bad, but I take a less dogmatic view of such things now. You might find this post by John Carmack (of Quake fame) of interest: John Carmack on Inlined Code [1]. I now think such an approach has its place in some domains, and this might be one of them.

[1] http://number-none.com/blow/john_carmack_on_inlined_code.htm...


Regarding the tangle barrier issue, there is a tool that helps: https://github.com/aryx/syncweb It allows you to modify both the original WEB document and the generated code, and keeps them in sync.


I am a fan of the literate programming style for personal projects.

For a very well executed and interactive example check out

http://dave.kinkead.com.au/modelling-the-boundary-problem/


My main problem with literate programming is that it is difficult enough to keep short comments up to date vis-a-vis the code they describe. Longer comments would be even worse.

I also suppose the modern equivalent of this, in many use cases, is a Jupyter notebook.


> My main problem with literate programming is that it is difficult enough to keep short comments up to date vis-a-vis the code they describe. Longer comments would be even worse.

Perhaps counter-intuitively, I have found the opposite to be the case so far. It’s all too easy to overlook a couple of small comments out of 20 when you update a screenful of code. You can’t really miss a whole paragraph of text that separates a few lines of code you’re working on from everything else.

I’ve also found that with literate documentation the emphasis is naturally on explaining the big picture and why things are being done, and that kind of information tends to remain relevant through minor code changes anyway.

You can still add short comments about particular points in the code as well, and maintaining those is much the same whether you’re using literate programming or not.


Ah, but in literate programming, the challenge is to make sure the code matches the narrative, not the other way around.

> I also suppose the modern equivalent of this, in many use cases, is a Jupyter notebook.

Yes - and no. I think when you're writing a document (or working up to a document), such notebooks will naturally tend toward LP. This is our data; here are our cleanup procedures; this can be visualised like so; thus we can conclude... etc.

I wonder if LP would catch on more if CS students were given more prose-writing classes? AFAIK most "professional" writers do a top-down design: lay out plot elements, fill in chapters. It's how you can write a good book in less than a year. I've seen many authors say that their first book was really slow going, but as they got better at the craft of writing, they wrote quicker and better.

I've always found the "top down" design for small programs to be intuitive. They are often: setup, get parameters/configuration, read data/do stuff, clean up.

I think forcing oneself to write a sentence or two for each (no code!) sets one up for success with the first draft. A complement to BDD/TDD in a way. Similar to how explaining something to someone else is a great way to learn it better.
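
For example, a small Python skeleton where the sentence for each phase was written before any code (the task itself is hypothetical):

    """Summarise a CSV of timings: read the data, clean it up, report."""
    import csv
    import statistics
    import sys

    def read_rows(path):
        # Read data: every row from the CSV, header handled by DictReader.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def clean(rows):
        # Clean up: drop rows with missing or non-numeric timings.
        timings = []
        for r in rows:
            try:
                timings.append(float(r["ms"]))
            except (KeyError, ValueError):
                pass
        return timings

    def report(timings):
        # Do stuff: print the summary we set out to produce.
        print(f"n={len(timings)} median={statistics.median(timings):.1f}ms")

    if __name__ == "__main__":
        report(clean(read_rows(sys.argv[1])))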


Yes, if it seems to require a lot of prose, I think you're doing it wrong. See my comment at https://news.ycombinator.com/item?id=12051069 for how it can be far more powerful than a notebook.


The F# Formatting library lets you do things like re-order code and hide less important code for documentation purposes. You can create clean documents presenting code as a narrative. http://tpetricek.github.io/FSharp.Formatting/literate.html


I’ve been using Literate Haskell for a project recently. It’s a heavily mathematical data-crunching algorithm, in the region of a few thousand lines of executable code. This is the first time I’ve tried writing literate code in a major professional project, and it’s been interesting to see how the reality matched up with my initial expectations.

To me, the biggest practical difference is whether your documentation is primarily written as a tutorial or for reference on demand. Presenting ideas in a natural order for tutorial purposes is useful to someone coming to the code for the first time, or if you’re coming back to look at a module a while after you first wrote it when all those little details are no longer so familiar.

In my case, the documentation is generated using LaTeX and so can also include maths, diagrams, tables and other illustrative material right there next to the associated code, as well as providing a natural place to put module- or program-wide summary information to give an overview of how everything fits together. Like a mathematical paper, it takes a little effort to present all this extra documentation well. However, it’s hard to overstate how much better it is if you’re trying to understand some intricate mathematical code that you wrote three months ago and the actual maths is right there and is then directly reflected in the shape of the code.

Others have mentioned that literate code might be harder to maintain in the long run. I’m not sure how realistic that really is, based on my experience so far. If you’re making code changes significant enough that you’d want to reorder the whole presentation, you’re probably rewriting significant chunks of that documentation anyway, and it’s not as if our editing tools can’t cope with moving code and/or text around.

What does suffer, significantly in my experience, is the scannability of the code. Those few thousand lines of Haskell I mentioned produce well over 100 pages of typeset documentation at this point. That’s partly because of the extensive textual notes and mathematics and diagrams and so on. It’s also partly because typeset documentation is naturally more spaced out because of things like headings and blank lines. But the fact remains, if I looked at just the source code in my usual editor, I’d probably have 50–100 lines visible in a single window, and I can open several of those windows at once on a big screen. If I’m looking through the literate documentation (or the source file from which it is generated) then I am probably only seeing one third to one half of that at most, and crucially, that code only appears a few related lines at a time, often a single function or a small family of related type definitions. Since Haskell itself is rather uniform in appearance and tends to be written by composing very short elements anyway, this makes finding and understanding individual fragments of code noticeably harder than regular coding when you want to refer back to something in isolation.

So far, I’m finding that a price worth paying, at least for this kind of heavily mathematical work with me as the sole developer. In practice, I don’t actually want to refer to a small code fragment on its own very often. I’m more likely to come back to a whole module, skim the entire literate documentation for it (probably just a few pages) to remind myself of how it all fits together, and then not need to jump around understanding small individual elements in isolation. Still, there is definitely a cost here, and it definitely affects how I read and understand the code as I’m working with it later. I suspect some of that cost would be incurred anyway by using Haskell, or any other language and programming style that emphasize composing many small elements, but using literate programming does exaggerate the effect, and so far I’ve found that to be its biggest drawback over more conventional styles.


There's a (somewhat rambling, but interesting) talk on the effort to rewrite the Lisp symbolic math system Axiom[a] as a literate program (spanning several books), precisely in order to make it possible to refactor and maintain the code:

"Literate Programming in the Large": https://m.youtube.com/watch?v=Av0PQDVTP4A

[a] http://axiom-developer.org/


That's very interesting.

I wonder whether a `show me the code only' view would be useful?

I have written some short literate Haskell pieces, but nothing more than executable blog posts.


> I wonder whether a `show me the code only' view would be useful?

What I personally miss most is the ability to look at code from different perspectives and navigate it in different ways.

I find writing in a literate style is much more like preparing an academic paper or formal presentation than like programming as I usually would. There’s a clear order of doing things, but there’s only one order, a very static form of presentation and reading. As I mentioned before, I find this can work quite well when the work actually is heavily mathematical and relies on careful and systematic construction of the final result, but like working with math papers, it’s definitely an “acquired taste” and takes some getting used to.

In contrast, when I’m programming in most other languages, I rapidly navigate all over the code to follow relationships and definitions and so on. We have lots of tools to help do that quickly and easily in any modern programmer’s editor or IDE. We also have lots of ways to display different parts of the code and visualisations of the relationships between them simultaneously. I really miss that sort of dynamic, flexible working environment with the Literate Haskell. Sadly, I’ve yet to discover any programming environment that supports both the documentation side (essentially a good editor for working with LaTeX and the related tools) and the programming side (more like an interactive IDE).

To be fair, this feeling is probably due in part to my own relative inexperience with Haskell projects on this scale. Although I’ve long been interested in functional programming and used it for various bits and pieces over the years, my large-scale, professional projects have generally used more mainstream languages like C++, Python and JavaScript, and the tools and environments that go with them. Functional programming has a rather different feel anyway, and perhaps I just haven’t learned to combine that more mathematical/functional mindset, literate programming style, and the available tools as well as I could yet.

In any case, I definitely believe there’s a lot of potential for new tools in this area, combining the kind of documentation and structured presentation I’m seeing with the literate code with the kind of dynamic, on-demand exploration of code that modern IDEs for many other languages offer.


I've been using Haskell a lot (even professionally for 5 years). None of my commercial Haskell coding was literate, though. That has been restricted to smaller explanatory pieces.

We do jump around in the programming editor when doing Haskell. E.g. jump-to-definition is just as useful as for other languages.


May I ask what you use for editing your Haskell code?

For the literate project I’m using my usual programmer’s editor. It handles switching between the LaTeX and Haskell reasonably well in terms of syntax highlighting, but it does lack most IDE-like features, even with any of the extra packages I’ve tried installing so far.

A tool that provided reasonably reliable go-to-definition functionality would certainly be a helpful addition, and the kinds of pop-up help you get in IDEs to keep track of function parameters and their types seem particularly relevant to a language like Haskell, but I’ve yet to find an environment that both offers those features and handles the documentation aspects well.


I've never tried the LaTeX literate Haskell in earnest, only the > kind.

I've been using Emacs and Vim to edit Haskell, and at Standard Chartered, Visual Studio to edit their Haskell dialect.

Go-to-definition and display-type-of-expression-at-cursor can be done in Vim and Emacs already.


Is the Knuth book on literate programming any good?

Anyway I thought this article would have been better with a few examples. I'm still not any closer to understanding how the tangling process works.


I've read Knuth's article on this and here are my thoughts. It's meant to be used with a single language (Pascal), which is sometimes good (because the tool really understands Pascal and can parse the code snippets to produce a cross-reference), but it's also a curse, because many of the tool's constructs are actually workarounds for Pascal's shortcomings. Also, it only generates a single file, but today's projects are both multi-file and multi-language. So it's a bit dated.

I think the idea itself is so simple it doesn't really need any extensive book. Basically you take any text preparation system and add three elements: a code fragment, a reference to a code fragment, and a file, which is a fragment that will be written to disk.
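
A minimal sketch of those three elements in Python (not any particular tool's format; the chunk syntax here is borrowed from noweb):

    import re

    # Named fragments; <<name>> inside a body is a reference to another chunk.
    chunks = {
        "hello.c": "#include <stdio.h>\n<<main function>>\n",
        "main function": "int main(void) {\n    <<say hello>>\n    return 0;\n}\n",
        "say hello": 'printf("hello, literate world\\n");',
    }

    REF = re.compile(r"<<(.+?)>>")

    def tangle(name):
        # Recursively replace each <<reference>> with its chunk's body.
        # (Real tools also preserve indentation and can emit #line directives.)
        return REF.sub(lambda m: tangle(m.group(1)), chunks[name])

    # "hello.c" is a file chunk: its tangled body is written to disk.
    with open("hello.c", "w") as f:
        f.write(tangle("hello.c"))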

(I myself use such a system based on reStructuredText. I want to post it on GitHub when it's ready, but it's not there yet, so I cannot show exactly how it works. But I use it to write a moderately complex personal project, mostly in C and Python, plus all the tooling, e.g. Makefiles. I found it very helpful; it does make the code much cleaner, that is, it strongly urges you to. And it's invaluable when you want to remember why you did things that way.)


Isn't Knuth's CWEB used for C?


CWEB is for C and the original WEB was for Pascal. (C is a good fit, by the way, because you can use the #line directive to make the compiler report the line numbers in your original literate files instead of the irrelevant line numbers in the generated files.) But as far as I remember it still produces a single C file and a single TeX file.


When experimenting with NOWEB and Java (1.3, I believe) I discovered that one of the things those LP systems did was encourage something like macro-based code reuse, somewhat "lifting" the rather basic language up towards something a little more high-level. This was good and bad; bad like unhygienic macros can be bad.

As far as I could figure out, it was a poor match for OO/"inheritance oriented" programming. It was a better match for procedural-style programming - "writing C in Java" - where the LP "macros" didn't confuse things. But when mixing in OO-style Java, I found things became uncomfortably verbose.

NOWEB might in some ways be the worst of both worlds - it knows (and cares) nothing about the programming language - but that also means blocks can't be parameterized. This is annoying if you're using variables to hold iterators and state (traversing a graph stored in arrays as edge/node lists etc). You end up with dangerous variable name re-use or code duplication (and too big code blocks).


Yes, the Knuth book is good, but note that what it is is a collection of papers, most of which are not really about literate programming.

Some of them predate literate programming as such; e.g., there's a classic called "Structured programming with go to statements".

There's the original article from 1984 that introduced literate programming.

There are two "Programming Pearls" columns from CACM; the first is another introduction to literate programming, and the second is a not-entirely-trivial program for counting common words in a file -- followed by a review by Doug McIlroy which is very well worth reading in its own right.

There are extracts from the TeX and METAFONT (literate) source code.

There's the classic "The Errors of TeX" paper, as well as the whole TeX error log up to 1991.

(And a few other things, but I think the ones I listed above are the most interesting.)


Have a look at http://www.literateprogramming.com/adventure.pdf for another example. (I think the source-document that the pdf is made from is also available.)


It is interesting although the examples are a bit convoluted.

However, once you read it you understand that LP is probably only useful for single-person projects (because each programmer "thinks" each problem in his own way).


Literate programming - the original IPython notebook


"I think I understand better now why literate programming hasn’t gained much of an audience. I used to think that it was because developers hate writing prose. That’s part of it."

Right, and that's why it's so difficult to find developer-written blogs and articles and tutorials online....


Paraphrasing Knuth, the intersection of good writers and good programmers is small. That doesn't stop people like me from making attempts at writing blogs and articles, but it doesn't make them good. It also doesn't mean that people won't write one-off articles and blogs, but they may not be interested in doing that essentially daily (which the literate style kind of requires).


Your point is taken; I just take issue with the article perpetuating such a stereotype. I'm a developer, I have a degree in literature, and I write a ton. And the article says nothing about developers not writing good prose, just about not liking to write prose, period. The intersection between anything and good writers is pretty slim, to be honest. The author just supposes that developers don't like writing prose. Which is silly. At least in my opinion.





