Grammars for Programming Languages (2018)

Joker_vD · on Jan 21, 2020

I manually wrote more recursive-descent parsers that I am comfortable to admit (hey, they're fun to write), and most of the "beyond the context-free!" approaches described in this article seem to me to fall into two categories:

1. There is this one common pattern in recursive-descent parsers that allow you to write a more concise recognizing code — let's make it directly expressible! Ordering rules, "not an ending quote/not an end of a comment", "ident that's not a keyword" fall into this category.

2. Semantic actions work really well in practice, but it's gauche. Let's instead encode some restricted, non-Turing-complete, but still useful PL directly in grammar, that'll improve things somehow! Detecting "undeclared identifier used" and "identifier redeclaration" falls into this category: this "quite knotty" Boolean grammar has a dict-like structure threaded through it, isn't it amazing? And this grammar-with-context has built-in support for scoped environments, you don't need your symbol tables anymore (unless you have user-defined types, presumably)!

Of course you can parse the input according using any of these more expressive grammars in the same time and space as you can with a boring old CFG grammar with rule priorities and semantic actions: because that's essentially what they're, just expressed differently. And is there much gained by expressing it differently? It's the same story as with DSLs, really, you keep making them more and more complex and feature-full up until the point you arrive at a full-fledged programming language with awkward semantics and realize that you probably should have stayed with limited DSL that could invoke arbitrary actions written in a proper general-purpose programming languages.

So, yes, "Devising an adequate and convenient formalism on top of grammars with contexts is a challenging task", and it's the task that must be solved before any of those approaches would be more useful than what we have today.

carapace · on Jan 21, 2020

See also Definite Clause Grammars (DCG):

https://en.wikipedia.org/wiki/Definite_clause_grammar

https://www.metalevel.at/prolog/dcg

drallison · on Jan 21, 2020

There is a tension between the language (grammar) used to describe a language and the algorithm that is used to parse the language. In practice, a Pratt Parser (https://en.wikipedia.org/wiki/Pratt_parser) is often a good choice since it has the simplicity of a recursive descent parser but avoids the recursive plunge for objects like expressions.

snidane · on Jan 21, 2020

Controversial opinion but I believe that eventually these grammars and complex parsers will be found to have been a huge mistake for computer science. A mistake made for the sole reason of making programming languages resemble natural languages, even though they are not meant to be read fluently.

Everyone who experienced the magic of lisp understands how beneficial it is to have the textual representation as close to the parsed abstract syntax tree. One can create new languages fit for the purpose of a given class of problems and thus reducing size of codebase 10 to 100 fold (or even million times in case of modern multi million LOC software projects).

The biggest mistake lisp ever made was the unintuitive parenthesized prefix notation. Which can be thrown away by having all operations strictly infix.

Ie. instead of doing combinations of the following:

  lisp: (plus 1 2)
  methods: 1.plus(2)
  algol,C: plus(1, 2)
  haskell: plus 1 2

One can use the simplest form ever and achieve even simpler lisp:

  1 + 2

but use it consistently for every function call or operator within the language. (even for functions of multiple arguments).

APL came close, but for some reason decided to ignore the important Phoenician discoveries of 1. reading from left to right 2. use of phonetic alphabet instead of hyeroglyphs.

jcranmer · on Jan 21, 2020

> the important Phoenician discoveries of 1. reading from left to right 2. use of phonetic alphabet instead of hyeroglyphs.

The Phoenician alphabet was written right-to-left or boustrophedon (alternating left-to-right and right-to-left, usually mirroring glyphs when writing them backwards). The Greeks appear to have started with boustrophedon writing and then settled with left-to-right writing. The other Semitic scripts that borrowed from Phoenician (e.g., Hebrew and Arabic) copied Phoenician's right-to-left writing direction.

While we're being pedantic here, the Phoenicians also didn't invent an alphabet. They used an abjad, not recording any vowels. The Greeks were the first to create a true alphabet, adapting 'aleph (representing the glottal stop, which Greek did not have) into alpha (representing the open front unrounded vowel 'a'), he into epsilon, and 'ayin into omicron. Phoenician semivowels waw and yodh furnished upsilon and iota, respectively.

A final point is that the principle of using glyphs to represent phonetic values does not originate even with Phoenician. The rebus principle was already present in both cuneiform and hieroglyphics, and Middle Egyptian already developed a full inventory of consonant glyphs, although these were not used exclusively.

agumonkey · on Jan 21, 2020

I use to dislike linguistics, but you made me change my mind. Any book suggestion on early language development ?

analbumcover · on Jan 21, 2020

> 1. reading from left to right

This seems a strange quibble when it is an arbitrary convention that doesn't really matter in the slightest. It's just the natural extension of the usual mathematical function composition f(g(x)) to include dyadic functions and thus do away with function/operator precedence. It doesn't make things any easier or harder if you were to prefer generalizing ((x)g)f instead so that the language could be read left to right, it's just a different convention.

> 2. use of phonetic alphabet instead of hyeroglyphs.

The purpose of the symbols in APL is that they are meant to be a "tool for thought" [0]. But I'm a mathematician so I was already trained to this way of thinking. Using more verbose languages becomes an exercise in tedium, they make programming feel like trying to chop down a tree with a herring. The "hieroglyphs" are a large part of what make APL feel like wielding a lightsaber by comparison.

[0] http://www.eecg.toronto.edu/~jzhu/csc326/readings/iverson.pd...

AnimalMuppet · on Jan 21, 2020

> One can create new languages fit for the purpose of a given class of problems and thus reducing size of codebase 10 to 100 fold (or even million times in case of modern multi million LOC software projects).

I find that a bit hard to believe. Here's a 100 million line codebase; I seriously doubt that you can express it in 100 lines in any language. Lisp may be good, but it isn't that good.

chubot · on Jan 21, 2020

Yeah to be honest one of the things that made me skeptical of the code compression / productivity claim is looking at the implementations of Chez Scheme and Racket (after also looking at 20+ compilers / interpreters, and working on a language for a few years).

I'm pointing to them as very long-lived and valuable codebases written in Lisp dialects. Chez Scheme is a 35 year old codebase and Racket is also decades old.

So I'm not saying there's anything wrong with them, but I am saying that it doesn't appear that they're 10x easier to understand or modify than LLVM or CPython (Chez being a compiler and Racket being an interpreter as far as I remember). Or that you can get a 10x better result.

Basically for the claim to be true, why can't you write something like Racket in Racket 10x faster? Like 3 years instead of 30 years. And why doesn't it do way better things than CPython or Ruby? Those claims might be "slightly" true depending on who you are, but they're not devastatingly true. There's been more than enough time to evaluate the claims empirically.

In other words they would have already proven themselves in the market if that were the case. You would have extraordinarily productive teams using these languages -- along the lines of what PG hypothesized 15+ years ago.

http://www.paulgraham.com/avg.html

In fact the thing I found was interesting is that at the core of Racket is a big pile of C code, just like CPython. A year or 2 ago I watched a talk about them self-hosting more, and moving to Chez scheme's backend, but I don't recall the details now.

https://github.com/cisco/ChezScheme

https://en.wikipedia.org/wiki/Chez_Scheme

https://github.com/racket/racket/tree/master/racket/src/rack...

(FWIW I also looked at and hacked on femtolisp around the same time, since I was impressed by how Julia uses it.)

correction: it looks like Racket has a JIT too, written in C. Still same point applies: it's not magic and looks a lot like similar codebases in C. Chez is more self hosted AFAIR but it's also hundreds of thousands of lines of code.

carapace · on Jan 21, 2020

> In other words they would have already proven themselves in the market if that were the case. You would have extraordinarily productive teams using these languages -- along the lines of what PG hypothesized 15+ years ago.

There's K and Kdb, et. al. (I think they're on to "Q" now.)

vidarh · on Jan 21, 2020

I'd argue very little of what makes K and Kdb special has to do with the language per se, but with the focus on array operations, and with the particular runtime environment of Kdb (heavily focused on distributed services that logs (and can replay) operations against a columnar database with most operations in-memory).

Q to me is a tacit admission that the obtuse syntax of K is not necessary to get the benefits of K, and that K presents too steep a learning curve for most.

They're fascinating, and I think vastly underrated, but they're hard to grasp even without the syntax hurdle...

anonu · on Jan 21, 2020

Kdb benefits from a combo of things:

1. The terse language is actually quite expressive and very well suited to vector programming. So less code is more. Also gets you thinking in a particular way...

2. The database, a mix of transactional, in memory, and fast analytics (aggregations, joins..) concepts

3. The runtime framework: Kdb is an ideal building block for a network distributed system. Opening sockets and connecting to other Kdb instances is simple.

vidarh · on Jan 21, 2020

2/3 I buy, but for 1 I think you need to decouple the syntax from the underlying array-programming model. K-level terseness and the semantic structure that makes K well suited to vector programming are pretty much entirely orthogonal.

You can implement the core of K's array manipulation in other languages and ends up with similarly terse code if that's what you want. But you can also produce something that will be far more readable to most people without losing the expressiveness by e.g. opting for naming some of the operations etc.

carapace · on Jan 21, 2020

> I'd argue very little of what makes K and Kdb special has to do with the language per se, but with the focus on array operations, and with the particular runtime environment of Kdb

I'm also not convinced that K as a language is essential to kdb as an application but it's undeniable that they have "proven themselves in the market", eh? It's a niche, but a well-paid one, and they don't have any competitors (with e.g. easier syntax.)

Are you familiar with arcfide's Co-dfns compiler? https://github.com/Co-dfns/Co-dfns I ask because it might serve as a counter-example of a "killer app" that relies on the concision of the syntax.

> I think you need to decouple the syntax from the underlying array-programming model. K-level terseness and the semantic structure that makes K well suited to vector programming are pretty much entirely orthogonal.

FWIW, I think so too. You could say that things like Numpy and Pandas are moving towards integrated array processing, eh?

vidarh · on Jan 22, 2020

They have proven themselves in the market, but it's impossible to decouple the language agnostic benefits of KDB from K's syntax in that respect. That said, they have plenty of competitors - just most of them are just run of the mill languages.

beagle3 · on Jan 21, 2020

But it can't easily (and perhaps essentially) be decoupled and still be usable.

At first, the math notation for "x ^ n" is cryptic, and writing "x * x * ... * x repeated n times" is clearer. But then, you grok exponentation, and everything longer than "x^n" becomes too verbose and needlessly complex; furthermore, once you've grokked it, "x^n" naturally extends to negative, rational and real n naturally, whereas the expanded form does not.

K is similar; At first, it seems needlessly terse, weird and opaque. But once you've grokked it, it's clear that it's much more than the underlying array model (which, as you pointed out, can be and has been implemented in other environments).

As supporting evidence, I bring numpy; It has the same kind of array operations, and as-high-a-level of language; and yet, code in numpy written by most people is still an order of magnitude longer (and slower) than code produced by people writing K. There is a selection bias here (Everyone uses python, only K compatible people use K), but I think it's more.

Look at an example from [0] - the complete function (executable by the 300KB entire K runtime, independent of any library, and with named arguments which makes it 50% longer than some people would write it):

    {[t;f;o]{x y z x}/[_n;o;t f]}

This is a K idiom known afaik as "converging multicolumn select"[1]. "t" is a table; "f" is a [list of] columns, e.g. `price`size ; "o" is a [list-of] subset-and-ordering functions, one for each column.

To collect all the rows from table "sales", where time is more than 9am, price is less than 100, and ordered by decreasing size, you would call it with the arguments [sales;`time`price`size;9h00<,100>,>:]

Although it is possible to write such a simple, generic, useful idiom in Python/numpy, I have personally never seen it done. R's data.table (and its Python port) come close in a limited subset of cases, but the extreme usefulness of converging multicolumn select does not extend; also notice that the call has exactly the specification, no boilerplate or even keyword (and just minimal punctuation). This example is not unique. e.g. "flatten" is ,// ; Yes, that's the entire function implementatiion, and no, it's not special cased.

K optimizes the use of your eyes and brain as a pattern matching device. It has been my experience that, much like good math notation (e.g. Leibniz's differential notation), good programming notation makes things easier and evident in a way that is not captured by equivalent semantics that aren't human compatible in the same sense.

[0] http://nsl.com/k/t.k (and http://nsl.com/ in general

[1] although it should probably be called "reducing multicolumn select"

vidarh · on Jan 22, 2020

> But it can't easily (and perhaps essentially) be decoupled and still be usable.

I'd argue it's not usable to most people in its current form at all, which is why they remain tiny niche languages.

> At first, the math notation for "x ^ n" is cryptic, and writing "x * x * ... * x repeated n times" is clearer. But then, you grok exponentation, and everything longer than "x^n" becomes too verbose and needlessly complex;

This is contrived. The argument is not for writing out like that. Now of course x^n is well understood by most, but many languages do offer longer versions, such as e.g. math.pow(x,n). For x^n I don't think that is necessary, but for less common operations I'd much rather have a word that hints at a concept than a symbol that does not.

For most us remembering a large number of single-character operators slows us down, rather than speed us up.

> Although it is possible to write such a simple, generic, useful idiom in Python/numpy, I have personally never seen it done.

I don't use either, but e.g. considering Ruby, the main reason why you wouldn't tends to be the lack of the equivalent of a "table" type. With a table type that provides suitable methods, such as e.g. provided by typical Ruby ORMs, you might end up with something like this:

    ->(t,f,o) { f.zip(o).inject(t.select(*f)) {|ds,a| a.last.call(ds,a.first)} }

Of course we wouldn't write it like that, because it is unreadable.

I think this is typical of an attitude difference. I don't see the value in your example, because to me it is semantic overhead that requires me to slow down when reading code.

> also notice that the call has exactly the specification, no boilerplate or even keyword (and just minimal punctuation).

This is part of the negative aspect of K syntax to me, not a positive.

> K optimizes the use of your eyes and brain as a pattern matching device. It has been my experience that, much like good math notation (e.g. Leibniz's differential notation), good programming notation makes things easier and evident in a way that is not captured by equivalent semantics that aren't human compatible in the same sense.

My experience is that there is a sharp dividing line between people who are happy reading maths, who tends to be ok with syntax like that, and people who rely on visual layout to recognize code who tends to absolutely hate it, and judging by which programming languages are successful, the latter group seems to be substantially bigger. I'm glad you're happy with a syntax like that.

But to me it is not useful, because I can't skim it and quickly get an idea what it does.

chubot · on Jan 22, 2020

I feel like that's a counterexample to the OP's claims? AFAICT he was saying that homoiconicity in Lisp enables DSLs which enables code compression.

But those languages aren't homoiconic and I would probably call them DSLs themselves, not as general purpose as Lisp, Python, or C.

carapace · on Jan 26, 2020

Mmm... I'm not convinced. What about arcfide's Co-dfns compiler? https://github.com/Co-dfns/Co-dfns

It kinda counts as homoiconic, eh?

lmm · on Jan 21, 2020

> Basically for the claim to be true, why can't you write something like Racket in Racket 10x faster? Like 3 years instead of 30 years. And why doesn't it do way better things than CPython or Ruby?

Maybe it offers the same level of functionality with 10x less contributors/funding/...?

kragen · on Jan 21, 2020

I think it's plausible that writing code in C, Pascal, or languages of a similar level is several times more productive than writing code in assembly language, and writing code in Python, Ruby, JS, Scheme, Lua, or other languages of a similar level is several times more productive than writing code in C.

This is not true for all programs, for a couple of different reasons.

First, the reason writing code in Python is often more efficient than writing code in C is that it allows you to leave things out. Specifically, you can leave out resource lifetimes, specifying low-level types (representation in bits), checking high-level types (conformance to protocols), memory layout, and error handling. The CPython interpreter will improvise something for you in all of these cases. It won't be as good as what you could have come up with if you were doing it yourself in C, but for many programs, other things are more important. A lot of times you need to have some idea about those things in order for your program to work properly, but it can be informal, incomplete, and tacit, rather than formal, complete, and explicit. But for some programs, these things that are being left out are too important to leave out. In those cases using JS instead of C isn't saving you anything.

Second, as Alan Kay says, architecture trumps materials at scale. In the taxonomy of SICP, programming languages contain primitive concepts, means of abstracting concepts, and means of combining concepts, and the ways in which a higher-level programming language is more efficient than a lower-level programming language are mostly in the form of primitive concepts. Python has auto-growing lists and dicts ready-made, plus polymorphic method calls with duck typing and garbage collection and whatnot. These reduce your code size by some constant factor, and they also nudge you toward using a certain architecture, an OOP architecture which is more scalable in some ways. Maybe one line of Python does as much as 64 lines of assembly language. But your software's architecture defines its order of growth: does its functionality increase proportional to the amount of effort put into writing code, proportional to its square, or proportional to its square root? And you can implement any architecture in any programming language.

I do think there's something to the claim that Chez Scheme represents a substantially better language implementation than CPython on some axes despite being the work of a much smaller group of people. Chez is mostly Dybvig, actually, though Scheme itself contains important ideas from dozens of people. By contrast, /usr/share/doc/python2.7/ACKS.gz on my system lists almost 1500 people, including, to my surprise, me.

chubot · on Jan 21, 2020

BTW I remember your "memory models" post from a few years ago, and I found it insightful! It got me thinking about many related things.

http://canonical.org/~kragen/memory-models/

Around that time I'd noticed that the stringly-typed languages of shell, awk, and make have no garbage collection. They're value oriented and not reference oriented. The goal of Oil was basically to "upgrade" shell into something more like Python, and on the surface that looks straightforward, but the reference model vs. value model makes it pretty different.

I've gone back and forth on that... whether I should just slap GC into a shell, or whether I should try to preserve the value model while making it less stringly-typed and more expressive.

I have a more ambitious idea for a language where every value is essentially a pair (Go-like string slice, structured data), but I probably won't get to that in the near term. For a concrete example, I noticed from writing some HTML processors that both DOM and SAX APIs have a bunch of flaws for processing semi-structured text.

-----

Also, on the other point, let's consider these two possibilities about Chez / Dybvig:

1) Dybvig is an average programmer who got "superpowers" from Lisp, i.e. got a 10x boost in productivity.

2) Dybvig and Chez are an outlier, e.g. like Fabrice Bellard and his recent QuickJS (and other projects), or Lattner and LLVM/Clang/Swift. The salient point is that these projects have nothing to do with Lisp.

It's obviously not an either/or thing, but I'd say the answer is closer to #2. The complexity is from the problem domain, and many major software projects are minor miracles that only a few people are qualified for, regardless of language.

kragen · on Jan 26, 2020

I'm delighted that you enjoyed that note! You might enjoy parts of Dercuano, too, then.

As I understand it, shell, awk, and Tcl are more or less linear languages, except that copying and destruction are implicit. It's fairly straightforward to make a linear language that includes more expressive types than just strings; Tcl 8 even did it without breaking backward-compatibility. Linearity is pretty incompatible with the OO worldview in which objects have identity and mutable state, but to the extent that you provide Clojure-like FP-persistent data structures whose operations merely return new states, maybe the OO worldview can just go suck it. Rust takes a different tack in which copying is optionally explicit (though destruction still implicit: "affine typing" rather than "linear typing") and you can use "borrowed references" to avoid the headache of explicitly returning all the argument objects you didn't destruct.

A nice thing about variations like Rust's and Clojure's is that you can preserve the expressiveness of the Lisp object-graph memory model without all of its drawbacks: no aliasing bugs, no circular references complicating the life of the GC, and in Rust's case, no garbage collector at all.

> I have a more ambitious idea for a language where every value is essentially a pair (Go-like string slice, structured data), but I probably won't get to that in the near term. For a concrete example, I noticed from writing some HTML processors that both DOM and SAX APIs have a bunch of flaws for processing semi-structured text

Right, text markup isn't tree-structured, and I think HTML5 actually prescribes fixups for <i>some <b>incorrectly</i> nested</b> markup. The GNU Emacs/XEmacs schism was largely about how to handle this problem for text editing; the XEmacs side added "extents" to which you could attach structured data (such as a font or a link destination), which are more or less the type of values you're describing, while the GNU Emacs side instead added "text properties" which conceptually applied independently to each of the characters of buffers or strings, but under the covers were of course optimized with an extent-like representation. Neither side of the schism considered replacing text buffers with S-expression-like trees like the DOM; that was Microsoft's fault :)

Are you thinking that when you concatenate a couple of such string slices, the operation will be O(1) because it uses ropes behind the scenes? Or are you thinking of not having concatenation at all as a primitive operation, instead using an unboundedly-growing scratch buffer (not, itself, a value) that you can append them both to and then take slices of? Is the text in the slices mutable, and if so, can it also grow and shrink? I think there's a large design space of interesting things you could do.

About Dybvig and Chez, sure, Kent Dybvig is a wizard. But so is Guido van Rossum, whatever embarrassing errors he may have made regarding first-class blocks and tail-call optimization in Python. I haven't asked Dybvig, but I don't think he's a Bellard-class wizard, so I don't think that's the answer.

Or are you saying that Python's problem domain is much harder than Chez Scheme's domain?

By the way, my apologies for having delayed so long in responding to your thoughtful notes.

chubot · on Jan 27, 2020

There are a lot of interesting things to pull on here, but I'll reply first with a comment I just wrote that linked back to this thread.

https://news.ycombinator.com/item?id=22157587

A tentative slogan is "bijections that aren't really bijections", i.e. to describe the "fallacy" of (bytes on disk -> data structure in memory -> bytes).

Hence the (slice, data structure) representation.

That's the correctness aspect. I also think there's a big performance aspect, e.g. it appears to me that the performance of many/most parsers is dominated by object allocation.

Some vaguely related stuff here: https://github.com/oilshell/oil/wiki/Compact-AST-Representat...

Anyway I don't think this will make it into Oil any time in the next year, but I think there is room for it in programming languages, and lots of other people are barking up that tree (e.g. FlatBuffers relate to a big part of it). I would be happy to continue batting around ideas in non-nested HN thread :) So as mentioned elsewhere, feel free to send me a mail or ping me on Zulip!

I'm looking at Dercuano but I'm confused why it's a tarball and not a live website :)

kragen · on Jan 27, 2020

> I'm looking at Dercuano but I'm confused why it's a tarball and not a live website :)

So you can keep reading it after I'm dead.

chubot · on Jan 21, 2020

Yes I totally agree -- C is way more productive than assembly, and dynamic languages are way more productive than C. I think those are incontrovertible facts that have been proved over and over again in the market, according to the test in my post.

I'm arguing against the OP who is essentially claiming that the homoiconity and metaprogramming/DSLs of Lisp are a huge advantage over say Python.

I would argue the opposite: all the essential innovations of Lisp made its way into other languages like Python and Ruby (and Julia, which has a Lisp-like macro system by way of its femtolisp front end [1]). It was a hugely influential language but it's not ahead of the state of the art right now.

Another way of saying it: PARSING is the only difference between Lisp metaprogramming and metaprogramming in any other language, and parsing isn't an essential difference. It's a bit of friction. (Though I want to improve parsing, long story ... [2] )

-----

In fact I've done what you could consider a software engineering experiment along these lines in the last few years.

I wrote a bash-compatible shell, running thousands of lines of unmodified bash scripts, in Python. It's 5-7x fewer lines of code than bash, due to metaprogramming with Python, ASDL, and re2c. I claim it's significantly faster to develop and has fewer bugs because of this. It's very very different than how bash is written.

Line counts: http://www.oilshell.org/blog/2019/06/17.html#why-is-it-writt...

Well it took over 3 years, but bash has been around for 30, and I also enhanced the language in many ways. I won't argue if someone says the jury's still out, but here is a significant achievement on performance:

http://www.oilshell.org/blog/2020/01/parser-benchmarks.html

Despite being written in Python, and parsing more thoroughly by design, the parser (which is a ~10K line program in Python, and more in bash) is faster than bash's once it's automatically translated to C++. So basically the high level "DSL" of Python or "OPy" retains enough information to make it better than the state of the art.

Here's a comment on this non-Lisp heavily DSL-based architecture from a few months ago:

https://news.ycombinator.com/item?id=20799926

To summarize it, if you look at how bash is implemented, it's 140K lines of "groveling through backslashes and braces one at a time in C", as I like to call it. And it has about as bugs / mistakes as you'd expect with that style, including some serious security problems.

So you can use DSLs to "compress" that code -- and I claim make it more correct and faster (jury is still out on memory usage). And you don't need Lisp to do it. Python was a fine choice for metaprogramming and DSLs; Lisp doesn't have any advantage.

In fact in January 2016 I hacked on femtolisp specifically because I was thinking about borrowing Julia's implementation approach for Oil. But yeah it's not enough of a win to justify the downsides. If someone wants to prove me wrong, write a bash-compatible shell in Lisp in under 3 years ;)

It's not going to be any easier, just like writing an x86 backend isn't much easier in Lisp. The difficulty is from the problem domain and not the language. And I noticed that all compilers use metaprogramming and DSLs anyway -- LLVM has hundreds of thousands of lines of TableGen. The Go compiler uses a Lisp-like DSL to express SSA transformations.

So basically Lisp is great but it's been thoroughly absorbed, and PG basically extended a falsifiable argument that was proven false. All of YC's biggest successes use dynamic languages, not Lisp. Stripe uses a ton of Rails, AirBNB is Rails, Dropbox is Python, etc.

https://www.ycombinator.com/topcompanies/

Instagram and YouTube are Python, etc. Wikipedia and Facebook are PHP, etc. I think that constitutes enough evidence.

-----

edit: I should also add/clarify that I've met programmers who I'm sure can bang out the equivalent of bash's 140K lines of C in (much better) C in a relatively short period. I'm not saying the way I did it is the only way to do it. Rather, I'm saying Lisp doesn't have any fundamental advantage in metaprogramming. The only difference is parsing, which is a small bit of friction.

The other point which people have brought up many times is that most people are bad DSL designers. It may be better to rely on "known" DSLs than to invent your own for every program. That can be a drag on productivity negating any of the potential benefits of smaller code.

[1] https://docs.julialang.org/en/v1/manual/metaprogramming/

[2] This was supposed to be the start of that argument http://www.oilshell.org/blog/tags.html?tag=parsing-is-diffic...

kragen · on Jan 26, 2020

> dynamic languages are way more productive than C.

I meant to say that high-level languages were more productive than C; I think there are a lot of people who will be happy to tell you that they are more productive in Haskell than in C, as much so as in Python, even though Haskell is a lot more static than C. (I don't know Haskell well enough to say; the only Hindley–Milner language I know is OCaml, and I feel less productive in OCaml than in C.)

> I'm arguing against the OP who is essentially claiming that the homoiconity and metaprogramming/DSLs of Lisp are a huge advantage over say Python. ¶ I would argue the opposite: all the essential innovations of Lisp made its way into other languages like Python and Ruby

I pretty much agree. Lisp was a unique language in, say, 1990, when it had garbage collection, dynamic typing, dynamically expanding data structures, higher-order functions, eval, easy serialization and deserialization of data structures, an object-graph memory model that made it easy to represent things like parse trees, and easy metaprogramming through homoiconicity. Meanwhile, the other popular programming languages were things like C, Pascal, BASIC, FORTRAN, sh, Lotus 1-2-3 formulas, dBASE, and COBOL, which had none of those; a lot of them didn't even have recursion. Today, though, Python has all of them except for homoiconicity, and we can do quite a bit of metaprogramming in Python, using its pervasive polymorphism and higher-order programming rather than macros. Most other popular modern languages like Perl 5, Ruby, JS, Lua, and PHP 5+ are basically dialects of Lisp. Even languages that differ dramatically from Lisp, like OCaml, Kotlin, C#, Swift, and recent versions of Java are pretty close.

> I wrote a bash-compatible shell, running thousands of lines of unmodified bash scripts, in Python. It's 5-7x fewer lines of code than bash, due to metaprogramming with Python, ASDL, and re2c. I claim it's significantly faster to develop and has fewer bugs because of this. It's very very different than how bash is written.

That's a very impressive achievement! Does it use more RAM than bash does? I'd think that one of the potential benefits of C over Python is that, because C requires you to specify your memory layout, C programs tend to use significantly less memory than otherwise equivalent Python, usually by about an order of magnitude.

That is, some of the order-of-magnitude complexity reduction you got in your experiment is surely due to Python, but perhaps some of it is due to RAM costing about US$100 per megabyte when Brian Fox started writing bash in 1988.

> Python was a fine choice for metaprogramming and DSLs; Lisp doesn't have any advantage.

I think that may be overstating the case; no doubt there are some things that are easier in Python and some things that are easier in Lisp. SBCL or Racket can usually beat CPython or Jython by about an order of magnitude on speed, for example, and having first-class lambdas really is an improvement over Python's botched mixture of context managers, function decorators, and generator abuse. In the other direction, Python's syntax is significantly more readable — especially for arithmetic — and SBCL and Racket don't have anything like Numpy, and their own compilers are not an adequate replacement. But the difference between Lisp and Python, whichever one it favors, certainly isn't close to the enormous advantage Lisp or Python has over Pascal.

(And I think Clojure is really the most interesting Lisp, actually.)

> I should also add/clarify that I've met programmers who I'm sure can bang out the equivalent of bash's 140K lines of C in (much better) C in a relatively short period

I would be very surprised about that; but Andrew Appel did come close to winning the ICFP programming contest one year using that well-known functional programming language, C. And a few years later someone won using C++.

On the other hand, you did end up writing OSH in OPy rather than, strictly speaking, in Python. The initial Python versions sort of ended up being a prototype, in a way. And you could presumably have written OPy in some other language. Maybe it would even have been easier to write it in OCaml or F# or Haskell. (But if the language it compiled were less compatible with Python, I guess you would have had to rewrite a lot of the Python standard library.)

So maybe that's what "much better C" would look like in the context of bash: a smallish core compiler or interpreter for the DSL in which the rest of the shell is written?

specialist · on Jan 21, 2020

I'm a bit more open to the possibility. I'd like @snidane to share examples, real or imagined.

I'm barely qualified to discuss such things, so please humor me:

Alphabets, syntax, grammars, semantics, vocabulary, mental models, transformers, etc... There's a dance, right? Working at the right level of abstraction. Changing something here and figuring out the impact over there.

Switching from roman to arabic numerals. Small tweaks to combo of digits, representation, operations yielded a huge leap.

I once replaced an huge stack for designing print production plans (ScenicSoft's Preps, had 80% of the prepress imposition market at one time) with a simple form. My central insight was identifying and modeling the physical system as a series of transformation steps (how the paper was fed, folded, cut).

Optimist me hopes there's many more of these clever hacks to be discovered.

agumonkey · on Jan 21, 2020

I know few but the best effort/research on the subject I know of is VPRI OS written in Ometa (it has a page on wikipedia).

I think they made the whole thing in 100kLoC using dedicated DSLs (compared to Windows which is in the 100M ballpark). I think Alan Kay wanted at 10kLoC system.. who knows maybe one day. :)

nmadden · on Jan 21, 2020

Well this is trivially true: here's my domain specific language "main". It has a single keyword "main" that when executed runs the main method of the 100 million line codebase (which is the language runtime).

c3534l · on Jan 21, 2020

I suppose at a certain point you violate the source coding theorem. As a back of the envelope calculation, take the compiled binary and tarball it. That's the smallest you should consider even possible assuming that compressed binaries are readable to, say, an alien species with extraordinarily different minds. Suffice to say, getting rid of a million lines is one heck of a one-liner.

nnq · on Jan 21, 2020

> As a back of the envelope calculation, take the compiled binary and tarball it. That's the smallest you should consider even possible

This is 100% wrong in practice! Binaries are almost always larger than the source code they are produced from. Just as when implementing math in code, the code always takes up more space than the equations it computes.

Higher abstraction enables higher "compression" (but this is different from the formal Shannon / info-theory meaning of compression...). If go to the realm of maths, you see that the sky is the limit.

The source coding theorem is useless when talking about code and not static data. Code encodes "how something is executed" or "what happens", not "what information there is somewhere". Execution depends on DATA plus CONTEXT. By creating a smarter and smarter shared (between multiple programs) context of abstractions, source code size can get smaller and smaller (while the executable code size is upper bounded by the fixed context of the hardware executing it - you can't write code that rearranges the atoms inside a CPU ...yet) by pushing stuff "out of the program", abstracting it away into the context, and you can "violate" the source coding theorem all the time, because what you're actually encoding is not purely data/information, but something way trickier, which we are not yet capable of formalizing mathematically fully... it's about encoding "behavior that produces a subjectively desired outcome", that's what most practical programs do in the real world, a very very small part of it is "unambiguous clearly specified" algorithms that could in theory be formalized well enough to represent as "data"...

kragen · on Jan 21, 2020

A lot of those bits are random noise: incompressible, but also worthless.

In the continuum between pure accidental random noise and pure essence, we have innumerable intermediate stages. Some code produces an enormous amount of value in a very small amount of code; other code produces almost no value in a very large amount of code. The PUC Lua interpreter is like 150 kilobytes in its standard configuration, and LuaJIT is twice that size, but in many cases LuaJIT produces code with performance comparable to GCC, which is on the order of 32 megabytes (for a single platform), 100× the size.

rustybolt · on Jan 21, 2020

Infix notation is closer to natural language, but it's only easier to read for simple expressions. You really have to know the precedence and associativity of each and every operator that is used to make sense of an expression.

platform · on Jan 21, 2020

> One can create new languages fit for the purpose of a given class of problems and thus reducing size of codebase 10 to 100 fold

interesting, thank you for sharing.

Could I transpose your thought into:

--

Solutions (or large scale programs), should not be written in a general purpose language.

Instead, there should be core domain language (solution DSL) developed (using a general purpose language) and then the solution should be developed using the solution DSL.

?

AnimalMuppet · on Jan 21, 2020

Here's another unpopular opinion: Not much of the value of a DSL is in the syntax.

That is, I'm trying to write a program to do X. In, say, C++, I create data structures and functions as building blocks to write the higher-level functionality of the program. That's my "DSL" - just classes (structures and functions).

In Lisp, I can go further. In creating a DSL, I can not only create function (and, through functions, data structures), but I can also create syntax. This may let me write less code, and may make the code I write clearer.

My assertion is that, while that's nice, most of the DSL is actually the functions and data structures. The syntax is sugar - say, maybe 10% of the benefit, but not more.

celeritascelery · on Jan 21, 2020

> My assertion is that, while that's nice, most of the DSL is actually the functions and data structures. The syntax is sugar - say, maybe 10% of the benefit, but not more.

I don’t know, thinking of some DSL’s like HTML or SQL, syntax seems to be a really big part it.

Lutger · on Jan 21, 2020

It isn't immediately obvious that it is, to me. Could you expand on that?

For example, what works for html is basically the dom (data structure) and its functionality (the functions). There are others ways of creating and working with those than the specific html syntax. For example, haml. Or you might say, jsx, or elm,is also such a syntax for the same data structures.

These other approaches work perfectly fine, too.

celeritascelery · on Jan 21, 2020

Touché. I understand better what your saying. But even though the syntax changes, we always have some specialized syntax for using it. But it does make it seem like the data is more important since that is what stays the same.

vidarh · on Jan 21, 2020

Most constructs in most languages are syntactic sugar + varying degrees of enforcing constraints. The core you need to e.g an object oriented system or first order functions or any other higher level construct is tiny. You can do closures in C for example (done it). But it gets ugly and unreadable and hard to follow.

Syntax matters hugely in making the expression of an idea concise and readable.

Blocks in Ruby (or Smalltalk..) for example are pretty much just syntactic sugar over closures, are just syntactic sugar over functions, but both makes a substantial difference in ease of use, which again affects how people tend to think about problems.

E.g we are much more sparing with closure like constructs in C, even though nothing prevents it, because it is tedious.

ColanR · on Jan 21, 2020

J fixed the second problem you have with APL, and many people think that was a mistake. What do you think the better version of lisp/apl/whatever would actually be? I get the idea of a simple 1 + 2, but that doesn't express all the power of what makes lisp lisp.

lmm · on Jan 21, 2020

Interesting idea.

How would you write functions of multiple arguments as infix? Scala can get close to

    myListOfInts foldLeft 0 +

which is... honestly not much more readable.

How do you resolve ambiguity when functions take optional arguments / are overloaded? Or would you not allow that?

snidane · on Jan 21, 2020

Make the right side of the function call a list or a dict.

  myListOfInts foldLeft (0, +)

or if you make : to be right associative cons function

  myListOfInts foldLeft 0 : +

or using dict

  myListOfInts foldLeft (zero: 0, op: +)

lmm · on Jan 22, 2020

At that point you're introducing significantly more syntax, destroying the nice lisplike properties.