I think it's plausible that writing code in C, Pascal, or languages of a similar...

chubot · on Jan 21, 2020

BTW I remember your "memory models" post from a few years ago, and I found it insightful! It got me thinking about many related things.

http://canonical.org/~kragen/memory-models/

Around that time I'd noticed that the stringly-typed languages of shell, awk, and make have no garbage collection. They're value oriented and not reference oriented. The goal of Oil was basically to "upgrade" shell into something more like Python, and on the surface that looks straightforward, but the reference model vs. value model makes it pretty different.

I've gone back and forth on that... whether I should just slap GC into a shell, or whether I should try to preserve the value model while making it less stringly-typed and more expressive.

I have a more ambitious idea for a language where every value is essentially a pair (Go-like string slice, structured data), but I probably won't get to that in the near term. For a concrete example, I noticed from writing some HTML processors that both DOM and SAX APIs have a bunch of flaws for processing semi-structured text.

-----

Also, on the other point, let's consider these two possibilities about Chez / Dybvig:

1) Dybvig is an average programmer who got "superpowers" from Lisp, i.e. got a 10x boost in productivity.

2) Dybvig and Chez are an outlier, e.g. like Fabrice Bellard and his recent QuickJS (and other projects), or Lattner and LLVM/Clang/Swift. The salient point is that these projects have nothing to do with Lisp.

It's obviously not an either/or thing, but I'd say the answer is closer to #2. The complexity is from the problem domain, and many major software projects are minor miracles that only a few people are qualified for, regardless of language.

kragen · on Jan 26, 2020

I'm delighted that you enjoyed that note! You might enjoy parts of Dercuano, too, then.

As I understand it, shell, awk, and Tcl are more or less linear languages, except that copying and destruction are implicit. It's fairly straightforward to make a linear language that includes more expressive types than just strings; Tcl 8 even did it without breaking backward-compatibility. Linearity is pretty incompatible with the OO worldview in which objects have identity and mutable state, but to the extent that you provide Clojure-like FP-persistent data structures whose operations merely return new states, maybe the OO worldview can just go suck it. Rust takes a different tack in which copying is optionally explicit (though destruction is still implicit: "affine typing" rather than "linear typing") and you can use "borrowed references" to avoid the headache of explicitly returning all the argument objects you didn't destruct.

A nice thing about variations like Rust's and Clojure's is that you can preserve the expressiveness of the Lisp object-graph memory model without all of its drawbacks: no aliasing bugs, no circular references complicating the life of the GC, and in Rust's case, no garbage collector at all.
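
To make the aliasing point concrete, here is a minimal Python sketch contrasting the three styles. The helper name `appended` is made up for illustration, and the deep copy only emulates what a value-oriented language does implicitly:

```python
import copy

# Reference model: two names share one mutable object, so a change
# made through one name is visible through the other (aliasing).
a = ["x", "y"]
b = a
b.append("z")

# Value model, emulated here with an explicit copy on assignment;
# shell and awk effectively do this copy for you.
c = ["x", "y"]
d = copy.deepcopy(c)
d.append("z")

# Persistent style: an operation returns a new state and leaves the
# old one untouched, so sharing is harmless.
def appended(t, item):
    return t + (item,)

t1 = ("x", "y")
t2 = appended(t1, "z")
```

After this runs, `a` has silently grown along with `b`, while `c` and `t1` still hold their original two elements.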

> I have a more ambitious idea for a language where every value is essentially a pair (Go-like string slice, structured data), but I probably won't get to that in the near term. For a concrete example, I noticed from writing some HTML processors that both DOM and SAX APIs have a bunch of flaws for processing semi-structured text

Right, text markup isn't tree-structured, and I think HTML5 actually prescribes fixups for <i>some <b>incorrectly</i> nested</b> markup. The GNU Emacs/XEmacs schism was largely about how to handle this problem for text editing; the XEmacs side added "extents" to which you could attach structured data (such as a font or a link destination), which are more or less the type of values you're describing, while the GNU Emacs side instead added "text properties" which conceptually applied independently to each of the characters of buffers or strings, but under the covers were of course optimized with an extent-like representation. Neither side of the schism considered replacing text buffers with S-expression-like trees like the DOM; that was Microsoft's fault :)
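
As a rough sketch of the two views (this is not Emacs's actual implementation; `Extent`, `props_at`, and `runs` are names invented here), overlapping markup can be stored as ranges over flat text and then read back per character:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extent:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    prop: str   # the attached data, here just a tag name

text = "some incorrectly nested markup"
# <i>some <b>incorrectly</i> nested</b>: the ranges overlap without nesting.
extents = [Extent(0, 16, "i"), Extent(5, 23, "b")]

def props_at(i, extents):
    """Text-property view: the set of properties on character i."""
    return frozenset(e.prop for e in extents if e.start <= i < e.end)

def runs(text, extents):
    """Merge adjacent characters that carry identical property sets."""
    out = []
    for i, ch in enumerate(text):
        p = props_at(i, extents)
        if out and out[-1][1] == p:
            out[-1] = (out[-1][0] + ch, p)
        else:
            out.append((ch, p))
    return out

pieces = runs(text, extents)
```

Here `pieces` splits the text into four runs: italic only, italic and bold, bold only, and plain. No tree is needed to represent the overlap.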

Are you thinking that when you concatenate a couple of such string slices, the operation will be O(1) because it uses ropes behind the scenes? Or are you thinking of not having concatenation at all as a primitive operation, instead using an unboundedly-growing scratch buffer (not, itself, a value) that you can append them both to and then take slices of? Is the text in the slices mutable, and if so, can it also grow and shrink? I think there's a large design space of interesting things you could do.
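
For the first option, a minimal rope sketch in Python might look like the following. The class names are hypothetical, there is no rebalancing, and Python's own `str` slicing copies, so this only illustrates the shape of the idea:

```python
class Leaf:
    """A slice of a shared buffer: (buf, start, end)."""
    def __init__(self, buf, start, end):
        self.buf, self.start, self.end = buf, start, end
        self.length = end - start

    def chunks(self):
        yield self.buf[self.start:self.end]

class Concat:
    """Concatenation node: built in O(1), no text is copied."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = left.length + right.length

    def chunks(self):
        yield from self.left.chunks()
        yield from self.right.chunks()

def flatten(node):
    """Materialize the text only when someone actually needs it."""
    return "".join(node.chunks())

buf = "hello world"
hello = Leaf(buf, 0, 5)
world = Leaf(buf, 6, 11)
rope = Concat(world, hello)
```

Building `rope` touches no characters; the cost is deferred to `flatten`, which is linear in the total length.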

About Dybvig and Chez, sure, Kent Dybvig is a wizard. But so is Guido van Rossum, whatever embarrassing errors he may have made regarding first-class blocks and tail-call optimization in Python. I haven't asked Dybvig, but I don't think he's a Bellard-class wizard, so I don't think that's the answer.

Or are you saying that Python's problem domain is much harder than Chez Scheme's domain?

By the way, my apologies for having delayed so long in responding to your thoughtful notes.

chubot · on Jan 27, 2020

There are a lot of interesting things to pull on here, but I'll reply first with a comment I just wrote that linked back to this thread.

https://news.ycombinator.com/item?id=22157587

A tentative slogan is "bijections that aren't really bijections", i.e. to describe the "fallacy" of (bytes on disk -> data structure in memory -> bytes).

Hence the (slice, data structure) representation.
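
A minimal sketch of what I mean, in Python. The `Node` type, the toy `key=value` grammar, and the function names are all made up for illustration; none of this is Oil code:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    src: str       # the whole input buffer, shared by every node
    start: int     # slice start into src
    end: int       # slice end into src
    value: object  # structured data decoded from src[start:end]

    def text(self):
        """The exact original text this node was parsed from."""
        return self.src[self.start:self.end]

# Toy grammar: pairs of the form name = digits.
PAIR = re.compile(r"(\w+)\s*=\s*(\d+)")

def parse_pairs(src):
    return [Node(src, m.start(), m.end(), (m.group(1), int(m.group(2))))
            for m in PAIR.finditer(src)]

def naive_roundtrip(src):
    """Serialize from the values alone, ignoring the slices."""
    return ";".join(f"{k}={v}" for k, v in (n.value for n in parse_pairs(src)))

src = "x =  1;  y=002"
nodes = parse_pairs(src)
```

The values alone can't reproduce the input (spacing and the leading zeros are gone), but each node's slice still points at the exact original text.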

That's the correctness aspect. I also think there's a big performance aspect, e.g. it appears to me that the performance of many/most parsers is dominated by object allocation.

Some vaguely related stuff here: https://github.com/oilshell/oil/wiki/Compact-AST-Representat...

Anyway I don't think this will make it into Oil any time in the next year, but I think there is room for it in programming languages, and lots of other people are barking up that tree (e.g. FlatBuffers relate to a big part of it). I would be happy to continue batting around ideas in a non-nested HN thread :) So as mentioned elsewhere, feel free to send me a mail or ping me on Zulip!

I'm looking at Dercuano but I'm confused why it's a tarball and not a live website :)

kragen · on Jan 27, 2020

> I'm looking at Dercuano but I'm confused why it's a tarball and not a live website :)

So you can keep reading it after I'm dead.

chubot · on Jan 21, 2020

Yes I totally agree -- C is way more productive than assembly, and dynamic languages are way more productive than C. I think those are incontrovertible facts that have been proved over and over again in the market, according to the test in my post.

I'm arguing against the OP who is essentially claiming that the homoiconicity and metaprogramming/DSLs of Lisp are a huge advantage over say Python.

I would argue the opposite: all the essential innovations of Lisp made their way into other languages like Python and Ruby (and Julia, which has a Lisp-like macro system by way of its femtolisp front end [1]). It was a hugely influential language but it's not ahead of the state of the art right now.

Another way of saying it: PARSING is the only difference between Lisp metaprogramming and metaprogramming in any other language, and parsing isn't an essential difference. It's a bit of friction. (Though I want to improve parsing, long story ... [2] )

-----

In fact I've done what you could consider a software engineering experiment along these lines in the last few years.

I wrote a bash-compatible shell, running thousands of lines of unmodified bash scripts, in Python. It's 5-7x fewer lines of code than bash, due to metaprogramming with Python, ASDL, and re2c. I claim it's significantly faster to develop and has fewer bugs because of this. It's very very different than how bash is written.

Line counts: http://www.oilshell.org/blog/2019/06/17.html#why-is-it-writt...

Well it took over 3 years, but bash has been around for 30, and I also enhanced the language in many ways. I won't argue if someone says the jury's still out, but here is a significant achievement on performance:

http://www.oilshell.org/blog/2020/01/parser-benchmarks.html

Despite being written in Python, and parsing more thoroughly by design, the parser (which is a ~10K line program in Python, and more in bash) is faster than bash's once it's automatically translated to C++. So basically the high level "DSL" of Python or "OPy" retains enough information to make it better than the state of the art.

Here's a comment on this non-Lisp heavily DSL-based architecture from a few months ago:

https://news.ycombinator.com/item?id=20799926

To summarize it, if you look at how bash is implemented, it's 140K lines of "groveling through backslashes and braces one at a time in C", as I like to call it. And it has about as many bugs / mistakes as you'd expect with that style, including some serious security problems.

So you can use DSLs to "compress" that code -- and I claim make it more correct and faster (jury is still out on memory usage). And you don't need Lisp to do it. Python was a fine choice for metaprogramming and DSLs; Lisp doesn't have any advantage.
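
To give a flavor of the "compression", here is a toy schema-to-classes generator in Python. It is only loosely in the spirit of ASDL: the schema syntax, the type names, and `generate` are all invented for this comment and are not what Oil actually uses:

```python
from dataclasses import make_dataclass

# Hypothetical mini-schema: one type per line, "Name = field:type ...".
SCHEMA = """
Word     = text:str
Assign   = name:str value:Word
Pipeline = commands:list
"""

def generate(schema):
    """Turn each schema line into a dataclass with those field names."""
    types = {}
    for line in schema.strip().splitlines():
        name, fields = (s.strip() for s in line.split("="))
        field_names = [f.split(":")[0] for f in fields.split()]
        # The declared types are ignored in this sketch; a real
        # generator would check or emit them.
        types[name] = make_dataclass(name, field_names)
    return types

T = generate(SCHEMA)
node = T["Assign"]("x", T["Word"]("1"))
```

Three lines of schema stand in for three hand-written classes, each with a constructor, equality, and a repr, and the ratio gets better as the schema grows.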

In fact in January 2016 I hacked on femtolisp specifically because I was thinking about borrowing Julia's implementation approach for Oil. But yeah it's not enough of a win to justify the downsides. If someone wants to prove me wrong, write a bash-compatible shell in Lisp in under 3 years ;)

It's not going to be any easier, just like writing an x86 backend isn't much easier in Lisp. The difficulty is from the problem domain and not the language. And I noticed that all compilers use metaprogramming and DSLs anyway -- LLVM has hundreds of thousands of lines of TableGen. The Go compiler uses a Lisp-like DSL to express SSA transformations.

So basically Lisp is great but it's been thoroughly absorbed, and PG basically extended a falsifiable argument that was proven false. All of YC's biggest successes use dynamic languages, not Lisp. Stripe uses a ton of Rails, Airbnb is Rails, Dropbox is Python, etc.

https://www.ycombinator.com/topcompanies/

Instagram and YouTube are Python, etc. Wikipedia and Facebook are PHP, etc. I think that constitutes enough evidence.

-----

edit: I should also add/clarify that I've met programmers who I'm sure can bang out the equivalent of bash's 140K lines of C in (much better) C in a relatively short period. I'm not saying the way I did it is the only way to do it. Rather, I'm saying Lisp doesn't have any fundamental advantage in metaprogramming. The only difference is parsing, which is a small bit of friction.

The other point which people have brought up many times is that most people are bad DSL designers. It may be better to rely on "known" DSLs than to invent your own for every program. That can be a drag on productivity negating any of the potential benefits of smaller code.

[1] https://docs.julialang.org/en/v1/manual/metaprogramming/

[2] This was supposed to be the start of that argument http://www.oilshell.org/blog/tags.html?tag=parsing-is-diffic...

kragen · on Jan 26, 2020

> dynamic languages are way more productive than C.

I meant to say that high-level languages were more productive than C; I think there are a lot of people who will be happy to tell you that they are more productive in Haskell than in C, as much so as in Python, even though Haskell is a lot more static than C. (I don't know Haskell well enough to say; the only Hindley–Milner language I know is OCaml, and I feel less productive in OCaml than in C.)

> I'm arguing against the OP who is essentially claiming that the homoiconicity and metaprogramming/DSLs of Lisp are a huge advantage over say Python. ¶ I would argue the opposite: all the essential innovations of Lisp made their way into other languages like Python and Ruby

I pretty much agree. Lisp was a unique language in, say, 1990, when it had garbage collection, dynamic typing, dynamically expanding data structures, higher-order functions, eval, easy serialization and deserialization of data structures, an object-graph memory model that made it easy to represent things like parse trees, and easy metaprogramming through homoiconicity. Meanwhile, the other popular programming languages were things like C, Pascal, BASIC, FORTRAN, sh, Lotus 1-2-3 formulas, dBASE, and COBOL, which had none of those; a lot of them didn't even have recursion. Today, though, Python has all of them except for homoiconicity, and we can do quite a bit of metaprogramming in Python, using its pervasive polymorphism and higher-order programming rather than macros. Most other popular modern languages like Perl 5, Ruby, JS, Lua, and PHP 5+ are basically dialects of Lisp. Even languages that differ dramatically from Lisp, like OCaml, Kotlin, C#, Swift, and recent versions of Java are pretty close.
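
As one small illustration of metaprogramming through higher-order functions rather than macros, here is a sketch of a decorator-built dispatch table. The names `builtin`, `BUILTINS`, and `run` are hypothetical, chosen only for this example:

```python
BUILTINS = {}

def builtin(name):
    """Return a decorator that registers a function under `name`."""
    def register(fn):
        BUILTINS[name] = fn
        return fn
    return register

@builtin("echo")
def echo(args):
    return " ".join(args)

@builtin("true")
def true_(args):
    return ""

def run(argv):
    """Look up argv[0] in the table and call it with the rest."""
    return BUILTINS[argv[0]](argv[1:])

result = run(["echo", "a", "b"])
```

In Lisp this might be a `defbuiltin`-style macro; in Python an ordinary function that takes and returns functions does the same job at definition time.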

> I wrote a bash-compatible shell, running thousands of lines of unmodified bash scripts, in Python. It's 5-7x fewer lines of code than bash, due to metaprogramming with Python, ASDL, and re2c. I claim it's significantly faster to develop and has fewer bugs because of this. It's very very different than how bash is written.

That's a very impressive achievement! Does it use more RAM than bash does? I'd think that one of the potential benefits of C over Python is that, because C requires you to specify your memory layout, C programs tend to use significantly less memory than otherwise equivalent Python, usually by about an order of magnitude.

That is, some of the order-of-magnitude complexity reduction you got in your experiment is surely due to Python, but perhaps some of it is due to RAM costing about US$100 per megabyte when Brian Fox started writing bash in 1988.
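
A rough way to see the memory-layout effect from inside CPython (a sketch only: the numbers are specific to CPython, vary by version and platform, and `array` stands in here for a C-style packed layout):

```python
import sys
from array import array

n = 1000
# Values chosen outside CPython's small-integer cache, so each one is
# a separately allocated object.
values = list(range(100000, 100000 + n))

# Boxed layout: a list of pointers plus one heap object per integer.
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Packed layout: fixed-width machine integers stored contiguously.
packed = array("i", values)
packed_bytes = packed.itemsize * len(packed)

ratio = boxed_bytes / packed_bytes
```

On a typical 64-bit CPython I'd expect `ratio` to land somewhere near the order of magnitude mentioned above, but treat that as an expectation rather than a measurement.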

> Python was a fine choice for metaprogramming and DSLs; Lisp doesn't have any advantage.

I think that may be overstating the case; no doubt there are some things that are easier in Python and some things that are easier in Lisp. SBCL or Racket can usually beat CPython or Jython by about an order of magnitude on speed, for example, and having first-class lambdas really is an improvement over Python's botched mixture of context managers, function decorators, and generator abuse. In the other direction, Python's syntax is significantly more readable — especially for arithmetic — and SBCL and Racket don't have anything like Numpy, and their own compilers are not an adequate replacement. But the difference between Lisp and Python, whichever one it favors, certainly isn't close to the enormous advantage Lisp or Python has over Pascal.

(And I think Clojure is really the most interesting Lisp, actually.)

> I should also add/clarify that I've met programmers who I'm sure can bang out the equivalent of bash's 140K lines of C in (much better) C in a relatively short period

I would be very surprised about that; but Andrew Appel did come close to winning the ICFP programming contest one year using that well-known functional programming language, C. And a few years later someone won using C++.

On the other hand, you did end up writing OSH in OPy rather than, strictly speaking, in Python. The initial Python versions sort of ended up being a prototype, in a way. And you could presumably have written OPy in some other language. Maybe it would even have been easier to write it in OCaml or F# or Haskell. (But if the language it compiled were less compatible with Python, I guess you would have had to rewrite a lot of the Python standard library.)

So maybe that's what "much better C" would look like in the context of bash: a smallish core compiler or interpreter for the DSL in which the rest of the shell is written?