Coq-based synthesis of Scala programs which are correct-by-construction [pdf]

tehmillhouse · on July 21, 2017

If you're interested in this kind of thing, note that Coq can also extract code in Haskell, Scheme and OCaml.

If Coq's not quite your cup of tea, Isabelle/HOL is another proof assistant with amazing (dare I say superior?) tooling and automation, and it supports code extraction in SML, OCaml, Haskell and Scala.

Microsoft Research's Lean theorem prover is also promising in this regard. Work is almost complete on native code compilation (via C code extraction), allowing you to compile all the constructions you can formalize in Lean directly to native code. (see https://github.com/leanprover/lean/pull/1241 for progress)

(Note that none of those code extractors are verified to be correct themselves, lacking a formalization of the target language's semantics. You don't get a mathematical proof that the code you're running does what you specified, but you get pretty damn close.)

logicchains · on July 21, 2017

>If Coq's not quite your cup of tea, Isabelle/HOL is another proof assistant with amazing (dare I say superior?) tooling and automation, and it supports code extraction in SML, OCaml, Haskell and Scala.

I'll just note that Isabelle/HOL does not support dependent types, a trade-off it makes in exchange for superior automation (including excellent SMT solver integration). This makes it less suitable for some styles of proving than Coq, Agda, Idris and Lean. I suspect people coming from Coq are less likely to notice this however, as dependent types seem to be used less in Coq than in e.g. Agda or Idris due to the poor support for dependent pattern matching making some things unnecessarily difficult (although this situation is improving with every release).

unboxed_type · on July 21, 2017

>Note that none of those code extractors are verified to be

>correct themselves, lacking a formalization of the target

>language's semantics.

CertiCoq project aims to provide a certified compiler from Coq into CLight (formalized subset of "Safe C"). In this respect,

it is superior to what you have mentioned, I guess.

Offtopic: Lean's effort seems strange to me. Usage of C++ as its implementation language gets me really nervous,

I feel doubt in its reliability.

cmrx64 · on July 21, 2017

> Offtopic: Lean's effort seems strange to me. Usage of C++ as its implementation language gets me really nervous, I feel doubt in its reliability.

Not only is this unfounded, it's unreasonable. I can interpret "reliability" in two ways:

1. The tool crashes a lot

This isn't true in practice, and it throws an exception with about the same frequency that Isabelle throws an ML exception, for me. I haven't observed any segfaults but they do happen sometimes (and are usually fixed quickly, Leo is really awesome). Lean is written in pretty principled C++.

2. You don't feel safe trusting it when it says it proves something

Any tool worth its salt in this space satisfies the "de Bruijn criterion" -- can have a small, independent checker. In this case, Lean's kernel is the calculus of inductive constructions with a non-cumulative hierarchy of universes, and there exist independent typecheckers written in Haskell <https://github.com/leanprover/tc> and Scala <https://github.com/gebner/trepplein>. I have an in-progress one written in Rust as well.

Meanwhile, Lean is quite fast, the metaprogramming is lovely, and there are some really awesome tools in the pipeline.

unboxed_type · on July 21, 2017

I agree with most things you said, however.

> Lean is written in pretty principled C++.

For how long this will last? Many will agree here, that C++ is known for its ineptitude for "long-term" projects where maintenance issues are of crucial importance. I presume that proof assistants belong to that class. If Leo is the only developer of this project for the rest of life, then it is OK, I guess.

cmrx64 · on July 22, 2017

That's a reasonable concern. If Leo got hit by a bus the code quality would certainly suffer. He's pretty good about cleaning up PRs etc after merge, by which I mean, even if he isn't the only developer right now he does a lot of work to make sure the C++ code is up to snuff.

nickpsecurity · on July 21, 2017

I'm a fan of both sides of the Coq and Isabelle/HOL activities. I'm keeping an eye on CertiCoq. Right now, though, Isabelle/HOL seems to be ahead with work like Myreen and Davis's in terms of verified stacks:

https://www.cl.cam.ac.uk/~mom22/

This data structure refinement with one version getting automated to a large degree:

https://www21.in.tum.de/~lammich/pub/cpp2016_impds.pdf

Thanks to DeepSpec, Coq might exceed that in some ways:

https://deepspec.org/main

Note that a lot of work around Coq goes through CompCert which is proprietary software with restrictions on how it's used if one doesn't pay. The Isabelle/HOL alternatives like Simpl/C and CakeML are free software. Then, outside those two, we got a bunch of work being done in Maude (esp K Framework) with one an executable, formal semantics for C done like a compiler:

https://github.com/kframework/c-semantics

logicchains · on July 21, 2017

For me, C++ as its implementation language is one of the most promising things, as I suspect it's a significant reason why it's so much faster to check things than Idris. The speed difference is significant enough that I prefer it over Idris in spite of Idris' broader feature set: <1 second vs 10 seconds makes a huge difference to workflow when proving small lemmas (although it's possible I've somehow set up Idris wrong).

unboxed_type · on July 21, 2017

I would love Lean to become both reliable and fast tool. I suggest, it is in its infancy right now to judge it seriously. As for now, I would rather invest in well established tools.

hardlianotion · on July 21, 2017

What is acceptable to use in your view? In my own, if you don't like C++, you can't like Scala much and I am a fan of both

lmm · on July 21, 2017

I don't think they're that similar? I dislike C++ because of its large amount of unsafe constructs and huge number of special-cased language-level features; I like Scala because it's the most practical language I've found for writing safe enough code and because the core language is quite small, with many features moved out into libraries or patterns of combining simpler features.

hardlianotion · on July 21, 2017

They are both multi-paradigm languages with plenty of modes of expression. Both provide sufficient ways to hurt yourself. There are enough Scala implementation warts to trap the unwary.

edit - in my parent comment I should have restricted scope of the comment to the use of the language for this purpose.

unboxed_type · on July 21, 2017

Scala is different in that it has stronger type system, automatic memory management, the language is well-specified and is based on JVM, a reliable runtime. I would suggest that even Java would be much better choice than C++ for that project.

tom_mellior · on July 21, 2017

> native code compilation (via C code extraction), allowing you to compile all the constructions you can formalize in Lean directly to native code

Is that really true? What do they do for garbage collection, do they provide their own or use something like the Boehm collector?

As far as I understood, F* also allows C extraction, but only for a subset not including dynamic allocation (or maybe with explicit memory management via effects?).

cmrx64 · on July 21, 2017

No, it's not true, it "extracts" (compiles) to C++. <https://github.com/leanprover/lean/pull/1241> It's written in Lean itself, as a metaprogram. The code it generates calls into the Lean library ("runtime") for some objects, otherwise uses native types when it can. vm_obj's in Lean are reference counted. There's also in-progress work on an LLVM backend.

pharrington · on July 21, 2017

The source repo is at https://github.com/JBakouny/Scallina

amelius · on July 21, 2017

I didn't read the paper, but is this restricted to purely functional programs?

sn9 · on July 21, 2017

It's a quick read. Only 2 pages. Not particularly dense and accessible to most programmers.

unboxed_type · on July 21, 2017

Correct.

unboxed_type · on July 21, 2017

I wonder why they have not used Coq's extraction approach?

gsps · on July 21, 2017

As far as I understand, their approach is exactly that, i.e., another target language for extraction. Synthesis is a bit of an overloaded term.

unboxed_type · on July 21, 2017

"The compiler itself is also written in Scala and its underlying Coq parser is largely based on the use of parser combinators"

while "true" extractor has to be written in Coq, as I understand :-)

nickpsecurity · on July 21, 2017

When I read that, I thought "What a huge TCB versus what's typical." Additionally, there exist verified strategies for parsing that might be applicable to Scala. Depends on if their formalisms can handle the language's grammar or whatever. Additionally, if they write untrusted parts in ML, they can supplant it with stuff like property-based testing or CakeML compiler.

mathgenius · on July 21, 2017

It seems like this is the sort of thing that blockchain languages should be built on top of. Things like Etherium.

carterschonwald · on July 21, 2017

Working on something like this for work.