Wow! I can't wait to try this out. I've been having some problems getting Haskell to match C's performance in code heavy on buffer mutation, but halving the Haskell execution time would put it on roughly equal footing.
This provides a lot of encouragement to start some of my rainy-day projects -- my goal is to have a Haskell library with a proof of correctness that can outperform equivalent C. That day draws ever closer.
Isn't that a bit like saying that it "can outperform equivalent assembly"?
And why can you not prove the equivalent C correct? After all, you're doing that proof by hand either way. A Haskell compiler will only help you prove that your program is well-typed. If you're mutating buffers then you're in a monad like IO or ST and the the compiler will throw it's hands up and tell you that correctness is entirely your problem.
This is not true in general. It depends on how you are coding the buffer mutations. Doing operations in IO does not remove any type safety. If you are treating blocks of memory as arrays of untyped values, then you are removing type safety, but there is no reason to believe that the author is doing this.
Personally, I have found C libraries a lot easier to use from Haskell. A few "newtype" and Storabe instances later, you get a library that is impossible to make crash. You write the rest of your program in terms of that (in pure Haskell), and you are pretty much guaranteed enough safety to not segfault.
(I've specifically worked with a third-party financial analytics library that is quite crashy if you call certain functions in certain order. I used the type system to prevent myself from doing that, and now I can write complex problems that don't crash. Great success.)
Isn't that a bit like saying that it "can outperform equivalent assembly"?
No; as shown in the link, this work allows idiomatic Haskell to outperform idiomatic C. Obviously it's possible to write C which is a direct translation of the generated assembly, but such code would be nearly unmaintainable, and certainly very difficult to reason about.
And why can you not prove the equivalent C correct?
Proving the correctness of C code is possible, but (in my experience) significantly more difficult than for Haskell.
After all, you're doing that proof by hand either way. A Haskell compiler will only help you prove that your program is well-typed. If you're mutating buffers then you're in a monad like IO or ST and the the compiler will throw it's hands up and tell you that correctness is entirely your problem.
I don't mean that the compiler will automatically generate a proof for me, but that authoring a proof will be possible in practice -- as opposed to C, where it's possible in theory but nobody actually can.
I'm a bit confused about why you think operating within a monad causes problems for proving correctness; to my knowledge, verifying code which uses monads is no more difficult than code using any other data types.
as shown in the link, this work allows idiomatic Haskell to outperform idiomatic C
Interesting that you mention this. I know when I am writing C, I usually "malloc" dynamic data structures, whereas in Haskell, I usually "alloca" dynamic data structures (when talking to C libraries, obviously). Same end result, but alloca is actually faster, which makes allocation-heavy libraries faster when used from Haskell. (I guess I should use alloca in C, too. But I've never seen this anywhere.)
I noticed that this was especially helpful in a library I recently used that never allocated any of its own memory; it was always the responsibility of the caller. Haskell idioms worked much better than C idioms here.
I'm a bit confused about why you think operating within a monad causes problems for proving correctness; to my knowledge, verifying code which uses monads is no more difficult than code using any other data types.
Wait, are you saying that calling a function called "(>>=)" is no different than calling any other function? Shocking. :)
In theory, the information a Haskell compiler has available should allow it to make better optimizations. Things like drastic whole-program optimizations aren't viable in C because the compiler can't guarantee safety and doing the optimizations by hand would render the code base completely unreadable. This is why, for instance, FORTRAN can easily outperform C for certain classes of program.
In practice, the overhead of supporting Haskell's other features means it's still generally slower than most popular compiled languages, though not by much. I have much higher expectations, however, for improvement in Haskell optimizers than I do for nontrivial static analysis for impure languages becoming viable.
It may be infeasible to do all the effect analysis though. You have to reconstruct high level semantics. That's just not going to fly in general, though it will do well in some simple cases.
I agree. C is weird in that the programmer has to throw away almost all of his intent and just tells the machine exactly what instructions to perform. You can read C and sort of get the idea of what the programmer wanted to do, but it's difficult. In Haskell, there is very little "how" and a lot of "what" (and "why"), and so there is much more opportunity for optimization. Your work with stream fusion is a great example -- what a C programmer might think is a bunch of loops can be condensed down to one loop "sometime later". And in Haskell, these effects can be composed, whereas in C, you are stuck with what the programmer wrote.
This is why I am always confused when people say "functional programming is slow" -- current C implementations are very close to being as fast as theoretically possible, where as current FP implementations are nowhere near that (there are many optimization opportunities that are currently unexplored or unimplemented). And even with only a few optimizations, many FP implementations (GHC, OCaml, SBCL) are almost as fast as C already!
This is seriously cool. The compiler-writer part of me enjoyed * LLVM transforms the code far more aggressively* and later references to where it was hard to figure out what the assembly code does.
Wondering if they have done the test where the llvm flag has recompiled ghc.
http://www.cse.unsw.edu.au/~davidt/