If you wanted to prevent undefined behavior at compile time by having the compiler report an error, you'd actually need to prevent potentially undefined behavior, for the simple reason that the compiler doesn't know what your runtime inputs will be. You could come up with a subset of C, but it would be a restrictive and probably useless subset. C strings go out the window, for instance, because you don't know at compile time whether a given char * is indeed a NUL-terminated string. You also wouldn't be able to store that pointer anywhere, because it could be freed as soon as you return from the function - and that's assuming we're disallowing threads and signals in our C subset, or else you just couldn't use the pointer at all. (This, by the way, argues that it's a flaw in the language definition, not in the implementation of anything.)
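To make that concrete, here's a minimal sketch (the function name is made up for illustration). Both calls type-check identically, so a checker that only sees the source at compile time has nothing to distinguish them by:

    #include <stdio.h>

    /* The signature tells the compiler nothing about whether p points at a
       NUL-terminated string, how big the buffer is, or how long it lives. */
    static void log_message(const char *p) {
        printf("%s\n", p);   /* reads forward until it finds a NUL byte */
    }

    int main(void) {
        char text[] = "hello";                 /* NUL-terminated */
        char raw[4] = {'a', 'b', 'c', 'd'};    /* not NUL-terminated */
        log_message(text);   /* well-defined */
        log_message(raw);    /* same type, same shape of call, but printf reads
                                past the end of raw: undefined behaviour that
                                the types alone give the compiler no way to
                                reject */
        return 0;
    }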
There are a huge number of languages where you pass strings along with their length. There are also plenty of languages where you know (either via compile-time analysis and an insistence on writing code in an analyzable way, as in Rust, or via mild runtime overhead, as in Swift or Haskell or Java) that the memory is still valid when you reference it. But those languages are not a subset of C. They either need more information about a program than a C program contains, or they disallow operations that the C abstract machine can perform.
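As a hedged sketch of what "pass the length along" looks like if you bolt it onto C yourself (the struct and names here are hypothetical, not taken from any of those languages):

    #include <stddef.h>
    #include <stdio.h>

    /* A fat-pointer string: the length travels with the data, so every access
       can be bounds-checked. Note this is a convention layered on top of C,
       not a subset of C - nothing stops other code from handing you a bare
       char* anyway. */
    struct str_slice {
        const char *data;
        size_t len;
    };

    static void print_slice(struct str_slice s) {
        for (size_t i = 0; i < s.len; i++)   /* loop is bounded by construction */
            putchar(s.data[i]);
        putchar('\n');
    }

    int main(void) {
        struct str_slice s = { "hello", 5 };
        print_slice(s);
        return 0;
    }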
You're right that you can't define a safe subset of C without making it impractical. MISRA C defines a C subset intended to help avoid C's footguns, but it still isn't actually a safe language. There are alternative approaches though:
1. Compile a safe language to C (whether a new language or an existing one)
2. Formal analysis of C, or of some practical subset of C, to prove the absence of undefined behaviour
Work has been done on both approaches.
ZZ compiles to C. [0] Dafny can compile to C++, but it seems that's not its primary target. [1][2]
There are several projects on formal analysis of C. [3][4][5][6]
Is it better to write ZZ or Dafny code than to write Java, Python, Haskell, Rust, or Swift code?
(They look cool and I'm enjoying reading about them, but for the things I do in my day job where C is one of the obvious options, I'm not sure they're better choices - usually one of the languages above is.)
ZZ's whole goal is to transpile to C, which (as their website mentions) has some practical advantages. Consider exotic hardware platforms where there's a vendor-provided proprietary C compiler and no other choice.
Dafny has a slightly different emphasis, as it's about formal verification, so it's competing more with Ada SPARK than with mainstream languages.
I can't comment on the general ergonomics of ZZ and Dafny as I've not tried them out, but presumably they're going to be well behind a language like Java with its huge ecosystem of IDEs etc. Neither seems particularly mature, either, so I wouldn't bet the farm on them.
> Is it better to write ZZ or Dafny code than to write Java, Python, Haskell, Rust, or Swift code?
If you've got the choice and you're just trying to do 'normal day job programming' I imagine that yes you'd be much better off just using Java. If you want a language that's quite like C or C++ but much better regarding safety, there are Rust, Zig, and the too-often-overlooked Ada language.
I think you’re conflating undefined behavior with implementation-defined behavior.
How integers act at the extremes of their values, for example, is implementation-defined — when you’re using the integer types (rather than using IEEE754 floats for everything to get well-defined formal semantics), you’re implicitly telling the compiler that you don’t care what happens at the boundaries, and that you want whichever behavior the target uarch’s ISA entails.
Think of it like exposing an opaque native FFI type in a managed-runtime language. The semantics of that type aren’t “up to” the runtime; all the runtime does is ensure that only a safe set of interaction primitives on the type are exposed into the managed runtime. (“Safe” in the sense of not corrupting the runtime’s own abstract machine semantics; not in the sense of making any guarantees about the operations leaving the value in a well-defined state per the underlying machine.) In C, integers — and the operations on them — are exactly such “opaque foreign types.” As are pointers, for that matter. (And this is also why casting between integers and pointers is UB — since the internal representations of each aren’t part of the semantic model of the C abstract machine, it cannot make any guarantee about their interconvertibility.)
C compilers shouldn’t abort on implementation-defined behavior. Having that behavior be implementation-defined is the well-defined semantics that the C abstract machine applies to such things.
It’s only in cases where there are no clear semantics, even per implementation, that the C compiler should swerve and say “I don’t know what that should mean, so I give up.”
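To illustrate the distinction with a couple of examples of my own (not taken from this thread), here are operations the standard explicitly hands to the implementation rather than leaving undefined:

    #include <stdio.h>

    int main(void) {
        /* Right-shifting a negative value is implementation-defined: most
           targets do an arithmetic shift and give -4 here, but the standard
           doesn't require it. */
        int a = -8 >> 1;

        /* Converting an out-of-range value to a signed type is likewise
           implementation-defined (or raises an implementation-defined
           signal); on a typical 32-bit int it wraps, but that's the
           platform's choice, not the abstract machine's. */
        int b = (int)3000000000u;

        printf("%d %d\n", a, b);
        return 0;
    }

A compiler can accept both of these and simply document what they do, which is different from something like an out-of-bounds read, where there is no per-implementation answer to document.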
I agree that integer overflow is implementation-defined, but I wasn't talking about integer overflow, I was talking about char *.
Reading from a pointer past the bounds of the object being pointed to is undefined, not implementation-defined. How do you propose to write a size_t strlen(const char *s) that successfully compiles with your proposed compiler?
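For reference, a textbook sketch of what that function has to do (not any particular libc's code):

    #include <stddef.h>

    /* Reads s[len] until it hits a NUL byte. Whether those reads stay inside
       the object s actually points to depends entirely on data supplied at
       run time; nothing in this source lets a compile-time checker prove the
       loop never walks past the end of the buffer. */
    size_t my_strlen(const char *s) {
        size_t len = 0;
        while (s[len] != '\0')
            len++;
        return len;
    }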