Rust's ownership model has two aspects:

- A dynamic part specifies what is actually allowed, and totally supports doubly linked lists and other sorts of cyclic mutable data structures.

- A static part conservatively approximates the dynamic part, but is still flexible enough to express the interfaces and usage of these data structures even if it can't check their implementations.

This is the important difference from traditional static analysis of C. It enables `unsafe` library code to bridge the dynamic and static rules in a modular way, so that extensions to the static rules can be composed safely, downstream of their implementation.
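For example, here is a minimal sketch (a made-up `TakeCell` type, not anything from std) of the shape this takes: the dynamic rule is enforced inside the library with `unsafe`, and callers only ever see a safe, statically checked API.

```rust
use std::cell::UnsafeCell;

/// Hypothetical example: lets a *shared* reference move the value out, once.
/// Plain `&T` can't express this, but the unsafe implementation can, and its
/// safe signature is all that downstream code ever needs to reason about.
pub struct TakeCell<T> {
    slot: UnsafeCell<Option<T>>,
}

impl<T> TakeCell<T> {
    pub fn new(value: T) -> Self {
        TakeCell { slot: UnsafeCell::new(Some(value)) }
    }

    /// Safe to call through `&self`: the dynamic rule (the value is only
    /// ever taken once) is checked here, inside the library.
    pub fn take(&self) -> Option<T> {
        // SAFETY: `UnsafeCell` makes this type !Sync, so no other thread can
        // touch the slot, and we never hand out references into it.
        unsafe { (*self.slot.get()).take() }
    }
}

fn main() {
    let cell = TakeCell::new(String::from("hello"));
    assert_eq!(cell.take(), Some(String::from("hello")));
    assert_eq!(cell.take(), None); // the second take observes the dynamic rule
}
```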

Rust's strategy was never for the built-in types like `&mut T`/`&T` to be a complete final answer to safety. It actually started with a lot more built-in tools to support these patterns, and slowly moved them out of the compiler and runtime and into library code, as it turned out their APIs could still be expressed safely with a smaller core.

Something like Fil-C would be totally complementary to this approach- not only could the dynamic checking give you stronger guarantees about the extensions to the static rules, but the static checks could give the compiler more leverage to elide the dynamic checks.


C++ is not a dependently typed language, for the same reason that templates do not emit errors until after they are instantiated. All non-type template parameters get fully evaluated at instantiation time so they can be checked concretely.

A truly dependently typed language performs these checks before instantiation time, by evaluating those expressions abstractly. Code that is polymorphic over values is checked for all possible instantiations, and thus its types can actually depend on values that will not be known until runtime.

The classic example is a dynamic array whose type includes its size- you can write something like `concat(vector<int, N>, vector<int, M>) -> vector<int, N + M>` and call this on e.g. arrays you have read from a file or over the network. The compiler doesn't care what N and M are, exactly- it only cares that `concat` always produces a result with the length `N + M`.
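As a rough illustration, here is a minimal Lean 4 sketch (with a hypothetical `Vec` wrapper around `List`, not a standard library type): `concat`'s result length `n + m` is checked once, symbolically, for all `n` and `m`, even though the actual lengths may only be known at runtime.

```lean
-- A length-indexed vector: the Nat `n` is part of the type.
structure Vec (α : Type) (n : Nat) where
  data   : List α
  len_eq : data.length = n

def concat {n m : Nat} (xs : Vec Int n) (ys : Vec Int m) : Vec Int (n + m) :=
  { data := xs.data ++ ys.data,
    -- The compiler demands a proof that the new length really is n + m,
    -- for every possible n and m, not one proof per instantiation.
    len_eq := by simp [List.length_append, xs.len_eq, ys.len_eq] }
```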


I'm not sure what "dependently typed" means, but in C++20 and beyond, concepts allow templates to constrain their parameters and issue errors when they're specialized, before the actual instantiation happens. E.g., a function template with constraints can issue errors if the template arguments (either explicit or deduced from the call site) don't satisfy the constraints, before the template body is compiled. This was not the case before C++20, where some errors could be issued only upon instantiation. With C++20, in theory, no template needs to be instantiated to validate the template arguments if constraints are provided to check them at specialization time.


This is the wrong side of the API to make C++20 dependently typed. Concepts let the compiler report errors at the instantiation site of a template, but they don't do anything to let the compiler report errors with the template definition itself (again before instantiation time).

To be clear this distinction is not unique to dependent types, either. Most languages with some form of generics or polymorphism check the definition of the generic function/type/etc against the constraints, so the compiler can report errors before it ever sees any instantiations. This just also happens to be a prerequisite to consider something "dependently typed."


> performs these checks before instantiation time

Notably Rust type-based generics do this, a key difference wrt. C++ templates. (You can use macros if you want checks after instantiation, of course.)
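For example, a minimal sketch of that difference in Rust: the body of a generic function is checked against its bounds once, at definition time, and removing the bound breaks the definition itself rather than some later call site.

```rust
use std::ops::Add;

// Checked against the `Add` + `Copy` bounds before any instantiation exists;
// delete the bounds and this definition stops compiling on its own.
fn double<T: Add<Output = T> + Copy>(x: T) -> T {
    x + x
}

fn main() {
    println!("{}", double(21));     // instantiated with i32
    println!("{}", double(1.5f64)); // instantiated with f64
}
```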


In C++ it does care what N and M are at compile time; at least the optimizer does, for autovectorization and unrolling. Would that not be the case with const generic expressions?


The question of whether a language is dependently typed only has to do with how type checking is done. The optimizer doesn't come into play until later, so whether it uses the information is unrelated to whether the language is dependently typed.


Ok, I think I understand now, but is it really dependently typed just because it symbolically verified it can work with any N and M? Because it will only generate code for the instantiations that get used at compile time.


Is what really dependently typed? I'm saying C++ is not dependently typed, because it doesn't do any symbolic verification of N and M.


If Rust did add const generic expressions, I mean. It would still only generate code for the used instantiations.


Ah, I wasn't really talking about Rust.

Rust already does have some level of const generic expressions, but they are indeed only possible to instantiate with values known at compile time, like C++.

The difficulty of type checking them symbolically still applies regardless of how they're instantiated, but OTOH it doesn't look like Rust is really trying to go that direction.
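For reference, a hedged sketch of what that looks like today on nightly, using the unstable (and explicitly incomplete) `generic_const_exprs` feature; the `N + M` in the type is still evaluated concretely per instantiation, not reasoned about symbolically.

```rust
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]

// Array concatenation whose result length is an expression over const generics.
fn concat<const N: usize, const M: usize>(a: [i32; N], b: [i32; M]) -> [i32; N + M] {
    let mut out = [0; N + M];
    out[..N].copy_from_slice(&a);
    out[N..].copy_from_slice(&b);
    out
}

fn main() {
    // Both lengths are compile-time constants, so the compiler checks
    // 2 + 3 = 5 concretely for this particular instantiation.
    let joined: [i32; 5] = concat([1, 2], [3, 4, 5]);
    assert_eq!(joined, [1, 2, 3, 4, 5]);
}
```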


The only thing needed here is to be able to lift N & M from run-time to the type system (which in C++, as it stands, exists only at compile time). For "small" values of N & M that's doable with switches and instantiations, for instance.
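Roughly, a hypothetical sketch of that switch-based lifting in Rust terms (all names here are made up): dispatch on the runtime value and instantiate a const-generic version for each supported case.

```rust
// The const-generic "core": inside, N is a compile-time constant.
fn process<const N: usize>(_data: &[u8; N]) -> usize {
    N * 2
}

// The runtime "switch": only small lengths are lifted, one instantiation each.
fn process_dyn(data: &[u8]) -> Option<usize> {
    match data.len() {
        1 => Some(process::<1>(data.try_into().unwrap())),
        2 => Some(process::<2>(data.try_into().unwrap())),
        3 => Some(process::<3>(data.try_into().unwrap())),
        _ => None, // anything "large" falls back to no lifting at all
    }
}

fn main() {
    assert_eq!(process_dyn(&[7, 8]), Some(4));
    assert_eq!(process_dyn(&[0; 10]), None);
}
```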


The point of dependent types is to check these uses of N and M at compile time symbolically, for all possible values, without having to "lift" their actual concrete values to compile time.

Typical implementations of dependent types do not generate a separate copy of a function for every instantiation, the way C++ does, so they simply do not need the concrete values in the same way.


> those “killer features” are usually absolute nonsense, very niche, or they rarely have any big benefit

This is a fantastic way to turn everyone off of your language: dismiss everything they like about the ones they're already using!


Right? So I'm apparently an idiot for thinking Rust's borrow checker, or Go's blissful deployment experience, or Zig's clever comptime system are good... Ok I guess I'm not going to like Odin then.


It's not the same unwritten social contract: in Rust even the unsafe code has the same stricter type signatures as the safe code, so there is a formal way to judge which part of the program is at fault when the contract is broken. You might say the contract is now written. :-)

In C, the type system does not express things like pointer validity, so you have to consider the system as a whole every time something goes wrong. In Rust, because the type system is sound, you can consider each part of the program in isolation, and know that the type system will prevent their composition from introducing any memory safety problems.

This has major implications in the other direction as well: soundness means that unsafe code can be given a type signature that prevents its clients from using it incorrectly. This means the set of things the compiler can verify can be extended by libraries.
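For example, a minimal sketch (modeled loosely on std's `split_at_mut`, not its actual implementation) of unsafe code behind a signature that clients cannot misuse: the two `&mut` returns are guaranteed disjoint, and the borrow checker holds callers to that.

```rust
fn split_first_rest(v: &mut [u8]) -> (&mut u8, &mut [u8]) {
    assert!(!v.is_empty());
    let len = v.len();
    let ptr = v.as_mut_ptr();
    // SAFETY: the element at index 0 and the range [1, len) do not overlap,
    // and both stay within the original mutable borrow of `v`.
    unsafe {
        (&mut *ptr, std::slice::from_raw_parts_mut(ptr.add(1), len - 1))
    }
}

fn main() {
    let mut data = [1u8, 2, 3];
    let (first, rest) = split_first_rest(&mut data);
    *first += 10;
    rest[0] += 1;
    assert_eq!(data, [11, 3, 3]);
}
```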

The actual practice of writing memory-safe C vs memory-safe Rust is qualitatively different.


> In Rust, because the type system is sound

Unfortunately, it's not. Now I do think it will eventually be fixed, but given how long it has taken, it must be thorny. https://github.com/rust-lang/rust/issues/25860


In practice this is a compiler bug, though, and is treated as such, and not a soundness hole in the abstract design of the type system.

There has also not been a single case of this bug being triggered in the wild by accident.


I take this to mean e.g. the Oxide project has proven the Rust type system sound?

There was a Git repository demonstrating unsoundness issues in the compiler, but I seem to be unable to find it anymore :/. It seemed like there was more than one underlying issue, but I can't really remember.


I tried to look into it, and it does not seem like the type system has been proven sound yet, though the issue itself is closed: https://github.com/rust-lang/rust/issues/9883

Per https://blog.rust-lang.org/2023/01/20/types-announcement.htm... there is a roadmap for ensuring Rust type system soundness by the end of 2027. I think that refers to the implementation.


You might be thinking of CVE-RS[0] which is exploiting bugs in the compiler.

[0] https://github.com/Speykious/cve-rs


You are quite likely correct. My search term included "unsound", and it missed this one.

But this time I've starred and cloned it!

I didn't quite see if this actually exploits multiple bugs or leverages just one, though.


Just one. It’s a compiler bug and not a soundness hole.


That code looks horrendous.


This got me intrigued. Is there a soundness proof for the Rust type system?

The only language with such a proof that I am aware of is Standard ML. Even OCaml is too complex for a soundness proof.


There was a paper a few years ago[0] related to proving soundness. That could be what they meant.

[0] https://plv.mpi-sws.org/rustbelt/popl18/paper.pdf



They could not "just" have used SPIR-V bytecode. WebGL already had to do a bunch of work to restrict GLSL's semantics to work in the web sandbox; whatever WebGPU chose would have had the same problem.


In comparison to inventing a new language and three independent interoperable implementations from scratch, I think "just" is appropriate.


WGSL is defined as a "bijection" to a "subset" of SPIR-V -- they could have simply specced that subset.


I do not believe this is true anymore and WGSL is a "real language" now that actually requires a "real language frontend" to compile. The original spec didn't even have while loops, they expected you to effectively write the SSA graph by hand (because SPIR-V is an SSA IR). Then the design drifted as they kept adding pieces until they accidentally made an actual language again.


That was the original plan; then they created their Rust-inspired language instead.


Yeah exactly, that's why refusing to simply spec the bytecode at the time was exactly the sabotage that everyone called it out as.


And the same can of course be done under conservative GC...


But it won't, because people implement conservative GCs when they are not in control of all the pieces, like the compilers used for some parts of their run-time.


The point of the post is that conservative GC can be useful for reasons other than that simple lack of control.


Any tracing GC, conservative or precise, generational or not, already does this. Hinkley is wrong about what "conservative" and "precise" mean.


The Polonius rules were formulated using Datalog, but the implementation that will ship in rustc does not use Datalog: https://blog.rust-lang.org/inside-rust/2023/10/06/polonius-u...


In fact, those union-find trickeries come from the same paper that presented Algorithm W, where they appeared as Algorithm J. W was known from the start to be more useful for proofs than implementation:

> As it stands, W is hardly an efficient algorithm; substitutions are applied too often. It was formulated to aid the proof of soundness. We now present a simpler algorithm J which simulates W in a precise sense.

https://doi.org/10.1016/0022-0000(78)90014-4


Isn't the union-find trickery "just" standard almost-linear unification as discovered by Paterson-Wegman and Martelli-Montanari around 1976? Or is there more to it in the setting of Algorithm J?

I'm actually a bit surprised that it took so long to discover these formulations of unification. I wonder what Prolog systems were doing at the time, given the importance of efficient unification.


Paterson-Wegman and Martelli-Montanari are worst-case linear, but Algorithm J just uses the earlier almost-linear union-find approach: unification variables are represented as mutable pointers to either a) nothing, or b) another type (which may itself be the "parent" unification variable in the union-find structure). Unification is then just a recursive function over a pair of types that updates these pointers as appropriate- and often doesn't even bother with e.g. path compression.
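A rough sketch of that representation in Rust (simplified: no occurs check, no path compression, and a toy type language):

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Clone)]
enum Ty {
    Int,
    Arrow(Box<Ty>, Box<Ty>),
    // A unification variable: None = unbound, Some(t) = bound to t.
    Var(Rc<RefCell<Option<Ty>>>),
}

// Follow a variable's chain of bindings to the type it currently stands for.
fn resolve(t: &Ty) -> Ty {
    match t {
        Ty::Var(cell) => match &*cell.borrow() {
            Some(inner) => resolve(inner),
            None => t.clone(),
        },
        _ => t.clone(),
    }
}

// Recursively walk two types, updating variable cells as we go.
fn unify(a: &Ty, b: &Ty) -> Result<(), String> {
    match (resolve(a), resolve(b)) {
        (Ty::Int, Ty::Int) => Ok(()),
        (Ty::Arrow(a1, a2), Ty::Arrow(b1, b2)) => {
            unify(&a1, &b1)?;
            unify(&a2, &b2)
        }
        // Bind an unbound variable to the other side (occurs check omitted).
        (Ty::Var(cell), other) | (other, Ty::Var(cell)) => {
            *cell.borrow_mut() = Some(other);
            Ok(())
        }
        _ => Err("type mismatch".to_string()),
    }
}

fn main() {
    let v = Ty::Var(Rc::new(RefCell::new(None)));
    // Unify (?v -> Int) with (Int -> Int); this should bind ?v := Int.
    let lhs = Ty::Arrow(Box::new(v.clone()), Box::new(Ty::Int));
    let rhs = Ty::Arrow(Box::new(Ty::Int), Box::new(Ty::Int));
    unify(&lhs, &rhs).unwrap();
    assert!(matches!(resolve(&v), Ty::Int));
}
```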

