It seems like monotonicity (or a subset, stronger monotonicity, that still covers many practical use cases) can be statically enforced in an algebraic type system. With some generics magic you can encode monotonicity as a type assertion on (optionally asynchronous) function calls mapping to an underlying transport/RPC mechanism (REST over HTTP, RPC over WebSocket, a custom protocol over TCP/domain sockets...).
In TypeScript, for example, you can strongly type your communication protocol (web requests/responses, WebSockets, parent/worker IPC, etc.) and get some guarantees at compile time. For example, you can guarantee that your state updates over the wire are eventually consistent.
The same goes of course for other algebraic type systems by encoding monotonicity in the form of static type assertions (Flow, Caml, Haskell...).
Well, since you just need to restrict the operations one can do with data, even a Java or C++ style type system is enough.
What is best done with another language (or a complex library in a very malleable language) is permitting non-monotonic operations behind a clear boundary, with a code style that makes them undesirable.
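To make that concrete, here's a minimal TypeScript sketch of the "restrict the operations" idea (all names hypothetical, not from the paper): the type only exposes inflationary operations, so replicas can merge updates in any order and still converge.

```typescript
// A grow-only set whose public surface only permits monotone operations.
class GrowOnlySet<T> {
  private readonly items = new Set<T>();

  // The only mutations exposed are inflationary: add and merge.
  add(item: T): void {
    this.items.add(item);
  }

  // Merging is set union: commutative, associative, idempotent.
  merge(other: GrowOnlySet<T>): void {
    for (const item of other.items) {
      this.items.add(item);
    }
  }

  has(item: T): boolean {
    return this.items.has(item);
  }

  // There is deliberately no delete(); removal is non-monotonic and is
  // exactly where the coordination boundary would start.
}
```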
> CALM falls short of being a constructive result—it does not actually tell us how to write consistent, coordination-free distributed systems. Even armed with the CALM theorem, a system builder must answer two key questions. First, and most difficult, is whether the problem they are trying to solve has a monotonic specification.
> The second question is equally important: given a monotonic specification for a problem, how can I implement it in practice?
Can anyone explain a good way to use these results?
The OP (acolyer.org) seems to suggest that this boils down to using CRDTs, but that doesn't really seem to be the answer, and I can't figure out how to use/properly think about the results in this paper (as opposed to others that are a bit more practical). I've crawled through a bunch of Paxos papers[0], and at this point it seems to me that the distribution part of the problem is somewhat solved, but the coordination required is what we're trying to attack now.
Whenever I see systems that use CRDTs it's usually either a K/V store or a concurrent editing application. Also, most articles will gloss over the fact that both types of CRDTs (CmRDTs and CvRDTs IIRC) are actually roughly equivalent (pure operation-based CRDTs[1]). IMO pure operation-based CRDTs are just a distributed WAL by another name -- and if that's the case then we already have the solution for this (Paxos-like quorum on the WAL segments).
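To illustrate what I mean by "a distributed WAL by another name", here's a rough op-based sketch (hypothetical names, just my reading of it): each replica ships operations, and because applying them is commutative and idempotent, any replica that has seen the same set of ops ends up with the same value regardless of delivery order.

```typescript
// Operations identified by (replica, seq) so redelivery can be deduplicated.
type CounterOp = { replica: string; seq: number; delta: number };

class OpBasedCounter {
  private readonly applied = new Set<string>(); // op ids already applied
  private value = 0;

  apply(op: CounterOp): void {
    const id = `${op.replica}:${op.seq}`;
    if (this.applied.has(id)) return; // idempotent under redelivery
    this.applied.add(id);
    this.value += op.delta; // addition commutes, so order doesn't matter
  }

  read(): number {
    return this.value;
  }
}
```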
All this boils down to the thought I've been holding lately that distributed consistency isn't hard in this day and age -- a WAL (which you generally want anyway for hard-disk-level consistency, if you're not just waiting on big fsyncs) + synchronous commit (whether quorum or all nodes) is all you need to achieve a perfectly consistent distributed system; what is hard is doing that with low latency and high throughput.
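For example, a deliberately stripped-down sketch of WAL + synchronous quorum commit (hypothetical names; no leader election, retries, or failure handling):

```typescript
interface Replica {
  appendToWal(entry: { index: number; data: string }): Promise<boolean>;
}

// Append an entry to every replica's WAL and treat the write as committed
// once a majority has persisted it. A real system would acknowledge as soon
// as a quorum responds instead of waiting for every replica.
async function quorumCommit(
  replicas: Replica[],
  entry: { index: number; data: string }
): Promise<boolean> {
  const acks = await Promise.all(
    replicas.map((r) => r.appendToWal(entry).catch(() => false))
  );
  const ackCount = acks.filter(Boolean).length;
  return ackCount >= Math.floor(replicas.length / 2) + 1;
}
```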
The benefits we want to obtain from building distributed systems are:
1) Increased availability
2) Ability to scale (better throughput)
3) Lower latency (get the data closer to the client)
As you said, WAL + Consensus solves the consistency problem in distributed systems. It does however go against all those desirable properties:
1) You lose availability when consensus cannot be reached
2) Throughput is decreased in the face of contention
3) Latency is worse because you need quorum between the separate locations
One way to go about this trade-off is to push for "raw power": better networks, better clocks, etc. This is a commendable task and advances here should be celebrated, but there's a wall on the horizon: eventually we will hit actual physical limits (e.g.: the speed of light doesn't let you lower latencies any more). Google's Spanner is probably close to those limits already. What can we do when this is not enough, then?
The other approach is to work on reducing the need for coordination as much as possible. The paper fits in this realm. What it does is identify a class of (sub)problem specifications that are solvable without coordination: monotonic specifications. It shows that CRDTs are monotonic, though I'm not sure whether any monotonic specification can be redefined as a CRDT (be it operation-based or state-based).
What CALM provides though is a "different way to think about the issue". If you can devise a monotonic specification of your problem, then you know that an implementation that provides consistent output without any coordination is possible. Furthermore, if your design is not monotonic, an implementation will require coordination or the output won't be consistent.
Finally, the paper dabbles in the realm of breaking your problem into monotonic and non-monotonic "pieces". The monotonic pieces you can implement without coordination. Non-monotonic pieces either require coordination OR "repair" (e.g.: send an e-mail to the customer apologizing that their item is actually unavailable and their order has been cancelled). Once a non-monotonic "piece" has been handled in this manner, the new piece that includes repair/coordination can be considered monotonic. Once all your system's pieces have gone through this process, you are guaranteed to have a "consistent" output.
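A toy sketch of that splitting, just to illustrate (hypothetical names, not from the paper): order intake is monotonic and coordination-free, while reconciling against finite stock is non-monotonic and handled off the critical path via repair.

```typescript
type Order = { id: string; item: string };

const acceptedOrders = new Set<Order>(); // grow-only: the monotonic piece

function acceptOrder(order: Order): void {
  acceptedOrders.add(order); // never retracted on this path, no coordination needed
}

// The non-monotonic piece, run asynchronously and off the critical path.
// (A real reconciler would also remember which orders it has already handled.)
function reconcile(stock: Map<string, number>, apologize: (o: Order) => void): void {
  for (const order of acceptedOrders) {
    const remaining = stock.get(order.item) ?? 0;
    if (remaining > 0) {
      stock.set(order.item, remaining - 1);
    } else {
      apologize(order); // "repair": e-mail the customer, cancel the order
    }
  }
}
```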
An interesting analogy would be the "unsafe" blocks in Rust. Rust in general is not safe, because unsafe exists, but the fact that unsafe pieces are clearly identified makes it easier to reason about program safety as a whole. Similarly, the non-monotonic pieces of your system are where you risk losing consistency, whereas monotonic pieces are just not a problem (i.e.: you can easily get good performance for those parts). By extension, if you can model your system so that all non-monotonic pieces are out of the critical path, your system is guaranteed to perform well in those scenarios!
All in all, this is just about giving better tools/thinking frameworks for system designers to minimize the use of coordination in a distributed system, which will invariably result in better performance without sacrificing output "consistency" (i.e.: without violating business rules).
That's an excellent explanation, except for a couple of points which I think can be misinterpreted. A distributed system with consensus will in practice provide higher availability than a single-node system, because it provides fault-tolerance. In fact, fault-tolerance is the primary point of using (non-Byzantine) consensus. But you are absolutely right that a distributed system using consensus has worse availability than a distributed system with no coordination.
Also, when using a system "with consensus" there is often no need to actually invoke consensus on the read side of the system, in which case you don't have to pay the throughput and latency penalties. I know you've sort of said this already, but it might be helpful to mention explicitly.
> A distributed system with consensus will in practice provide higher availability than a single-node system, because it provides fault-tolerance.
I'm not sure this is true. It protects against one class of fault (node failure) but opens you up to another (network failure). As a distsys engineer I am increasingly convinced that fault tolerance is not a good selling point for distribution, the fallacies of distributed computing are real and difficult to accommodate.
That's not really true. With a single, non-replicated server, you're also very much exposed to network failures. If the server's connection goes down, you're screwed. Compare this to a Google replication setup of 2 East Coast, 2 West Coast, and 1 central USA server. The client must only be able to reach two out of three data centers (and they have to be able to communicate with each other). That sounds much more resilient to me - and I guess Google agrees, since they deploy the setup.
> when using a system "with consensus" there is often no need to actually invoke consensus on the read side of the system, in which case you don't have to pay the throughput and latency penalties.
Doesn't matter if the system is designed for faster reads. There is still coordination, you still pay coordination overhead, including throughput and latency penalties. Without coordination, for example, you can have some part of the database relevant to the client stored directly on the client, not doing remote reads at all.
You're comparing apples to oranges. If read coordination can be avoided (and it often can, even in systems with consensus), whether you stick a cache on the client or not is completely orthogonal to whether the system uses consensus for write operations.
Think about it: if you read from a replica that is partitioned from the rest of the system and there is no coordination, how would the replica or the client know that the value the replica returns is too old and therefore breaks the strong consistency guarantee?
No, you don't get linearizability of all operations, you might not even read your earlier writes - but the whole point is that you sometimes don't need these guarantees for reads. You get a consistent snapshot read, and that's often good enough. You can get an idea of how recent the snapshot is based on timestamps, but "recent" is hard to define in a distributed system.
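Something like this, as a sketch (hypothetical names): the replica tags the read with how far its snapshot has been applied, and the client decides whether that's recent enough for its purposes, instead of invoking consensus on the read path.

```typescript
interface SnapshotRead<T> {
  value: T;
  appliedUpTo: number; // commit timestamp/index of the replica's snapshot
}

// The replica serves whatever prefix of the log it has applied, no coordination.
function readFromReplica<T>(
  snapshot: Map<string, T>,
  appliedUpTo: number,
  key: string
): SnapshotRead<T | undefined> {
  return { value: snapshot.get(key), appliedUpTo };
}

// Client-side: accept the read only if it is "recent enough" for this use case.
function acceptIfFresh<T>(read: SnapshotRead<T>, minTimestamp: number): T | null {
  return read.appliedUpTo >= minTimestamp ? read.value : null;
}
```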
> but the whole point is that you sometimes don't need these guarantees for reads.
If that was your point, then sure. If you drop consistency, you can drop coordination too. But typically people expect reads to be consistent in consensus-based systems, which requires coordination.
Well-written explanation. I think categorizing a problem into monotonic and non-monotonic pieces will be especially helpful for a developer when the underlying database provides multiple consistency levels to choose from.
This is outstanding -- although in lieu of Yet Another New Programming Language (YANPL), I would suggest simply having developers learn the concepts and apply them no matter what environment they're in. This should be part of any developer's education.
I have some good friends, smart people who know a lot more about good programming than I ever will, who keep saying things like "OO and FP are just different flavors. It's all the same"
That's both true and dangerously incomplete. Pure FP takes you to places like this essay. Yeah you can make all of this happen in OO, but it sucks. And if you want to code at scale you gotta know exactly how much it sucks and what to do about it.
ADD: My biggest critique of this essay is that the people who need to understand it most will be put off by the terminology. I have no way to fix that, but it still sucks.
"This is outstanding -- although in lieu of Yet Another New Programming Language (YANPL), I would suggest simply having developers learn the concepts and apply them no matter what environment they're in."
One of the recurring themes I've been banging on on HN for the last couple of months is that there's actually a lot of room for new languages to come out that aren't just respellings of the current languages we have, and this is one of the opportunities that exist. I say this because the vast bulk of our current languages will just fight you so much if you try to work this way. What most languages are designed to do and designed to make easy are precisely the things that will blow your foot off. Trying to make this work in current languages is tapdancing through a minefield. It can certainly be done, but it's not something I think we can ever reasonably expect to be done at scale.
As you suggest, Haskell is pretty much the only semi-popular language that could implement this with a library, and it would be correct. And even then it probably would be a not-entirely-optimal experience, just because it's not really optimized for this.
I don't think we disagree. I'm a big fan of DSLs/languages over further app complexity. To clarify my point, Another Language (tm) is not the problem. The problem is a lack of understanding of the fundamentals.
I keep seeing people sell tech/languages/frameworks as some sort of replacement for deep understanding. Hey, that's cool -- as long as you don't spend all your time honking around with the tool instead of the problem. It's when these promises fail that devs and teams end up on a dark, lonely path with unhappy customers.
Overall I am quite fascinated by the concept of new environments for programmers -- and I believe it's not just new languages, but the tools, the industry background, the workflow, and a lot of other things come together to make tools and frameworks rock. (Many times we all want to geek out on language details when the problem with the language is lack of a community or tooling that makes you want to jump out a window. But I digress.)
Something I'm working on is coming up with guidelines of when and how to create good DSLs in order to create that kind of environment.
We do not. I meant it more as elaboration, not outright disagreement. It is a great idea to apply these ideas today, but it's really hard, and I'd still love to see someone fix that. And maybe some other things as they go. But in the meantime, programmers of today really can't afford to wait to apply these principles.
I guess my point is that I agree that YAPL that just has this in it and is basically thrown out as a hack for writing a couple of papers is not very useful, but I'd love to see someone set out to create the Next Big Programming Language that incorporates this idea, as well as other things, but isn't just a demonstration project.
"Of the candidates enumerated in 'NSB', object-oriented programming has made the biggest change, and it is a real attack on the inherent complexity itself." - Fred Brooks, No Silver Bullet Reloaded
I think we're mixing some things up here. If I have added to that confusion, I apologize.
Yes, OO is great. It's all of modern computing. Love me some abstraction levels -- as long as they actually abstract things. Any layer that I spend more time screwing around with than making things people want is a net negative.
And there's my point: I'm not talking about the mechanics of FP or OO. I'm also not talking about the relative successes or failures of either model. My only point was how various modes of thinking end up solving problems different ways.
I find this a recurring theme. In the end, it's all Category Theory. So we're not talking about magic sprinkles here. Whatever we put on top of the math to help us reason about problems -- OO, FP, pure FP, and so on -- has various effects on the way we reason about problems.
Hell I don't care if you program COBOL. What's fascinating to me is how some folks read essays like this and go "Of course! Very cool" and others start talking about how it's all theoretical.
I read half-assed essays every day that could mostly be described as some version of "I learned OO and now I think of everything as objects." The natural consequence of this -- which is not wholly bad -- is the same as the natural consequence of poor abstraction layers in coding: the person using them spends more time thrashing around trying to keep consistency and nomenclature aligned than they do working on the important stuff. Can you do the same with FP? Sure! But that's a conversation for a different day -- because it's a different failure mode.
Hmm. I thought it was all binary machine instructions. Or was it all NAND gates? Or everything is an object? Or everything is a list? Everything is a file?
Reductionist revelations are fun, and they also have some utility. But we have to be careful, because we tend to latch on to one of them and think that this is the one to rule them all. For different instances of "this" depending on person and time.
> Whatever we put on top of the math to help us reason
The math is another one of those magic sprinkles you put on top of the NAND gates. If that's the one that gets you going, good for you! And it certainly has its uses. But don't get carried away.
> half-assed essays everyday ... "I learned OO and now I think of everything as objects"
Funny, I don't see any of those these days, maybe you're thinking of the 90s? Instead, I see a ton of blog posts claiming that "everything is category theory". ¯\_(ツ)_/¯. Very often, they then take an interesting problem, claim that it's really this other problem solved elegantly by FP. By stripping away all the properties of the problem that made it interesting and useful. But at least now it typechecks. Sigh.
Anyway...
> Love me some abstraction levels -- as long as they actually abstract things.
Yes. This is important. Really, really important. In my current thinking, there are three levels of this: just indirection, compression, and actual abstraction.
A lot of people think they've contributed abstraction when they're really just adding indirection. When you find an abstraction, it really is magic, but those are fairly rare. So you typically want to keep it as direct as possible, while trying to compress (trading off with directness as appropriate). The compression can point you towards an abstraction:
Another point is that the point of OO is not the program, it is the system created by the program. That is at the same time its greatest strength, and its greatest weakness, because it means that in order to build those systems, the program that constructs the system has to be based on "side effects". But that's really a limitation of our call/return architectural pattern:
Thesis: Architecture Oriented Programming is what Object Oriented Programming wanted to be. It talks about instances and systems rather than classes and programs. It also lays a solid foundation for meta programming (connectors). -- https://twitter.com/mpweiher/status/1100283251531411457
So the systems are great (or can be, if you have a great systems builder), but the way we are forced to build them is pretty bad. FP tends to recognise the latter but not the former. See also:
I expect that the continuing popularity and necessity of distributed programming is going to finally put an end to the idea that "programming doesn't involve mathematics". It is just so much easier if you approach it from the sort of mathematical mindset demonstrated in this paper, and exactly as discussed, this is going to be a whole-stack affair, not something that can be isolated to just the data layer, despite our valiant and useful efforts on that front so far.
Programming this way is already easier in a lot of places, but the boundary between non-distributed systems that are possible without mathematical thinking and the ones that can only be done with mathematical thinking is easy to ignore. It tends to be specific niches where programs are only possible with this sort of thinking, and everything else can mostly be hit hard enough with a stick until it works well enough that we can just get on with it. But as things get distributed, it just becomes insanely difficult to build even a "correct enough" system if you're not building it with some sort of math at least held in your mind as you design it. And "distributed" isn't just about servers, it's about multi-core CPUs, it's about GPUs, it's about CPUs and GPUs, it's about networks, just everywhere you look there's more and more "distributed" arising.
I think this has been the case for many decades. Lamport, the Turing award winning mathematician and computer scientist who wrote seminal algorithms and papers on distributed consensus, invented TLA+ for this express purpose.
A programming language is insufficiently expressive to specify and guarantee the safety and liveness properties of a multi-agent system.
I think formal methods and program synthesis are going to become the tools of choice in the long term for designing reliable systems in this vein.
I've been following the development of the CALM theorem and various attempts at programming languages that implement its ideas (BloomL, etc.) since 2010 at least -- I wish I'd known about TLA+ back then. This paper is great... I hope it catches on more.
For an introduction and motivation to TLA+ I recommend Hillel Wayne's talk Everything About Distributed Systems is Terrible [0] and basically anything he has written on his blog about TLA+.
I think following along with the TLA+ video course is really thorough and digestible once you've decided that you want to learn more [1].
You'll find the TLA+ Hyperbook is a useful reference and guide as you start specifying your own systems [2]. It has a number of examples and projects to work on.
Eventually, if you want to go deep after you've decided it's worth it for yourself, get into Specifying Systems, the textbook and bible of TLA+ [3].
Program Synthesis has a lot of research in it and is starting to show promising results. I'd say that while the research has been going on for a long time it's still rather new to industrial use... for a motivating talk and explanation on what program synthesis is try this one: https://www.youtube.com/watch?v=HnOix9TFy1A
I'd actually suggest rather becoming fluent in modern Haskell.
Formal methods are great in their own way but for a lot of programming tasks they overshoot the mark. For those that do not, by all means, bring them in. But right now, a practical engineer can't ignore just how klunky they are and how badly they scale up. One may gaze longingly at them, and dream fondly of the future date when they are more practical, but they aren't there yet.
There is a reason the Haskell community is always banging on about monads and monoids and applicative functors and commutativity and associativity and identifying exactly how little power you need to accomplish a given task, and it's the same reason I was able to say in another comment thread that Haskell is probably the only current semi-popular language that could deploy the ideas in this paper as a library, and that library would actually be correct. (There are some languages that could deploy the library and it would be a cultural fit, like OCaml or Clojure, but you'd still be able to trivially use it incorrectly.)
Plus even if you still want to get into formal methods and program synthesis, "learn Haskell first" is still a fairly sensible stepping stone. It probably isn't strictly mathematically true, but formal methods are going to be more-or-less a superset of what modern Haskell teaches you anyhow. Both the Haskell language and the Haskell tooling will be easier, and pragmatic enough you can do a real project like "rewrite my website in it" or other pragmatic, attainable projects that the formal methods languages can't do yet.
Many formal methods have a much simpler mathematical foundation than Haskell. For example, for TLA+ the background includes mostly (basic) set theory, basic logic, and some additional bits of temporal logic.
Some methods do require a lot more math, or different math, but I'd say your statement is false for the majority of popular formal methods.
Indeed, I heard Lamport say at a talk (in response to a question) that he saw little or no applications of functional programming to the distributed systems problems he was interested in.
Formal verification is really hard because you have to use mathematical proof to show that a piece of software does what it should on all inputs and paths. You have to formalize the software, the desired goals, and prove a connection. Typically, you have to do this for software written in languages that aren't designed for verification. That makes it even harder.
This led me to recommend people start with so-called lightweight formal methods that require less upfront learning and offer more automation. You'll see the value of formal methods, have more fun, and maybe use them on realistic problems. The best one to start with is Design-by-Contract, invented by Meyer in the 1980s based on principles from the 1960s. Contracts are simply precise, declarative descriptions of what a module expects from callers, does during execution, and returns to the caller. Here's an overview even a project manager can find useful:
A number of us independently arrived at a specific use for contracts that's high ROI. You use contracts to specify intent, then automatically generate tests from them (replaces unit testing), and finally they can become runtime checks for automated testing (esp fuzzing). Contract-based tests or fuzzing with contract-based checks cause each failure to take you right to the source of the problem. Note that the test generation part has many names to Google: specification- or spec-based, model-based, and recently property-based testing. Hillel Wayne has great posts on contracts and test generation:
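As a concrete flavor of the contracts-plus-generated-tests idea, here's a minimal sketch (the withdraw function is hypothetical; input generation uses the fast-check property-testing library, assuming it's installed): the contract is written once as precondition/postcondition checks, and the tool hammers it with generated inputs.

```typescript
import fc from "fast-check";

// Contract-annotated function: withdraw an amount from a balance.
function withdraw(balance: number, amount: number): number {
  // Precondition: what the caller must guarantee.
  if (amount < 0 || amount > balance) throw new Error("precondition violated");
  const result = balance - amount;
  // Postcondition: what the function guarantees in return.
  if (result < 0 || result > balance) throw new Error("postcondition violated");
  return result;
}

// Generated test: for any inputs satisfying the precondition, the
// postcondition must hold. A failure points right at the contract.
fc.assert(
  fc.property(fc.nat(), fc.nat(), (balance, amount) => {
    fc.pre(amount <= balance); // only test inputs meeting the precondition
    return withdraw(balance, amount) === balance - amount;
  })
);
```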
I recommend reading his whole blog. Anyway, the next tool is Alloy, which is super-simple but with better tooling than most academic works. Hillel has examples on his blog. Davidk01 on Lobste.rs has been using it for a lot of things. He said that, given it's relational, he just uses it like he uses SQL:
Finally, one that will help you with protocols and concurrency is TLA+. Hillel has learntla.com up to teach you for free on top of blog examples. After a lot more time with it, he got to publish a book through Apress. I recommend it:
If you want a taste of proof, the SPARK Ada language was designed to make it easier. It's used in industry. Basically uses contract-like specs and supporting functions to feed "verification conditions" into automated solvers. You only have to prove what they can't but can always use runtime checks on those. There's a nice book teaching you how to use it plus Altran/Praxis' Correct-by-Construction method for low-defect software.
There's also Cleanroom: a 1980's method that combines semi-formal specs, simple use of language, hierarchical composition, and usage-based testing to cost-effectively get high quality software. Stavely has best intro to it:
For proofs I've been finding the leap into dependent-type theory to be very productive as many provers in this realm also double as practical programming languages allowing the possibility of shipping proof-carrying code.
My two favorites in this area are Lean [0] and Agda [1], the latter of which recently received a nice introductory text from Phil Wadler and Wen Kokke [2].
When people say that "programming doesn't involve mathematics", I think they really mean that programming doesn't involve hard mathematics. It's obvious that computers can only calculate, and meta-calculation is mathematics. However, most people have this notion that math is hard, and many programming tasks are easy.
Then, you take a look at a distributed system. Every task is substantially harder in a distributed system. The answers are not obvious, so folks are more willing to accept that those things are math.
I disagree. I think the future still holds a separate data layer, but even less math, because of how much easier it will be to reason about programs with DSLs built on solid foundations for distributed systems, and how much harder it will be to keep doing things the old ways. Imagine a new language for databases, like SQL, but one that can actually guarantee convergence and coordination freedom, where there are no transactions and no weird behaviors with weird consistency violations and slow, complex, buggy implementations, but where everything just works and all you have to do is be a bit more creative in mapping real-world problems into that model. That's not to say the model can't be expressive; it might even be able to directly express a lot of traditionally non-monotonic stuff.
I think you'll find that if you encode these guarantees into the language level itself, that won't allow the programmers involved to stop thinking about the math. Either your language won't allow them to express the first thing they want to express, and they'll have to figure out how to do it with the proper mathematical primitives (and they'll complain up a storm about how useless your language is on the forums), or it will allow them to express it but yield a compile-time "Operation violates monotonicity constraints at line 88..." and they're going to have to understand that fairly deeply, or also end up dropping your language when they can't scale up their program because the "blunder around adding and removing sigils and annotations and sprinkling in 'unsafe' until the compiler shuts up" just doesn't scale very far beyond "complete the school assignment due tomorrow".
Look at Rust. Putting memory management concerns in the language front and center doesn't relieve you from having to think about them. It means you have to think even more deeply about them. Fortunately, this is the correct thing to do, and it gets easier with practice, and is also easier to learn when it's so strongly highlighted instead of implied and buried. But if anything, it forces a more mathematical way of thinking on to you. (Reasoning about scopes is the foundation of the mathematical discipline of structured programming.)
I mostly agree, but I think any new mathematical understanding will be an implicit one. For example, in Rust, although the underlying type system is affine, almost no one explicitly reasons about programs using affine logic.
We aren’t there yet, but we are approaching the day when substantially everyone with the capability of rebalancing a red/black tree on a whiteboard and an inclination to program is either programming or in some other way connected to the function. If a yet higher level of mathematical ability and knowledge becomes necessary, shortages are going to become even more acute. That’s good for some of us but bad for the world. The world would be better off if we could figure out a way to lower that bar, not raise it.
I'm being honest here: I don't quite understand all the math involved. But from my rough skimming of the text, it looks like we are sacrificing the flexibility to change any input or make minor design changes for guaranteed correctness... am I wrong?
It's more like "the flexibility to change any input" produces incorrect behavior, full stop.
When it comes to distributed systems, you have to be following some mathematical idea of how you're going to maintain some sort of consistency or correctness, or it just won't happen. The environment is just so much more hostile than the ones most people are used to. Most programming methodologies haven't grown up in the presence of an adversary that, for all practical purposes, is out to get you.
Another such discipline is the security domain, and we're having the same sorts of issues there, too. As a whole community, we're really just beginning to grapple with the idea of programming in an environment that has hostile, intelligent entities out to get you. There's still a contingent of people who insist it's not even an issue! Though it's shrinking fairly rapidly.
I respect what you are trying to say, and I'm not denying that safety as a mathematical guarantee is a must... but help me understand the full picture here: how would such a system work with software dev today, when requirements change very quickly, sometimes small, sometimes large... for example, our current SDLC looks like this:
rinse and repeat; the cycle time might change but this is what happens... is this ideal? I don't know... does it work? Sufficiently enough.
and so I disagree that people in the broader software dev community are willfully ignorant about security
maybe with the rise of AI we could tackle the design problem with the math bits in mind and build correct software every cycle, but from my perspective the only way I see a system of mathematical design (which is math intensive, hence resource intensive, hence time intensive) working is through the waterfall model of SDLC :).
I am willing to admit that I haven't been following the formal methods space that much and what I said might be completely off the mark; in any case, please feel free to correct me, I'm here to learn. Thanks.
The results seem very useful when reasoning about coordination in programs, but I'm questioning how practically useful these results are when applied as methods.
In the GC example, for instance, you do need to find the garbage nodes. Any trivial way of re-formulating that search in a monotonic way would likely destroy performance. My guess is that it's no silver bullet to solving problems in distributed systems.
>"Rather than micro-optimize protocols to protect race conditions in procedural code, modern distributed systems creativity often involves minimizing the use of such protocols."
Commercial databases 20 years ago had "shared nothing" designs precisely to avoid locks. The term "embarrassingly parallel" was introduced in the 1980s! The technique of multi-version concurrency control is widely used in databases and was first introduced in the 1980s. Maybe the authors forgot.
The post makes clear statements about how distributed consistency following CALM can regain lost performance by removing the need for coordination, without adding much complexity.
Keeping architectures monotonic (as defined by post and the linked paper) isn't difficult, especially if you allow yourself excess storage and (again, as discussed in the paper) deal with compaction of your data structures asynchronously, off the critical path.
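For instance, a small sketch of that pattern (hypothetical names): the hot path only appends, which is monotonic and needs no coordination, while a background task folds the log into a compact summary, trading extra storage for a coordination-free critical path.

```typescript
type Event = { key: string; delta: number };

const log: Event[] = []; // append-only: the hot path stays monotonic

function record(event: Event): void {
  log.push(event); // the log only grows; no coordination needed here
}

// Off the critical path: fold the log into a compact summary. A real
// compactor would also truncate the folded prefix once the summary is durable.
function compact(): Map<string, number> {
  const summary = new Map<string, number>();
  for (const { key, delta } of log) {
    summary.set(key, (summary.get(key) ?? 0) + delta);
  }
  return summary;
}
```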