> A system that keeps some, but not all, of its nodes able to read and write during a partition is not available in the CAP sense but is still available in the sense that clients can talk to the nodes that are still connected.
Sure, as long as the clients can reach the nodes that are "still connected." But clients can get partitioned from server nodes too. For example if there was a complete fiber cut across the Atlantic, both clients and servers in the US would be partitioned from clients and servers in Europe. Whichever side has the majority of replicas (say the US) gets to keep operating. You can try to say that the service is still "available", but that doesn't help the clients in Europe.
If the client is partitioned from the majority of replicas, it's game over. At that point, the system has to give up either consistency or availability from the perspective of that client.
And of course, if the majority of replicas suddenly goes down completely because of power failure or something like that, then the system truly is unavailable no matter where you put a client. There is no way for the surviving nodes to know that the dead nodes aren't actually alive and still accepting writes.
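To make the majority logic concrete, here is a toy sketch of the quorum check a consistency-first system effectively performs (plain Python, invented names, not drawn from any particular database):

```python
def can_serve_consistent_ops(reachable: int, total_replicas: int) -> bool:
    """A consistency-first system serves reads and writes only from the
    side of a partition that can reach a strict majority of replicas;
    the minority side must refuse rather than risk diverging."""
    return reachable > total_replicas // 2

# A transatlantic split of a 5-replica system (3 replicas in the US, 2 in Europe):
assert can_serve_consistent_ops(3, 5)      # the US side keeps operating
assert not can_serve_consistent_ops(2, 5)  # European clients are out of luck

# If the 3 US replicas then lose power entirely, the 2 survivors cannot
# tell "dead" from "partitioned", so they still refuse: nobody is served.
assert not can_serve_consistent_ops(2, 5)
```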
> New distributed, consistent systems like Google Spanner concretely demonstrate the falsity of a trade-off between strong consistency and high availability.
I am pretty sure that even Spanner becomes unavailable if a majority of replicas go down (or the client is partitioned from them).
>If the client is partitioned from the majority of replicas, it's game over. At that point, the system has to give up either consistency or availability from the perspective of that client.
It sounds like the article's author simply shifted the problem from "eventually consistent" to "eventually 100% available to 100% of the clients" -- e.g. once the fiber optic cable is repaired.
That can be a perfectly fine tradeoff but it would be clearer if the author spelled that out explicitly.
I believe his more notable point is that the current crop of immature databases pushes the logical reasoning about "eventually consistent" too far up into the application layer, which in turn causes unnecessary pain for developers. I think focusing on this one concept would make a better blog post, because it addresses many real-world requirements around "eventual consistency".
Even under that analysis, a consistent system is completely unavailable (0% of clients can get service) when a majority of replicas suddenly go down completely.
The point about eventual consistency pushing inconsistency too high in the stack makes sense to me though. The Spanner paper makes a similar argument.
I thought that the original justification for eventual consistency was that, for some kinds of data, availability is much more important than consistency. It could be that, in reality, that kind of data doesn't exist, because any data worth keeping is worth the trouble of keeping consistent. Or, it could be that the use case exists, but it's uncommon and seldom worth maintaining a separate database system if you are not Facebook. The article didn't make those arguments, though. It argues that weak guarantees by the database make client code harder to write, which ought to be kind of obvious, if consistency is what you need.
> It could be that, in reality, that kind of data doesn't exist, because any data worth keeping is worth the trouble of keeping consistent
Not necessarily. What is important is not dropping the data on the floor in an unpredictable fashion. Sane eventually consistent models present the conflicting versions to the user (optionally having all sides pick the same value as the winner, but never throwing the others away).
That is what Riak does in its correct (sadly non-default) configuration, and that is what CouchDB does.
This bubbling up of eventual consistency to the very top layer is the correct behavior. The database might find that both you and your friend withdrew $100 from the same account, and perhaps that account now has a negative balance. But the important thing is that it keeps both transactions, so something above can decide what to do: pick a winning one (using a timestamp, maybe), pick neither and cancel both, or freeze the account because of possible fraud.
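A minimal sketch of what that bubbling-up can look like for the account example, in plain Python with made-up names and data shapes (not taken from Riak or CouchDB specifically): the application receives the conflicting siblings, merges them, and flags the overdraft instead of silently dropping a withdrawal.

```python
def resolve_account(siblings, starting_balance):
    """Each sibling is one replica's view of the account: a list of
    (txn_id, amount) withdrawals. Instead of picking a single winner and
    dropping the rest, keep the union of withdrawals, recompute the
    balance, and let the layer above decide how to handle an overdraft."""
    merged = set()
    for withdrawals in siblings:
        merged.update(withdrawals)
    balance = starting_balance - sum(amount for _, amount in merged)
    return {
        "balance": balance,
        "withdrawals": sorted(merged),
        "overdrawn": balance < 0,
    }

# You withdrew $100 at one ATM and your friend withdrew $100 at another
# before the replicas could coordinate:
resolved = resolve_account(
    siblings=[[("txn-a", 100)], [("txn-b", 100)]],
    starting_balance=150,
)
print(resolved)  # balance -50, both transactions preserved, overdrawn=True
```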
>> New distributed, consistent systems like Google Spanner concretely demonstrate the falsity of a trade-off between strong consistency and high availability.
>I am pretty sure that even Spanner becomes unavailable if a majority of replicas go down (or the client is partitioned from them).
It only claims high availability, not perfect availability.
Any server can, for instance, be vaporized by a nuclear attack. In that case, it won't be available. The software can't help.
> Building the complex, scalable systems demanded by today's highly connected world with such weak guarantees is exceptionally difficult.
So is building a complex, scalable system that breaks the laws of physics or proven theorems. The world is eventually consistent. Some business domains can handle that. Even some banking operations are eventually consistent. You can go to an ATM in Australia while someone else uses one in the US at the same time and overdraw your account. That is eventual consistency. Banks would rather accept that than expect you to wait half an hour while they decide on a globally shared, consistent state of your account.
> Essentially, that engineer needs to manually do the hard work to ensure that multiple clients don’t step on each other’s toes and deal with stale data.
Sometimes that is not that hard. Sometimes the business logic allows for custom (user-level) reconciliation of conflicts. In some cases that engineer has to rely on the magic unicorns that another engineer (who built the DB) put into the product to make it beat the CAP theorem. Or the administrator has to handle a global restart of all the servers because the cluster has become unavailable when, say, one node blew up or got partitioned. That is not _always_ better than eventual consistency.
> Google addressed the pain points of eventual consistency in a recent paper on its F1 database
So the answer is installing expensive GPS receivers on the roofs of your data centers and running wires down to the clusters of machines? Yeah, that works better in some cases. But it is not always the best answer.
> Vendors should stop hiding behind the CAP theorem as a justification for eventual consistency.
Vendors should lie and market-speak their way out of a theorem? This has been done before with other database products. So maybe FoundationDB is choosing that path to follow...
> Dave Rosenthal is a co-founder of FoundationDB.
I don't know. As I often say, sometimes the biggest enemies of an idea are its most ardent supporters.
> So is building a complex, scalable system that breaks the laws of physics or proven theorems. The world is eventually consistent.
You kind of caught me at the right moment with this. You're basically describing the theory of relativity. I suppose that, in a way, relativity could be rephrased as a theoretical upper limit on information propagation, with ramifications for ensuring consistency of local state.
"beat the CAP theorem" ... "break the laws of physics" ... "lie and market-speak their way out of a theorem"
Strong words, but the article doesn't advocate any of the above. It advocates choosing consistency over availability and lays out the reasons why that is a good choice. Choosing consistency is hardly a radical idea and many of the most popular NoSQL databases actually choose consistency (e.g. HBase, MongoDB).
Indeed even Riak, the biggest advocate of eventual consistency, is working hard to build strong consistency into Riak 2.0.
But choosing consistency over availability in all cases, as the article proposes, is not the answer; neither is a strictly superior choice over the other.
At one level, striving for consistency in a large distributed system is fighting the laws of physics. It has been attempted, and so far Google probably has the best handle on it, but it requires tight coupling with a time synchronization service.
> Choosing consistency is hardly a radical idea
_Claiming_ to choose it is not radical. Actually doing it is. If you read the "Call Me Maybe" series you'd see that most supposedly consistent databases fail. Granted, those tests measure behavior under partitions, but partition tolerance is usually not a choice you get to make anyway (according to the series' author, Aphyr).
> Indeed even Riak, the biggest advocate of eventual consistency, is working hard to build strong consistency into Riak 2.0.
That is what I've heard. Consistency for some data is the right choice, no argument there. At the same time, they are _not_ abandoning or switching away from eventual consistency.
In fact, when configured to preserve conflicted siblings, Riak actually did better than most in the "Call Me Maybe" series. CouchDB is another database that does this right: it preserves merge conflicts explicitly during replication. Users can choose to ignore them, but it doesn't arbitrarily throw data away. It is not in the series because replication and cluster topology are user-defined and user-managed, so there is no default, single distributed cluster setup.
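To illustrate what "preserves merge conflicts explicitly" means, here is a rough sketch against CouchDB's HTTP document API (the host, database name, and document id are placeholders made up for the example):

```python
import requests

# Ask CouchDB to include the revisions that "lost" during replication.
doc = requests.get(
    "http://localhost:5984/accounts/acct-42",
    params={"conflicts": "true"},
).json()

# The winning revision comes back as the document body; any other surviving
# revisions are listed under "_conflicts" rather than being discarded.
for rev in doc.get("_conflicts", []):
    loser = requests.get(
        "http://localhost:5984/accounts/acct-42",
        params={"rev": rev},
    ).json()
    # Application code can now merge the versions and clean up the losers,
    # instead of the database silently throwing data away.
    print(rev, loser)
```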
Most of these database failures come from trying to sweep the effects of conflicts under the rug in eventually consistent systems. Even Riak's last-write-wins default did that. MongoDB and others failed too. That doesn't mean eventual consistency is flawed. Eventual consistency is a physical reality, and in some cases it is also a viable business application pattern.
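A toy illustration of the last-write-wins failure mode being described, with made-up timestamps and names rather than any particular database's exact behavior:

```python
def last_write_wins(version_a, version_b):
    """Each version is (timestamp, withdrawals). LWW keeps whichever
    carries the later timestamp and silently discards the other."""
    return version_a if version_a[0] >= version_b[0] else version_b

kept = last_write_wins(
    (1001, [("txn-a", 100)]),  # your withdrawal
    (1002, [("txn-b", 100)]),  # your friend's withdrawal, a moment "later"
)
print(kept)  # txn-a has vanished: the account looks clean, but $100 of history is gone
```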
This reminds me of a paper I read in college, "The Dangers of Replication."[1] It lays out the fundamental limits on distributed updates, and it's a good one to read if you haven't seen it before.
That is a pretty good paper, from 1996, and what it talks about is still relevant. Back then CAP wasn't being discussed and distributed databases were not the topic of casual conversations between developers.
It mentions Lotus Notes. As a piece of trivia, the creator of CouchDB (Damien Katz) originally worked on Lotus Notes at IBM, then created CouchDB in his spare time. I believe its design is influenced by Lotus Notes quite a bit.
It wasn't a new paper when I took the class, either. None of the papers I read for that class were new. I took the fact that it was assigned reading to mean that the professor considered it timeless.
Pick any of the failed cases from the "Call Me Maybe" series. In general, unicorns materialize better if you pick your databases by reading only marketing materials and following PR pieces. Then they tend to disappear when written data is silently dropped on the floor.
This was written by somebody with something to sell, and it sounds like it. I was hoping he would explain the misreading of the CAP theorem, but it ends by promising that future databases will be more powerful. Eventually, I guess?
Interestingly, one of the few databases that did better under his test was Riak, with explicit sibling handling. It is funny because it is an eventually consistent system.