Xi editor, featured here before [1][2], used CRDTs as a core part of the initial design. In a retrospective [3] the author found -- paraphrasing -- that the kinds of answers given by CRDTs don't map well onto the problems of making a text editor.
That makes me wonder, if not text editing (fair enough) what are CRDTs good at representing? It seems this free online web course (?!) is trying to teach the abstractions, maybe that would shed some light.
More generally, the key problem seems to be something like you need to preserve the lower level behavior (merging rules) in the semantics of the abstraction (CRDT behavior). That process seems to be an unsolved problem, though I wouldn't be surprised if it's related to other unsolved/unsolvable problems like the halting problem. Any interesting reads?
After studying [3], I came to the conclusion that the specific CRDTs they attempted to use for Xi editor might be too low level to be useful in building a text editor directly, but that this is not an intrinsic shortcoming of CRDTs. Rather, the kind of automerging behavior of the CRDTs they used didn't match the design needs of the editor.
Anyhow, I've come to see CRDTs as very low level primitives that can assist in creating decentralized applications. Their strength is not that they automagically solve the inherently subjective problem of merging asynchronously edited, divergent data. That problem is more of a design problem that needs to be solved on a case by case basis. The strength of CRDTs is in that they have extremely well defined behavior and aren't fundamentally reliant on a central coordinator or leader. Of course you can design in a leader if you want, or multiple leaders. But that has nothing to do with CRDTs. So I imagine them being more useful in composing higher level data structures and smooth interfaces for users to make inherently difficult decisions about merge conflicts themselves.
My favorite kind of CRDTs are "delta state CRDTs". They're very cool. I wish the paper about them was easier to understand.
That's actually a reasonable position. We solved some problems, and had ideas for others (for example, overloading "priority" to position the cursor after things like inserting whitespace for indentation), but had a list of other things we wanted to do, including "soft spans" (issue #244 in the repo) that we never quite figured out how to do.
Ultimately, I'm not very smart, and someone who is could probably figure out how to map all these other code editing problems to CRDT data structures. Once that's done, the result should be compelling, because CRDT lets you take advantage of low-latency local connections rather than requiring everything to get serialized on a central server. It's research I would love to see being carried forward.
"The strength of CRDTs is in that they have extremely well defined behavior and aren't fundamentally reliant on a central coordinator or leader."
Very much agree with this. Coordination is a super hard problem in distributed systems, and becomes enormously more difficult if you assume adversaries actively trying to interfere with your system.
Good distributed systems engineering is heavily focused on eliminating complexity and solving for every possible error case. CRDTs have a low number of error cases and allow nodes to interact together in clean, non-complex ways.
Distributed systems engineers like reaching for CRDTs as a tool because they can take your engineering costs down by an order of magnitude.
In the elixir community CRDTs are well known for various HA; AP distributed state strategies, Phoenix Presence uses CRDTs to track processes and broadcast them over pubsub channels. This makes it trivial to represent "user state" in chat rooms, Sophie DiBenedetto describes using it for writing a trivial chat app: https://www.youtube.com/watch?v=7L5xuTbXtbI
For something a bit wilder, you can track your state machines across a distributed cluster and be able to pass and preserve state across the cluster using CRDTs, as is done in Dan Azuma's (live) demo of a tank game: https://www.youtube.com/watch?v=nLApFANtkHs (worth it for the demo at 29m)
AP = AP in the CAP theorem, which states that in a distributed system you must either choose consistent and Partition tolerant (CP) or Available and Partition tolerant (AP), sort of the "heisenberg uncertainty theorem" of distributed system (you can't have CA because for practical purposes the "distributed" constraint implies "partition tolerant").
HA = "Always available". Answers may be stale, but you'll always get an answers.
CP = "Always consistent". You may not always get an answer, but if you do, it's consistent.
You can't have both in a distributed system, because distributed systems are on networks, networks are imperfect, and therefor you get "partitions' (a silly term for failures). The question is, given one of these failures, do you block and wait for all nodes to agree on what's "right", or do you just respond with the best information you have available.
Partitions are a particular sort of failure where everyone thinks the problem is on the other end.
If I crash, I’m out. I can’t interfere with anything or anybody for long. If others think I crashed but I’m still running, I can become some sort of ghost haunting the system. Paranormal activity in an environment built entirely on logic is chaos.
A system built for partitions has to consider the problem where I’m talking but nobody can hear me, what do I do to sort it out, and what can I productively do until then, like queue up small purchases until my connection to Visa is available again.
AP : Available under Partitioned conditions (when a n/w partition occurs in a distributed system you still get answers to your queries, but answers may be stale)
[HA means just highly available]
CP : Consistent under partitioned conditions (when a n/w partition occurs in a distributed system you may not get answers for all your queries, but when you do get answers, they are not stale and are causally consistent)
High availability and available/partition tolerant.
These are concerns that are of particular interest when running a set of servers that communicate with each other directly, which is common in Elixir/Erlang due to this being supported by the OTP libraries.
The AP comes from CAP theorem, which is better documented on wikipedia than I could give justice to.
If you think what CRDTs can do, they enable consistently merging state over distributed system, even after partition.
Usually one person edits one text snippet at time so CRDT might not be huge benefit. I'm not familiar what Xi did differently than for example git does when merging changes, but maybe CRDTs don't usually help there.
So what are CRDTs good for? One thing comes to mind right away.
Growing distributed simple data like web analytics. It usually has sets of users and growing counters like views, clicks and other actions.
However many platforms actually collect all events and metadata as huge event streams/logs, so would you actually use CRDT as distributed aggregated values or just collect all logs and then aggregate values?
In this context, I’ve enjoyed a series of papers on CRTDs and Operational Transformation. The differences and similarities. The similarities might be greater than most are aware, and the differences more specific. This is the latest paper: https://arxiv.org/pdf/1905.01518.pdf
Real Differences between OT and CRDT under a General Transformation Framework for Consistency Maintenance in Co-Editors – Proceedings of the ACM on Human-Computer Interaction 2020 – Chengzheng Sun, David Sun, Agustina Ng, Weiwei Cai, Bryden Cho
That makes me wonder, if not text editing (fair enough) what are CRDTs good at representing? It seems this free online web course (?!) is trying to teach the abstractions, maybe that would shed some light.
More generally, the key problem seems to be something like you need to preserve the lower level behavior (merging rules) in the semantics of the abstraction (CRDT behavior). That process seems to be an unsolved problem, though I wouldn't be surprised if it's related to other unsolved/unsolvable problems like the halting problem. Any interesting reads?
[1]: https://hn.algolia.com/?query=xi%20editor
[2]: https://news.ycombinator.com/item?id=23663878
[3]: https://github.com/xi-editor/xi-editor/issues/1187#issuecomm...