If you're just getting started with CRDTs I'd also recommend Seph's "5000x faster CRDTs" post[1] and also his "I was wrong. CRDTs are the future" post[2] for some nice captures of the practical challenges and opportunities in the space.
Thanks for the plug! I’m happy to answer any questions about this stuff. I’ve been noodling with collaborative editing for awhile now. It’s a fun space!
Very much appreciate your writing on the topic-- it gave me the practical context I was missing while trying to spin up with automerge and CRDTs in general.
I ended up using y.js in my personal react native notes app and it works wonderfully.
This is a clear and crisp way of differentiating delta based vs state based CRDTs.
But I don't think it's accurate enough. For example, the article claims delta based CRDTs don't use vector clocks. But even to decide if a set of events are concurrent it's necessary to attach vector stamps to every event. Maybe the author meant something else when he says vector clocks are not needed for op based CRDTs.
Also as mentioned in article it's true evergrowing event log is not such a bad idea with compression techniques and cheaper disks. But the problem with evergrowing datastructures is not just disk space. This data has to be loaded onto main memory to do anything useful with it. This data has to be transmitted across network to power SaaS apps. So stating that disks are cheaper hence evergrowing datastructures are fine - is an oversimplification.
I don't recall if it was git itself or github that bragged about inverting the storage structure so that it was simplest to read the current state, and HEAD~3 was determined by substracting the last 3 patches rather than fast forwarding from root to length - 3.
To do so requires that the patch function is a reversible function, which for git is not hard to achieve. I wonder how many CRDTs that is true for.
There are ways to transmit only a partial history for git, there should be ways to do the same for most CRDTs.
There absolutely is. I’m writing a paper at the moment about how we’ve done this in diamond types. The core idea is to store the original operations and the versions. And then reconstruct the crdt state on each peer as needed. Because most editing histories are mostly linear, it usually ends up faster than CRDTs in practice anyway. But you can then just send the latest document state and send historical operations on demand for merging and to access old document states.
I’ll post the paper when it’s ready. I think it’s a great approach - file sizes end up smaller. You don’t need the crdt in memory during editing and you can prune old operations.
Nice outline of the various techniques. We've built something in-between the operation-based and delta-based approaches for our offline-first multiplayer "IDE for notes/tasks" [1].
In our case we have a central server which periodically creates snapshots. Although we don't do that right now, if needed, it could delete older operations from the log for space reasons. Except for the fact that replicas encrypt their ops before they send it to the server (e2ee), the server is pretty much like any other replica, so if there is no central server I guess any or multiple replicas could also create snapshots instead.
"Normal" replicas which have been offline for a while can then get the last snapshot state S' with all operations since S'. Other than that, they simply broadcast operations and only keep their own latest version to which they apply incoming operations.
If you want to read about another implementation using Merkle CRDTs for practical local-first document database storage, I've been building a new database for about a year. I plan to share more but it's definitely an example of a Merkle CRDT, inspired by a lot of the same research in the Dolt paper.
The more abstract and stylized the image, the easier it is to avoid uncanny valley effects. Your images are spectacular; I'd love it if you divulged a bit of the secret sauce; perhaps in a blog post?
Oh, sure. I didn't think it'd be of interest. It's largely just knowing the names of visual components and art styles, but the generative AI takes it pretty far by itself if you're not too picky about what it gives you. But I'll write it up and let you know.
[1] https://josephg.com/blog/crdts-go-brrr/
[2] https://josephg.com/blog/crdts-are-the-future/