Hacker News new | past | comments | ask | show | jobs | submit login
Trade-offs between Different CRDTs (interjectedfuture.com)
145 points by iamwil on Jan 8, 2024 | hide | past | favorite | 19 comments



If you're just getting started with CRDTs I'd also recommend Seph's "5000x faster CRDTs" post[1] and also his "I was wrong. CRDTs are the future" post[2] for some nice captures of the practical challenges and opportunities in the space.

[1] https://josephg.com/blog/crdts-go-brrr/

[2] https://josephg.com/blog/crdts-are-the-future/


Thanks for the plug! I’m happy to answer any questions about this stuff. I’ve been noodling with collaborative editing for awhile now. It’s a fun space!


Very much appreciate your writing on the topic-- it gave me the practical context I was missing while trying to spin up with automerge and CRDTs in general.

I ended up using y.js in my personal react native notes app and it works wonderfully.


From the first link:

  They implemented lots of algorithms - CRDTs and OT algorithms and stuff.
What’s an “OT” algorithm in this context?


Operational Transform (an approach that requires a central server to coordinate edits)


This is a clear and crisp way of differentiating delta based vs state based CRDTs.

But I don't think it's accurate enough. For example, the article claims delta based CRDTs don't use vector clocks. But even to decide if a set of events are concurrent it's necessary to attach vector stamps to every event. Maybe the author meant something else when he says vector clocks are not needed for op based CRDTs.

Also as mentioned in article it's true evergrowing event log is not such a bad idea with compression techniques and cheaper disks. But the problem with evergrowing datastructures is not just disk space. This data has to be loaded onto main memory to do anything useful with it. This data has to be transmitted across network to power SaaS apps. So stating that disks are cheaper hence evergrowing datastructures are fine - is an oversimplification.


I don't recall if it was git itself or github that bragged about inverting the storage structure so that it was simplest to read the current state, and HEAD~3 was determined by substracting the last 3 patches rather than fast forwarding from root to length - 3.

To do so requires that the patch function is a reversible function, which for git is not hard to achieve. I wonder how many CRDTs that is true for.

There are ways to transmit only a partial history for git, there should be ways to do the same for most CRDTs.


There absolutely is. I’m writing a paper at the moment about how we’ve done this in diamond types. The core idea is to store the original operations and the versions. And then reconstruct the crdt state on each peer as needed. Because most editing histories are mostly linear, it usually ends up faster than CRDTs in practice anyway. But you can then just send the latest document state and send historical operations on demand for merging and to access old document states.

I’ll post the paper when it’s ready. I think it’s a great approach - file sizes end up smaller. You don’t need the crdt in memory during editing and you can prune old operations.


Oh that's clever


Yeah network bandwidth scaling with participant count is the the main headwind i.e. it's quadratic in most CRDTs at the moment AFAIKT


Nice outline of the various techniques. We've built something in-between the operation-based and delta-based approaches for our offline-first multiplayer "IDE for notes/tasks" [1].

In our case we have a central server which periodically creates snapshots. Although we don't do that right now, if needed, it could delete older operations from the log for space reasons. Except for the fact that replicas encrypt their ops before they send it to the server (e2ee), the server is pretty much like any other replica, so if there is no central server I guess any or multiple replicas could also create snapshots instead.

"Normal" replicas which have been offline for a while can then get the last snapshot state S' with all operations since S'. Other than that, they simply broadcast operations and only keep their own latest version to which they apply incoming operations.

[1] https://thymer.com


Apparently he's going to write about Merkle CRDTs next, but if you can't wait:

https://www.dolthub.com/blog/2022-06-27-prolly-chunker/


If you want to read about another implementation using Merkle CRDTs for practical local-first document database storage, I've been building a new database for about a year. I plan to share more but it's definitely an example of a Merkle CRDT, inspired by a lot of the same research in the Dolt paper.

Architecture - https://use-fireproof.com/docs/architecture CRDT file format - https://fireproof.storage/posts/remote-access-crdt-wrapped-m...


And if you really can't wait, I made the mistake of thinking Prolly Trees weren't unicit, but figuring out that they were!

https://interjectedfuture.com/lab-notes/lab-note-030-the-tri...


Wow, I hadn't seen this follow-up post! Cool!

(btw I really love your masthead graphics)


Thanks for pointing out the discrepancy! Some people don't like generative images, but they're pretty fun to generate.


The more abstract and stylized the image, the easier it is to avoid uncanny valley effects. Your images are spectacular; I'd love it if you divulged a bit of the secret sauce; perhaps in a blog post?


Oh, sure. I didn't think it'd be of interest. It's largely just knowing the names of visual components and art styles, but the generative AI takes it pretty far by itself if you're not too picky about what it gives you. But I'll write it up and let you know.


Will those comment eventually update once the blog is appended? :-)




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: