I love Redis a great deal but I can't wait for the persistence issues to be work...

davidhollander · on Feb 27, 2011

>I can't wait for the persistence issues to be worked out.

Why use an in-memory database for persistence instead of for caching? Considering that RAM is the scarcest and thus most valuable resource on nearly every server, in-memory DBs like Redis are better used for storing already rendered data ready for output rather than every bit of raw data that always needs to be duplicated or logged on disk at the end of the day anyway. Persistence==disk!=memory.

>Redis Diskstore is something I'm eagerly waiting for

If your data can be represented using hashmaps (unordered) and B+ trees (ordered), check out TokyoCabinet for on-disk persistence. It has the fastest on-disk hashmaps and B+ trees (of non-fixed size) and has been around for a while.

Smerity · on Feb 27, 2011

The point is Redis is beginning to move away from being just an in-memory database. See my other comment for a more detailed answer as to the use case.

Persistent caches are an important commodity for many websites. For some services, trying to handle standard traffic patterns with an empty cache is suicide and can cause cascading failures across the board. Without a strong persistence solution, Redis can't be used here. antirez mentioned he considered it a strange move for Reddit to use Cassandra and not Redis as a persistent cache[1], and I think the issues with Redis persistence may well have caused that.

Additionally, I still think there's reasonable ground for a database that's primarily in-memory but drops least used data off to disk. The vast majority of the web follows a Zipfian/long tail distribution, so although your "working dataset" can be far larger than RAM, your actual "active working dataset" can fit in there. Why trade away the advantages of an in-memory, data structure driven database when almost all your queries can be satisfied in this manner?
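To put a rough number on the long tail argument, here is a minimal sketch of how much traffic the hottest items would absorb under a Zipf distribution. The exponent of 1, the one million items and the 10% "hot" slice are assumptions chosen for illustration, not measurements from any real site.

```python
def zipf_hit_fraction(num_items: int, hot_items: int, s: float = 1.0) -> float:
    """Fraction of requests that land on the `hot_items` most popular items,
    assuming item popularity is proportional to 1 / rank**s (Zipf)."""
    total = sum(1.0 / rank ** s for rank in range(1, num_items + 1))
    hot = sum(1.0 / rank ** s for rank in range(1, hot_items + 1))
    return hot / total


# Hypothetical dataset: 1,000,000 items, with only the top 10% kept in RAM.
fraction_served_from_ram = zipf_hit_fraction(1_000_000, 100_000)
```

Under these assumptions the fraction comes out at roughly 0.84, so most queries would be served from memory even though 90% of the items live on disk. A steeper or flatter real-world distribution would shift that number.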

[1] http://www.reddit.com/r/programming/comments/bcqhi/reddits_n...

davidhollander · on Feb 27, 2011

>for a database that's primarily in-memory but drops least used data off to disk

>so although your "working dataset" can be far larger than RAM your actual "active working dataset" can fit in there

To me that's the exact same pattern as caching, merely stated in reverse. If you want full persistence, everything's going to have to hit the disk anyway, no matter how you phrase it and how and when you store it. In regards to your user logs problem stated below, that sounds easy to parallelize. I would hash+modulo the username to a data server number, then on the selected data server traverse a B+ tree of dates/entries in order, and buffer or stream until 20 entries matching the usernames have been retrieved in response. High performance and fully persistent.
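The hash+modulo routing and ordered scan described above could be sketched roughly as follows. The function names, the choice of SHA-1 and the sorted in-memory list standing in for an on-disk B+ tree are all assumptions made for illustration, not part of any particular database.

```python
import hashlib


def shard_for(username: str, num_servers: int) -> int:
    """Hash the username and take it modulo the server count.
    A stable hash is used so every client routes a user to the same server."""
    digest = hashlib.sha1(username.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def last_entries(ordered_entries, usernames, limit=20):
    """Walk entries from newest to oldest and stop after `limit` matches.
    `ordered_entries` is a list of (timestamp, username, payload) tuples
    sorted by timestamp, a stand-in for scanning B+ tree leaves in order."""
    wanted = set(usernames)
    matches = []
    for timestamp, username, payload in reversed(ordered_entries):
        if username in wanted:
            matches.append((timestamp, username, payload))
            if len(matches) == limit:
                break
    return matches
```

Note that plain modulo routing reshuffles most users whenever the number of servers changes, so this sketch only fits a fixed server count.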

Waiting for a new memory+disk database pattern that does everything more intelligently and faster than every other disk+memory pattern seems like procrastination of parallelization, which is the unavoidable long term answer to these problems.

simonw · on Feb 27, 2011

Just out of interest, what would you have been using Redis for in your recent project?

I'm still using MySQL as my primary store, with a Redis denormalised layer on the side for set intersections, random item selection and the like.

Smerity · on Feb 27, 2011

It's similar to a per user log - a user has an entry from time X, time Y and so on and you're likely only interested in the last N entries. Let's say for this case that N = 20.

This is a near perfect use case for Redis lists. I say near perfect because there's still a problem: if you have an incredibly active user (of which we have many) with thousands of log entries (of which they'd likely only ever look at the last few hundred), you still need to keep that entire chunk in memory. It'd be great if you could archive everything past the last 1000 onto disk. You can do that with client side logic, but it's still a bit of a pain, and then you need to hope that Redis knows that those "historical" lists don't need to clog up the cache.
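The client side archiving logic might look something like the sketch below. It assumes a redis-py style client object exposing `lpush`, `llen`, `rpop` and `lrange`; the `log:<user_id>` key naming and the `archive` callback are hypothetical and only there for illustration.

```python
def append_entry(client, user_id, entry, keep=1000, archive=None):
    """Push a new log entry and move anything past the newest `keep`
    entries out of the list. `archive` is a hypothetical callback that
    would write the old entry to disk (file, MySQL, etc.)."""
    key = f"log:{user_id}"  # key naming is an assumption
    client.lpush(key, entry)  # newest entry goes to the head of the list
    while client.llen(key) > keep:
        oldest = client.rpop(key)  # oldest entry sits at the tail
        if archive is not None:
            # Not atomic: a crash between rpop and archive loses the entry.
            archive(user_id, oldest)


def recent_entries(client, user_id, n=20):
    """Return the newest `n` entries, newest first."""
    return client.lrange(f"log:{user_id}", 0, n - 1)
```

This keeps each hot list bounded, but it is exactly the "bit of a pain" mentioned above: the application has to own the archive format and cope with partial failures itself.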

MySQL isn't up to the task in that case: if those entries aren't cached (which they likely aren't) then it needs to retrieve 20 rows from disk. Those rows aren't sequential on disk, and with a single disk access taking 10 ms that's 0.2 seconds per query, or a total of 5 queries per second. The project additionally runs on EC2, so disk IO (especially random reads) is temperamental at best. If the data were grouped or contiguous (as in the case of Redis Diskstore, BigTable style DBs etc) then you may be able to get upwards of 100 queries per second if a random read is 10 ms.

Even if we were using Redis or memcache in front of MySQL and requests came in only as fast as we could handle them, we'd only be able to serve ~430,000 requests per day. We have more users than that, the queries don't come in consistently (i.e. we still have to worry about peaks), and that's additionally not worrying about cache invalidation. Due to how slow this is, it'd be nice to take snapshots to make Redis a persistent cache, but then you hit the same initial problems I mentioned before.
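The back-of-envelope numbers above can be reproduced with a few lines. The only inputs are the 10 ms random read and 20 rows per query from the post; treating a grouped layout as one random read per query is an assumption, and the function name is made up for this sketch.

```python
SECONDS_PER_DAY = 86_400


def queries_per_second(reads_per_query: int, read_ms: int = 10) -> float:
    """Throughput of a single disk serving queries one at a time,
    where each query costs `reads_per_query` random reads."""
    return 1000 / (reads_per_query * read_ms)


scattered_qps = queries_per_second(20)   # 20 non-sequential rows -> 5.0
contiguous_qps = queries_per_second(1)   # one read for grouped rows -> 100.0
scattered_per_day = scattered_qps * SECONDS_PER_DAY  # 432000.0, i.e. ~430,000
```

This ignores caching, queueing and concurrent disks, so it is a ceiling for the uncached single-disk case rather than a prediction.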

The issue with using Redis as the primary datastore is related to the persistence issues I mentioned previously. Saves can end up taking a long time and chew up a lot more memory than I'm comfortable with. This is complicated by the fact that it'd be preferable to have some chunk of the cold dataset stored on disk, but the Virtual Memory persistence option is not really suited to this and additionally makes saves even worse (as now you're reading from a slow disk IO device [the VM files] to save to a different slow disk IO device [the database backup] whilst trying to serve the occasional cold query from the VM).

Diskstore would solve the problems of persistent storage and slow disk IO (as the SAVE wouldn't thrash the VM disk whilst Redis was trying to use the VM disk to serve cold queries) but unfortunately we can't wait for it. I truly am looking forward to it being released however and will look to it for other future projects =]

ichilton · on Feb 27, 2011

> I love Redis a great deal but I can't wait for the persistence issues to be worked out.

Please could you provide links detailing these issues further?

Thanks.