
It's similar to a per-user log: a user has an entry from time X, time Y and so on, and you're likely only interested in the last N entries. Let's say for this case that N = 20.

This is a near-perfect use case for Redis lists. I say near perfect because you still have the problem of incredibly active users (of which we have many) with thousands of log entries - of which they'd only ever likely look at the last few hundred - yet you still need to keep the entire list in memory. It'd be great if you could archive everything past the last 1000 entries onto disk. You can do that with client-side logic, but it's a bit of a pain, and then you have to hope that Redis knows those "historical" lists don't need to clog up the cache.
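For reference, a rough sketch of the pattern I mean, using redis-py (the key layout and the 1000-entry cap are just illustrative):

    import redis

    r = redis.Redis()

    def log_event(user_id, entry):
        key = "log:%s" % user_id       # hypothetical key layout
        r.lpush(key, entry)            # newest entry goes to the head of the list
        # LTRIM keeps the list bounded, but it throws the tail away -
        # any archive-to-disk step has to happen client side before this.
        r.ltrim(key, 0, 999)

    def recent_entries(user_id, n=20):
        return r.lrange("log:%s" % user_id, 0, n - 1)   # last N entries, newest first

The whole list still lives in RAM though, which is exactly the problem with the very active users.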

MySQL isn't up to the task in that case because if those entries aren't cached (which they likely aren't) it needs to retrieve 20 rows from disk. Those rows aren't sequential on disk, and with a single disk access costing 10ms that's 0.2 seconds per query, or a total of 5 queries per second. The project additionally runs on EC2, so disk IO (especially random reads) is temperamental at best. If the data were grouped or contiguous (as in the case of Redis Diskstore, BigTable-style DBs, etc.) then you may be able to get upwards of 100 queries per second, assuming a random read is 10ms.
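Spelling out the arithmetic behind those numbers (the 10ms is an assumed cost per random read, not a measurement):

    SEEK = 0.010            # assumed cost of one random disk read, in seconds
    ROWS = 20               # entries fetched per request

    scattered = ROWS * SEEK          # 20 non-contiguous rows -> 0.2 s per query
    print(1 / scattered)             # ~5 queries/second from a single disk

    contiguous = 1 * SEEK            # grouped/contiguous rows -> roughly one seek per query
    print(1 / contiguous)            # ~100 queries/second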

Even if we were using Redis or memcache in front of MySQL and requests came in only as fast as we could handle them, we'd only be able to serve ~430,000 requests per day. We have more users than that, the queries don't come in evenly (i.e. we still have to worry about peaks), and that's before worrying about cache invalidation. Given how slow this is, it'd be nice to take snapshots to make Redis a persistent cache, but then you hit the same initial problems I mentioned before.
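To make that ceiling concrete: 5 queries/second * 86,400 seconds/day is ~432,000 requests. The cache-in-front-of-MySQL option would look something like this (function and key names are made up, the TTL is arbitrary), and the last function is the invalidation work you now have to get right:

    import json
    import redis

    r = redis.Redis()

    def last_20(user_id, fetch_from_mysql):
        key = "log20:%s" % user_id
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        rows = fetch_from_mysql(user_id)          # the slow, seek-bound path (~0.2 s)
        r.set(key, json.dumps(rows), ex=300)      # arbitrary 5 minute TTL
        return rows

    def on_new_entry(user_id):
        r.delete("log20:%s" % user_id)            # every new log entry invalidates the cached page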

The issue with using Redis as the primary datastore comes back to the persistence problems I mentioned previously. Saves can end up taking a long time and chewing up a lot more memory than I'm comfortable with. This is complicated by the fact that it'd be preferable to have some chunk of the cold dataset stored on disk, but the Virtual Memory option isn't really suited to that and makes saves even worse: now you're reading from one slow disk IO device (the VM swap file) in order to write to another (the database dump) whilst trying to serve the occasional cold query from the VM.
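For anyone who hasn't run into these options, this is roughly where those knobs live in a 2.x-era redis.conf (values are illustrative, not our actual config):

    # RDB snapshotting: each background save forks the process; this is the
    # save that can take a long time and chew up extra memory under write load
    save 900 1
    save 300 10
    save 60 10000

    # Virtual Memory: cold values get swapped out to this file, so a save ends
    # up reading the swap file while writing the dump, both on slow disk
    vm-enabled yes
    vm-swap-file /var/lib/redis/redis.swap
    vm-max-memory 0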

Diskstore would solve the problems of persistent storage and slow disk IO (a SAVE wouldn't thrash the disk whilst Redis was trying to use that same disk to serve cold queries), but unfortunately we can't wait for it. I truly am looking forward to its release, however, and will look to it for other future projects =]



