
>for a database that's primarily in-memory but drops least used data off to disk

>so although your "working dataset" can be far larger than RAM your actual "active working dataset" can fit in there

To me that's the exact same pattern as caching, merely stated in reverse. If you want full persistence, everything has to hit the disk anyway, no matter how you phrase it or how and when you store it. As for your user-logs problem stated below, that sounds easy to parallelize: hash+modulo the username to a data server number, then on the selected server traverse a B+ tree of dates/entries in order, buffering or streaming until 20 entries matching the username have been retrieved. High performance and fully persistent. A rough sketch of that approach is below.
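A minimal sketch of what I mean, assuming an in-memory sorted list as a stand-in for the per-server B+ tree; the names (NUM_SERVERS, log_trees, fetch_recent_entries) are illustrative, not from any real system:

    # Hypothetical sketch: shard by username hash, scan a date-ordered index.
    import hashlib
    from bisect import insort

    NUM_SERVERS = 8  # assumed shard count

    def shard_for(username: str) -> int:
        """Hash + modulo the username to pick a data server."""
        digest = hashlib.sha1(username.encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_SERVERS

    # Stand-in for each server's B+ tree: a list of (date, username, entry)
    # kept sorted by date. In practice this would be an on-disk index.
    log_trees = [[] for _ in range(NUM_SERVERS)]

    def insert_entry(date: str, username: str, entry: str) -> None:
        insort(log_trees[shard_for(username)], (date, username, entry))

    def fetch_recent_entries(username: str, limit: int = 20):
        """Walk the shard's index newest-first and stop after `limit`
        entries for this username have been collected."""
        results = []
        for date, user, entry in reversed(log_trees[shard_for(username)]):
            if user == username:
                results.append((date, entry))
                if len(results) == limit:
                    break
        return results

The point is just that the query touches exactly one server and terminates after 20 matches, so it stays fast regardless of total dataset size.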

Waiting for a new memory+disk database pattern that does everything more intelligently and faster than every other disk+memory pattern seems like procrastinating on parallelization, which is the unavoidable long-term answer to these problems.



