I think the idea is that the cache is so large that hot data won't be forced out by one-hit wonders. In this 2017 talk [1], the speaker says that Twitter's SLA depends on having a 99.9% hit rate. It is very common for popular sites to run an extremely over-provisioned remote caching tier. That makes eviction less important, and operational costs are reduced by purging expired data more proactively. Hence, memcached switched away from relying on its LRU to discard expired entries and now uses a sweeper. Caffeine's approach, a timing wheel, was considered, but dormando felt it was too much of an internal change for memcached and that the sweeper could serve multiple purposes.
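If it helps to picture it, a timing wheel buckets entries by their expiration time, so a sweep only touches the bucket that just came due instead of scanning everything. Here is a toy single-level sketch (Caffeine's actual wheel is hierarchical and intrusive, so the names and structure are only illustrative):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Toy single-level timing wheel: buckets keys by expiration tick so a
    // sweep only inspects the buckets that have just come due. Entries more
    // than one full rotation out would expire early here; a real wheel is
    // hierarchical to handle that.
    final class TimingWheel<K> {
      private final Deque<K>[] buckets;
      private final long tickNanos;
      private long currentTick;

      @SuppressWarnings("unchecked")
      TimingWheel(int wheelSize, long tickNanos) {
        this.buckets = new Deque[wheelSize];
        for (int i = 0; i < wheelSize; i++) {
          buckets[i] = new ArrayDeque<>();
        }
        this.tickNanos = tickNanos;
      }

      /** Places the key into the bucket for its expiration time. */
      void schedule(K key, long expiresAtNanos) {
        int index = (int) ((expiresAtNanos / tickNanos) % buckets.length);
        buckets[index].add(key);
      }

      /** Advances the wheel and returns the keys whose buckets have come due. */
      Deque<K> advance(long nowNanos) {
        long tick = nowNanos / tickNanos;
        Deque<K> expired = new ArrayDeque<>();
        while (currentTick < tick) {
          currentTick++;
          int index = (int) (currentTick % buckets.length);
          expired.addAll(buckets[index]);
          buckets[index].clear();
        }
        return expired;
      }
    }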
You might be interested in this thread [1] where I described an idea for how to incorporate the latency penalty into the eviction decision. A developer even hacked a prototype that showed promise. The problem is that there is not enough variety in the available trace data to be confident that a design isn't overfit to a particular workload and will generalize. As more data sets become available, it will become possible to experiment with ideas and fix unexpected issues until a correct, simple, elegant design emerges.
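The gist is to weigh an entry's popularity by its observed miss penalty, so a cheap-to-refetch entry makes a better eviction victim than a rarely used but expensive one. A hypothetical scoring sketch (the names and weighting are mine, not anything from that thread or from Caffeine):

    import java.util.Comparator;
    import java.util.List;

    // Hypothetical cost-aware victim selection: weigh an entry's estimated
    // frequency by its observed reload latency so that cheap-to-recompute
    // entries are evicted first.
    record Candidate(String key, int frequency, long reloadLatencyNanos) {
      double score() {
        return (double) frequency * reloadLatencyNanos;
      }
    }

    final class CostAwarePolicy {
      /** Picks the candidate whose eviction is expected to hurt the least. */
      static Candidate selectVictim(List<Candidate> candidates) {
        return candidates.stream()
            .min(Comparator.comparingDouble(Candidate::score))
            .orElseThrow();
      }
    }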
The thing is that the math changes when the system is saturated. The closer you get to tipping over, the more each new request costs. I feel like I can clearly recall times when I had to make a Sophie's Choice between p50 and p95 times because of issues of this sort.
As the article mentions, Caffeine's approach is to monitor the workload and adapt to these phase changes. This stress test [1] demonstrates shifting back and forth between LRU and MRU request patterns, with the cache reconfiguring itself to maximize the hit rate. Unfortunately, most policies are not adaptive or adapt poorly.
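Roughly, Caffeine samples the hit rate and hill climbs the size of its admission window: keep stepping in the same direction while the hit rate improves, otherwise reverse and decay the step. A toy sketch of that feedback loop (the constants and names are made up, not the real implementation):

    // Toy hill climber: resizes the recency window from sampled hit rates,
    // continuing in the same direction when the hit rate improved and
    // reversing with a decayed step when it degraded.
    final class WindowClimber {
      private double previousHitRate = 0.0;
      private double stepSize = 0.05;        // fraction of the cache to shift
      private double windowFraction = 0.01;  // current share given to the window

      void adapt(double sampledHitRate) {
        if (sampledHitRate < previousHitRate) {
          stepSize = -stepSize * 0.98;        // reverse direction, shrink the step
        }
        windowFraction = Math.min(1.0, Math.max(0.0, windowFraction + stepSize));
        previousHitRate = sampledHitRate;
      }

      double windowFraction() {
        return windowFraction;
      }
    }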
Thankfully, most workloads follow a relatively consistent pattern, so this is an atypical worry. Algorithm designers usually have a target scenario, like a CDN or database, so they generally skip reporting the low-performing workloads. That may work for a research paper, but when providing a library we cannot know what our users' workloads are, nor should we expect engineers to invest in selecting the optimal algorithm. Caffeine's adaptivity removes this burden and broadens its applicability, and other language ecosystems have been slowly adopting similar ideas in their caching libraries.
Yep, no bots. A real bug not only means that I wasted someone else's time; the report itself is a gift that leads to an improvement. If it's a misunderstanding, then it's motivating proof that my project is being used, and it deserves a generous reply. This perspective, and treating it strictly as a hobby rather than as criticism or a demand for work, makes OSS feel more sustainable.
I tried to reimplement Linux’s algorithm in [1], but I cannot be sure about its correctness. They adjust the fixed sizes at construction based on the device’s total memory, so it varies between a phone and a server. The fast trace simulation in the CI [2] may be informative (see DClock). Segmentation is very common, where algorithms differ in how they promote entries and whether or how they adapt the segment sizes.
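As a concrete example of the pattern, a Segmented LRU keeps a probationary and a protected space, promotes an entry on a second hit, and demotes protected overflow back into probation. A bare-bones sketch with fixed sizes and no adaptation, just to show the promotion flow:

    import java.util.LinkedHashMap;

    // Bare-bones Segmented LRU: new entries land in probation, are promoted
    // to the protected segment on a subsequent hit, and protected overflow is
    // demoted back into probation rather than evicted outright.
    final class SegmentedLru<K, V> {
      private final int probationSize;
      private final int protectedSize;
      private final LinkedHashMap<K, V> probation = new LinkedHashMap<>(16, 0.75f, true);
      private final LinkedHashMap<K, V> protectedSegment = new LinkedHashMap<>(16, 0.75f, true);

      SegmentedLru(int probationSize, int protectedSize) {
        this.probationSize = probationSize;
        this.protectedSize = protectedSize;
      }

      V get(K key) {
        V value = protectedSegment.get(key);
        if (value != null) {
          return value;              // already protected; access order updated
        }
        value = probation.remove(key);
        if (value != null) {
          promote(key, value);       // second hit: move into the protected segment
        }
        return value;
      }

      void put(K key, V value) {
        probation.put(key, value);   // new entries always start on probation
        if (probation.size() > probationSize) {
          K victim = probation.keySet().iterator().next();
          probation.remove(victim);  // evict the least recently used probationary entry
        }
      }

      private void promote(K key, V value) {
        protectedSegment.put(key, value);
        if (protectedSegment.size() > protectedSize) {
          K demoted = protectedSegment.keySet().iterator().next();
          put(demoted, protectedSegment.remove(demoted));  // demote into probation
        }
      }
    }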
I've resorted to using Cmd-Shift-J (scrollback buffer) and grepping that, but it's flaky about whether it will honor the command and emit a history file.
That’s in the works, where it adapts from 16 MB to terabyte heaps. The current GCs have a fixed maximum, with lazy allocation and the ability to periodically release memory back to the system, but they are not as system aware.
It will likely become available for application developers to use. At work, we use it to assist warehouse check-ins by allowing the guard to take photos of the truck, paperwork, seal, etc., and fill out the forms going in and out. If built in, then it can run on-device, so over time a lot more workflows can become seamless.
[1] https://www.youtube.com/watch?v=kxMKnx__uso