> [the GC] ...it's keeping you from the misery of manual memory management and that you're better to consider it an ally than a foe
I used to think this way, but then I tried Rust. After C and C++, I never thought I'd want to give up the GC, but to avoid the issues that this article talks about, I wanted to go back to a systems language. Now I want to use Rust for everything.
Oh, I misunderstood you. I thought you would write code without a single new or delete at all, so everything either lives on the stack or uses built-in allocations like vector and the like.
In modern C++ you do avoid it completely, using things like std::make_unique<> and std::make_shared<> to allocate an object and stuff it into a smart pointer immediately.
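For example, a minimal sketch of what "no naked new or delete" looks like in practice (the Widget type here is just a placeholder, not something from the thread):

```cpp
#include <memory>
#include <vector>

struct Widget {
    int value = 0;
};

int main() {
    // Heap allocation without a raw new: ownership lives in the smart pointer.
    auto owned = std::make_unique<Widget>();
    owned->value = 42;

    // Shared ownership, again without new; make_shared allocates the object
    // and its control block together.
    auto shared = std::make_shared<Widget>();

    // Containers manage their own storage, so growth never needs an explicit delete.
    std::vector<Widget> widgets;
    widgets.push_back(*owned);

    // No delete anywhere: everything is released automatically at end of scope.
    return 0;
}
```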
Thanks. I always wanted to learn modern C++, but there was no such course at uni, and in my spare time I was only able to partially fill the gaps, learning about the obvious things like the auto keyword, move constructors and the like.
Move constructors get a lot of press because they are new (and necessary for the underlying machinery of unique_ptr to work), but most people will never need to use them or know about them directly.
There are really two classes of C++ features: the basic-use part, which is fairly straightforward and pleasant to work with - essentially the API provided by the STL - and the infrastructure stuff like templates, move semantics, SFINAE and other messy, non-obvious things. The latter is much more complicated and still a minefield, but it's necessary for the STL to do what it does.
If you want to learn C++, get proficient in using the STL. The rest should only be learned once that is second nature.
> Step 3: Pooling large byte arrays used in serialisation/deserialisation
If you are using a language with explicit memory management, buffer pooling is still quite important - especially if you are doing any IO. You can avoid it to a certain degree by using fast malloc implementations.
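As a rough illustration of the idea (generic, made-up names and fixed-size buffers, not tied to the article or any particular library), a pool just hands buffers back and forth instead of allocating fresh ones for every IO operation:

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Minimal illustrative buffer pool: reuses fixed-size byte buffers instead of
// hitting the allocator on every IO operation. Not production-hardened.
class BufferPool {
public:
    explicit BufferPool(std::size_t buffer_size) : buffer_size_(buffer_size) {}

    // Hand out a free buffer, or allocate a new one if the pool is empty.
    std::vector<std::byte> acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!free_.empty()) {
            auto buf = std::move(free_.back());
            free_.pop_back();
            return buf;
        }
        return std::vector<std::byte>(buffer_size_);
    }

    // Return a buffer to the pool so the next caller can reuse it.
    void release(std::vector<std::byte> buf) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(buf));
    }

private:
    std::size_t buffer_size_;
    std::mutex mutex_;
    std::vector<std::vector<std::byte>> free_;
};
```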
Why not both? I want both GC for recursive data structures (lists, trees, graphs, etc.) and Rust-style resource management when deterministic reclamation matters (files, GUI objects, etc.).
I should've been clearer: For low-level programming, Rust is already fine as it is. What I want is a high-level language that offers some of the benefits of Rust (safe deterministic non-memory resource management), without the unnecessarily low-level features (sized types and overemphasis on unique ownership). Due to its C++ heritage, Rust has a tendency to conflate ownership (a high-level concern) with indirection (a low-level one), which is fine for the niche it targets, but too inflexible for general-purpose programming, IMO.
Neither OCaml nor Haskell has substructural types. There's no way to tell their type systems “you can't use a file after it's been closed”. In fact, what I want can be summarized as “ML plus substructural types”! (But preferably with Standard ML as the base point, since OCaml has way too many warts for my taste.)
The title is a bit misleading: the performance problem was memory/GC related, and the specific in-memory cache that didn't work well was built from .NET data structures, which didn't solve the memory problems. He didn't try Redis or Memcached, but ended up writing files to disk instead.
So he's essentially using the OS's disk cache as his cache, with the tracking of disk files standing in for manual memory allocation.
In this situation on .NET, I wrote a resource manager that handed out handles to large byte arrays (in particular, ones big enough to land in the large object heap, which is only collected during gen2 collections; the threshold was around 80KB at the time, IIRC). The handles implemented IDisposable, so handing back the byte array was no more or less tedious than any other resource you need to manage explicitly. The resource manager kept hold of the arrays internally using weak pointers, so they could still be collected when gen2 collections actually happened, but allocating the buffers themselves would never cause gen2 collections in a steady state.
To turn that into a cache, you'd need another layer with keys, an eviction policy and an invalidation mechanism (see the sketch after this comment). I think it would still be better than round-tripping to disk.
I wrote a different version of the resource manager that used P/Invoke helpers and unsafe code to allocate from unmanaged memory directly, but it didn't perform any better - it didn't relieve any pressure on the GC, which was 2% of CPU usage at full load in any case.
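For what it's worth, a rough sketch of that extra layer - shown here in C++ with made-up names, LRU-only eviction, and explicit invalidation, purely as an illustration of the concept rather than the .NET code described above - might look like:

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch: keys map to values, and the least recently used
// entry is evicted once capacity is exceeded. Invalidation is just an erase.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);  // refresh recency
            return;
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
        if (index_.size() > capacity_) {             // evict least recently used
            index_.erase(order_.back().first);
            order_.pop_back();
        }
    }

    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);       // refresh recency
        return it->second->second;
    }

    void invalidate(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return;
        order_.erase(it->second);
        index_.erase(it);
    }

private:
    std::size_t capacity_;
    // Front of the list is the most recently used entry.
    std::list<std::pair<std::string, std::string>> order_;
    std::unordered_map<std::string,
                       std::list<std::pair<std::string, std::string>>::iterator> index_;
};
```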
> So he's essentially using the OS's disk cache as his cache, with the tracking of disk files standing in for manual memory allocation.
And that's often not the worst idea, because, when done correctly, this stuff never hits the disk when enough RAM is around.
Plus, life-cycle management is done by the OS, not by you. It also tends to behave better than "not at all" under memory pressure, or if there isn't a lot of memory in the first place.
If you write a file and then wait fifteen minutes before deleting it, chances are the OS will have flushed it to disk sometime in the meantime. It won't have to re-read it if there's still free memory, but it's extra load on disk I/O.
If you write a lot of them, then even if you have the memory, you may overrun the size limit of the write buffer and cause application stalls.
Writing to a ramdisk (e.g. tmpfs) is always an option, though.
I agree that in a memory-rich environment it's far from the worst idea. However, now you need to manage the files; you've just pushed the problem somewhere else.
Every OS has flags for this. Windows' CreateFile has the FILE_ATTRIBUTE_TEMPORARY flag, and most unixes have something like O_TMPFILE. These avoid flushing the files to disk, even on a background writer timeout (Linux/Windows). Further, on many *nix systems, /tmp and the like are often in-memory filesystems anyway.
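For instance, a minimal Linux-only sketch of O_TMPFILE in use (the directory path and data are made up; the feature needs a filesystem that supports it, such as ext4, xfs or tmpfs):

```cpp
// Linux-specific sketch: O_TMPFILE creates an unnamed file in the given
// directory, so there is no path to clean up and nothing survives the close.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("/var/tmp", O_TMPFILE | O_RDWR, 0600);
    if (fd == -1) {
        std::perror("open(O_TMPFILE)");
        return 1;
    }

    const char data[] = "cached bytes";
    if (write(fd, data, sizeof data) == -1) {
        std::perror("write");
    }

    // No unlink() needed: the file disappears when the descriptor is closed.
    close(fd);
    return 0;
}
```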