> [the GC] ...it's keeping you from the misery of manual memory management and that you're better to consider it an ally than a foe
I used to think this way, but then I tried Rust. After C and C++, I never thought I'd want to give up the GC, but to avoid the issues that this article talks about, I wanted to go back to a systems language. Now I want to use Rust for everything.
Oh, I misunderstood you. I thought you would write code without a single new or delete at all, so everything either lives on the stack or uses built-in allocations like vector and the like.
In modern C++ you do avoid it completely, using things like std::make_unique<> and std::make_shared<> to allocate an object and stuff it into a smart pointer immediately.
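For example, a minimal sketch of what "no naked new or delete" looks like in practice (the Widget type here is just a placeholder, not something from the thread):

```cpp
#include <memory>
#include <vector>

struct Widget {
    int value = 0;
};

int main() {
    // Heap allocation without a raw new: ownership lives in the smart pointer.
    auto owned = std::make_unique<Widget>();
    owned->value = 42;

    // Shared ownership, again without new; make_shared allocates the object
    // and its control block together.
    auto shared = std::make_shared<Widget>();

    // Containers manage their own storage, so growth never needs an explicit delete.
    std::vector<Widget> widgets;
    widgets.push_back(*owned);

    // No delete anywhere: everything is released automatically at end of scope.
    return 0;
}
```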
Thanks. I always wanted to learn modern C++, but there was no such course at uni, and in my spare time I was only able to partially fill the gaps, learning about the obvious things like the auto keyword, move constructors and the like.
Move constructors get a lot of press because they are new (and necessary for the underlying machinery of unique_ptr to work), but most people will never need to use them or know about them directly.
There are really two classes of C++ features: the basic-use part, which is fairly straightforward and pleasant to work with - essentially the API provided by the STL - and the infrastructure stuff like templates, move semantics, SFINAE and other messy, non-obvious things. The latter is much more complicated and still a minefield, but it's necessary for the STL to do what it does.
If you want to learn C++, get proficient in using the STL. The rest should only be learned once that is second nature.
> Step 3: Pooling large byte arrays used in serialisation/deserialisation
If you are using a language with explicit memory management, buffer pooling is still quite important - especially if you are doing any IO. You can avoid it to a certain degree by using fast malloc implementations.
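As a rough illustration of the idea (generic, made-up names and fixed-size buffers, not tied to the article or any particular library), a pool just hands buffers back and forth instead of allocating fresh ones for every IO operation:

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Minimal illustrative buffer pool: reuses fixed-size byte buffers instead of
// hitting the allocator on every IO operation. Not production-hardened.
class BufferPool {
public:
    explicit BufferPool(std::size_t buffer_size) : buffer_size_(buffer_size) {}

    // Hand out a free buffer, or allocate a new one if the pool is empty.
    std::vector<std::byte> acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!free_.empty()) {
            auto buf = std::move(free_.back());
            free_.pop_back();
            return buf;
        }
        return std::vector<std::byte>(buffer_size_);
    }

    // Return a buffer to the pool so the next caller can reuse it.
    void release(std::vector<std::byte> buf) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(buf));
    }

private:
    std::size_t buffer_size_;
    std::mutex mutex_;
    std::vector<std::vector<std::byte>> free_;
};
```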
Why not both? I want both GC for recursive data structures (lists, trees, graphs, etc.) and Rust-style resource management when deterministic reclamation matters (files, GUI objects, etc.).
I should've been clearer: For low-level programming, Rust is already fine as it is. What I want is a high-level language that offers some of the benefits of Rust (safe deterministic non-memory resource management), without the unnecessarily low-level features (sized types and overemphasis on unique ownership). Due to its C++ heritage, Rust has a tendency to conflate ownership (a high-level concern) with indirection (a low-level one), which is fine for the niche it targets, but too inflexible for general-purpose programming, IMO.
Neither OCaml nor Haskell has substructural types. There's no way to tell their type systems “you can't use a file after it's been closed”. In fact, what I want can be summarized as “ML plus substructural types”! (But preferably with Standard ML as the base point, since OCaml has way too many warts for my taste.)
The title is a bit misleading: the performance problem was memory/GC related, and the specific in-memory cache that didn't work well was built from .NET data structures, which didn't solve the memory problems. He didn't try Redis or Memcached, but ended up writing files to disk instead.
So he's essentially using the OS's disk cache as his cache, with the tracking of disk files standing in for manual memory allocation.
In this situation on .NET, I wrote a resource manager that handed out handles to large byte arrays (in particular, ones big enough to land in the large object heap, which is only collected during gen2 collections; the threshold was around 80KB at the time, IIRC). The handles implemented IDisposable, so handing back the byte array was no more or less tedious than any other resource you need to manage explicitly. The resource manager kept hold of the arrays internally using weak pointers, so they could still be collected when gen2 collections actually happened, but allocating the buffers themselves would never cause gen2 collections in a steady state.
To turn that into a cache, you'd need another layer with keys, an eviction policy and an invalidation mechanism (see the sketch after this comment). I think it would still be better than round-tripping to disk.
I wrote a different version of the resource manager that used P/Invoke helpers and unsafe code to allocate from unmanaged memory directly, but it didn't perform any better - it didn't relieve any pressure on the GC, which was 2% of CPU usage at full load in any case.
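For what it's worth, a rough sketch of that extra layer - shown here in C++ with made-up names, LRU-only eviction, and explicit invalidation, purely as an illustration of the concept rather than the .NET code described above - might look like:

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch: keys map to values, and the least recently used
// entry is evicted once capacity is exceeded. Invalidation is just an erase.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);  // refresh recency
            return;
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
        if (index_.size() > capacity_) {             // evict least recently used
            index_.erase(order_.back().first);
            order_.pop_back();
        }
    }

    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);       // refresh recency
        return it->second->second;
    }

    void invalidate(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return;
        order_.erase(it->second);
        index_.erase(it);
    }

private:
    std::size_t capacity_;
    // Front of the list is the most recently used entry.
    std::list<std::pair<std::string, std::string>> order_;
    std::unordered_map<std::string,
                       std::list<std::pair<std::string, std::string>>::iterator> index_;
};
```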
> So he's essentially using the OS's disk cache as his cache, with the tracking of disk files standing in for manual memory allocation.
And that's often not the worst idea, because, when done correctly, this stuff never hits the disk when enough RAM is around.
Plus, life-cycle management is done by the OS, not by you. It also tends to behave better than "not at all" under memory pressure, or if there isn't a lot of memory in the first place.
If you write a file and then wait fifteen minutes before deleting it, chances are the OS will have flushed it to disk sometime in the meantime. It won't have to re-read it if there's still free memory, but it's extra load on disk I/O.
If you write a lot of them, then even if you have the memory, you may overrun the size limit of the write buffer and cause application stalls.
Writing to a ramdisk (e.g. tmpfs) is always an option, though.
I agree that in a memory-rich environment it's far from the worst idea. However, now you need to manage the files; you've just pushed the problem somewhere else.
Every OS has flags for this. Windows' CreateFile has the FILE_ATTRIBUTE_TEMPORARY flag, and most unixes have something like O_TMPFILE. These avoid flushing the files to disk, even on a background writer timeout (Linux/Windows). Further, on many *nix systems, /tmp and the like are often in-memory filesystems anyway.
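For instance, a minimal Linux-only sketch of O_TMPFILE in use (the directory path and data are made up; the feature needs a filesystem that supports it, such as ext4, xfs or tmpfs):

```cpp
// Linux-specific sketch: O_TMPFILE creates an unnamed file in the given
// directory, so there is no path to clean up and nothing survives the close.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("/var/tmp", O_TMPFILE | O_RDWR, 0600);
    if (fd == -1) {
        std::perror("open(O_TMPFILE)");
        return 1;
    }

    const char data[] = "cached bytes";
    if (write(fd, data, sizeof data) == -1) {
        std::perror("write");
    }

    // No unlink() needed: the file disappears when the descriptor is closed.
    close(fd);
    return 0;
}
```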