No, unified memory usually means the CPU and GPU (and miscellaneous things like the NPU) all use the same physical pool of RAM, so moving data between them is essentially zero-cost. That's in contrast to the usual PC setup, where the CPU has its own pool of RAM (shared with the iGPU if there is one) while the discrete GPU has its own independent pool of VRAM, and moving data between the two pools is a relatively slow operation.
An RTX 4090 or H100 has memory extremely close to the processor, but I don't think you would call it unified memory.
I don't quite understand one of the finer points of this, under-caffeinated :) - if GPU memory is extremely close to the CPU memory, what sort of memory would not be extremely close to the CPU?
I think you misunderstood what I meant by "processor": the memory on a discrete GPU is very close to the GPU's processor die, but very far away from the CPU. The GPU may be able to read and write its own memory at 1 TB/sec, but the CPU trying to read or write that same memory will be limited by the PCIe bus, which is glacially slow by comparison, usually somewhere around 16-32 GB/sec.
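To put those numbers in perspective, here's a rough back-of-the-envelope sketch (the bandwidth figures are the ballpark numbers quoted above, not measurements of any specific card):

```python
# Idealized transfer-time model: time = bytes / bandwidth.
# Bandwidth values are the approximate figures from the discussion above.
GPU_LOCAL_BW = 1e12   # ~1 TB/s: the GPU reading its own VRAM
PCIE_BW = 16e9        # ~16 GB/s: the CPU reaching that same VRAM over PCIe

def transfer_time_s(num_bytes: float, bw_bytes_per_s: float) -> float:
    """Idealized time to move num_bytes at a given sustained bandwidth."""
    return num_bytes / bw_bytes_per_s

one_gib = 2**30
print(f"GPU-local: {transfer_time_s(one_gib, GPU_LOCAL_BW) * 1e3:.2f} ms")
print(f"Over PCIe: {transfer_time_s(one_gib, PCIE_BW) * 1e3:.2f} ms")
```

Moving 1 GiB is roughly 1 ms for the GPU reading its own VRAM versus ~67 ms over a 16 GB/s PCIe link, a gap of about 60x.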
A huge part of optimizing code for discrete GPUs is making sure that data is streamed into GPU memory before the GPU actually needs it, because pushing or pulling data over PCIe on-demand decimates performance.
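That streaming idea is essentially a two-stage pipeline: copy chunk N+1 over PCIe while the GPU computes on chunk N. A toy analytic model of why the overlap helps (the chunk timings below are made-up illustrative numbers, not a real CUDA implementation):

```python
def sequential_time(n_chunks: int, copy_t: float, compute_t: float) -> float:
    """Pull each chunk over PCIe on demand, then compute: no overlap."""
    return n_chunks * (copy_t + compute_t)

def pipelined_time(n_chunks: int, copy_t: float, compute_t: float) -> float:
    """Overlap copies with compute (double buffering): after the first
    copy, the slower of the two stages sets the pace."""
    return copy_t + (n_chunks - 1) * max(copy_t, compute_t) + compute_t
```

With, say, a 5 ms copy and an 8 ms compute per chunk over 100 chunks, the on-demand version takes 100 * 13 = 1300 ms while the pipelined version takes 5 + 99 * 8 + 8 = 805 ms; the PCIe transfers almost disappear behind the compute.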
> CPU trying to read or write that same memory will be limited by the PCIe bus, which is glacially slow by comparison, usually somewhere around 16-32GB/sec.
If you’re forking out for H100s, you’ll usually be putting them on a bus with much higher throughput, 200 GB/s or more.