I'm designing a new PC and I'd like to be able to run local models. It's not clear to me from posts online what the specs should be. Do I need 128GB of RAM? Or would a 16GB RTX 4060 Ti be better? Or should I get a 4070 Ti? If anyone could point me toward some good guidelines, I'd greatly appreciate it.
The 16GB card here is the 4070 Ti Super, and it would be the better pick, as it has a 256-bit bus compared to the 4060 Ti 16GB, which, as has been mentioned, only has a 128-bit bus. The 4070 Ti Super also has more Tensor cores than the 4060 Ti.
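Quick sanity check on what bus width means for bandwidth. Napkin math in Python; the per-pin data rates below are assumptions, so check the actual spec sheets:

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    # bits moved per second across the whole bus, divided by 8 bits per byte
    return bus_width_bits * data_rate_gbps / 8

print(bandwidth_gbs(128, 18.0))  # 4060 Ti class: ~288 GB/s
print(bandwidth_gbs(256, 21.0))  # 4070 Ti Super class: ~672 GB/s
```

Bandwidth is what mostly caps your token generation speed once the model actually fits in VRAM.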
Get as much VRAM as you can afford.
NVIDIA is also releasing new cards, the RTX 50 series, starting in late January 2025.
The answer is that it depends on which models you want to run. I'd get as much VRAM on your GPU as possible. Once that runs out, inference spills over into your system RAM, which is much slower.
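If you use llama.cpp (here via the llama-cpp-python bindings), you control that GPU/RAM split explicitly with n_gpu_layers. A minimal sketch, with a made-up model path:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-q4_k_m.gguf",  # placeholder path, use your own GGUF
    n_gpu_layers=-1,  # -1 = offload every layer that fits to VRAM
    n_ctx=8192,       # context window
)
out = llm("Explain VRAM offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Whatever layers don't fit on the GPU stay in system RAM and run on the CPU, which is why more VRAM directly translates to more speed.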
You can run local models on a 10-year-old laptop. As always, the answer is "it depends".
The things you need: memory bandwidth, memory capacity, and compute. The more of each, the better. The 4060-series cards have notably poor bandwidth (worse than the 3060) due to their 128-bit bus, but being able to offload more layers to VRAM still generally wins. Rough math below.
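Why bandwidth dominates: generating one token means reading essentially all the weights once, so tokens/s is capped at roughly bandwidth divided by model size. A sketch; the bandwidth and size figures are ballpark assumptions:

```python
def est_tokens_per_sec(bandwidth_gbs: float, model_size_gb: float) -> float:
    # one decoded token ~= one full pass over the weights
    return bandwidth_gbs / model_size_gb

print(est_tokens_per_sec(288, 4.7))  # 8B @ 4-bit on a 4060 Ti: ~61 tok/s ceiling
print(est_tokens_per_sec(60, 40))    # 70B @ 4-bit in dual-channel DDR5: ~1.5 tok/s
```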
Roughly speaking, a 32GB system can load an 8B model at fp16, a 12B at 8-bit, a 30B at 4-bit, or a 70B at 2-bit. 64GB would be a good minimum if you want to run a 70B at 4-bit. Without significant GPU offloading it will be very slow, though.
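Those figures fall out of simple arithmetic: parameters times bits per weight, divided by 8 bits per byte, plus some overhead. A quick sketch (the 1.2x overhead factor is a guess to cover the KV cache and runtime buffers):

```python
def model_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    # params (billions) x bits per weight / 8 bits per byte, x overhead
    return params_b * bits / 8 * overhead

for params, bits in [(8, 16), (12, 8), (30, 4), (70, 2), (70, 4)]:
    print(f"{params}B @ {bits}-bit: ~{model_gb(params, bits):.0f} GB")
# 8B @ 16-bit: ~19 GB ... 70B @ 4-bit: ~42 GB, hence the 64GB recommendation
```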
If you want to process long contexts in a reasonable amount of time, it's best to run models with flash attention, which requires keeping the KV cache on the GPU. It also lets you use a 4-bit KV cache, which quadruples the amount of context you can fit.
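To see the sizes involved, here's the standard KV-cache formula with Llama-3-8B-ish shapes plugged in (32 layers, 8 KV heads with GQA, head dim 128; these are assumptions, swap in your model's numbers):

```python
def kv_cache_gb(layers=32, kv_heads=8, head_dim=128, ctx=8192, bytes_per=2.0):
    # 2 tensors (K and V) per layer, each of shape [kv_heads, ctx, head_dim]
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per / 1e9

print(kv_cache_gb(bytes_per=2.0))  # fp16 cache: ~1.07 GB at 8k context
print(kv_cache_gb(bytes_per=0.5))  # 4-bit cache: ~0.27 GB, i.e. 4x the context per GB
```

Going from 16-bit to 4-bit elements is exactly where the 4x context figure comes from.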