Similar activity was observed in Poland for a long time. The country was split roughly evenly between a ruling party that strove to dismantle democratic institutions and a weak, fragmented left that was unable to win enough votes to regain power. Loyalists were put in power, including the president of the state.
Poland is still repairing the damage done to the court system, media landscape, and other state organisations and organs.
The lesson, whether from the Weimar Republic, Hungary, Poland, et cetera, is: a change in power is not a problem unless the party in power is out to destroy everything that enables other players to keep power in check and allows for a future peaceful political transition.
The U.S. has a different political culture and different norms than E.U. states. We will see how things turn out.
Is this a peer-reviewed paper? It does not seem to be. At first glance, the researchgate URI and the way the title was formulated made me think it was.
There is more to the story than I can tell here, unfortunately, but at least I can write this:
During my work for Bentley Motors I was at one of the Geneva Motor Shows in the 2010s. During my stay at the fair, a new (1500 USD) Tata car was introduced. I visited the Tata stand with a friend to look at their new car, which was quite the contrast in product philosophy, design and target group to the Bentley models. Thanking our host at the Tata stand for our personal tour, I invited him to the Bentley stand so I could return the favour and show him our models.
To my great surprise, later that day Ratan Tata came to the Bentley stand with what appeared to be his family (some male and female family members), and I was able to show him around. We could not talk much due to the bodyguards and press, but his demeanor seemed distinct from that of his sons and entourage. Besides the colourful and diverse customer group, I met Piech and other industrial magnates of our time, but Ratan managed to retain a humble and human aura. I sympathized with him.
Yes, I had the chance to meet him once because he invested in our company and paid a visit. He walked around the office and met folks, and was warm and approachable, even more so than our own executive team. Massively underrated guy.
You can run the 4-bit GPTQ/AWQ quantized Llama 405B somewhat reasonably on 4x H100 or A100. You will be somewhat limited in how many tokens you can have in flight between requests and you cannot create CUDA graphs for larger batch sizes. You can run 405B well on 8x H100 or A100, either with the mixed BFloat16/FP8 checkpoint that Meta provided or with GPTQ/AWQ-quantized models. Note though that the A100 does not have native support for FP8, but FP8 quantized weights can be used through the GPTQ-Marlin FP8 kernel.
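A rough back-of-the-envelope check of why 4 GPUs is tight and 8 is comfortable. This is only a sketch: it assumes the 80 GB variants of the H100/A100, counts weights only, and ignores quantization metadata (scales, zero points), activations and framework overhead. The function names are made up for illustration.

```python
# Weights-only memory estimate for a 405B-parameter model.
# Assumptions (not from the comment above): 80 GB cards, decimal GB,
# no allowance for quantization scales, activations or runtime overhead.

PARAMS_BILLION = 405
GPU_MEM_GB = 80  # assumed: 80 GB H100/A100 variants


def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Size of the weights in GB: one billion params at 8 bits is 1 GB."""
    return params_billion * bits_per_weight / 8


def headroom_gb(num_gpus: int, bits_per_weight: int) -> float:
    """Memory left for KV cache, activations and CUDA graphs (negative = does not fit)."""
    return num_gpus * GPU_MEM_GB - weight_gb(PARAMS_BILLION, bits_per_weight)


if __name__ == "__main__":
    for gpus in (4, 8):
        for name, bits in (("4-bit", 4), ("FP8", 8), ("BF16", 16)):
            print(f"{gpus}x 80 GB, {name}: headroom {headroom_gb(gpus, bits):.1f} GB")
```

Under these assumptions the 4-bit weights take about 202.5 GB, which leaves a little under 120 GB across 4 cards for everything else, while 8 cards leave room even for FP8 weights. Whatever remains is what the KV cache (tokens in flight) and CUDA graphs have to share.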
Here are some TGI 405B benchmarks that I did with the different quantized models:
Unsure if anyone has specific hardware benchmarks for the 405b model yet, since it's so new, but elsewhere in this thread I outlined a build that'd probably be capable of running a quantized version of Llama 3.1 405b for roughly $10k.
The $10k figure is likely roughly the minimum amount of money/hardware that you'd need to run the model at acceptable speeds, as anything less requires you to compromise heavily on GPU cores (e.g. Tesla P40s also have 24GB of VRAM, for half the price or less, but are much slower than 3090s), or run on the CPU entirely, which I don't think will be viable for this model even with gobs of RAM and CPU cores, just due to its sheer size.
Energy costs are an important factor here too. While Quadro cards are much more expensive upfront (higher $/VRAM), they are cheaper over time (lower Watts/Token). Offsetting the energy expense of a 3090/4090/5090 build via solar complicates this calculation but generally speaking can be a "reasonable" way of justifying this much hardware running in a homelab.
I would be curious to see relative failure rates over time of consumer vs Quadro cards as well.
Agree 100% that energy costs are important. The example system in my other post would consume somewhere around 300W at idle, 24/7, which is 219 kWh per month, and that's assuming you aren't using the machine at all.
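For anyone who wants to plug in their own numbers, the arithmetic looks roughly like this. It is a sketch that assumes an average month of 730 hours (24 * 365 / 12); the electricity price is a placeholder, not a figure from this thread.

```python
# Idle energy use of an always-on machine.
# Assumption: an average month has 24 * 365 / 12 = 730 hours.

HOURS_PER_MONTH = 24 * 365 / 12


def monthly_kwh(watts: float) -> float:
    """Energy in kWh consumed per month at a constant power draw."""
    return watts * HOURS_PER_MONTH / 1000


def monthly_cost(watts: float, price_per_kwh: float) -> float:
    """Monthly cost at a given price per kWh (price is whatever your utility charges)."""
    return monthly_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    print(monthly_kwh(300))         # idle draw from the post above
    print(monthly_cost(300, 0.15))  # 0.15 per kWh is a hypothetical price
```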
I don't have any actual figures to back this up, but my gut tells me that the fact that enterprise GPUs are an order of magnitude (at least) more expensive than, say, a 3090 means that their payback period has got to be pretty long. I also wonder whether setting the max power on a 3090 to a lower-than-default value (as I suggest in my other post) has a significant effect on the average W/token.
Agreed, but there are other costs associated with supporting 10-16x GPUs that may not necessarily come up with, say, 6 GPUs: having to go from single socket (or Threadripper) to dual socket, PCIe bifurcation, PLX risers, etc.
Not necessarily saying that Quadros are cheaper, just that there's more to the calculation when trying to run 405B-size models at home.
The system I outlined in my other post [0] has ten GPUs and does not require dual socket CPUs as far as I'm aware. It could likely scale easily to 14 GPUs as well (assuming you have sufficient power), with an x8/x8 bifurcation adapter installed in each PCIe slot. This is pushing the limits of the PCIe subsystem I'm sure, but you could also likely scale up to 28 GPUs, again assuming sufficient power, by simply bifurcating at x4/x4/x4/x4 vs x8/x8.
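The scaling numbers above work out as follows. This sketch assumes seven x16 slots, which is only inferred from the 14 and 28 figures (14 / 2 and 28 / 4) and not stated explicitly here; the helper names are made up.

```python
# GPU count and lanes per GPU when every x16 slot is bifurcated the same way.
# Assumption: 7 physical x16 slots (inferred from 14 GPUs at x8/x8 and 28 at x4/x4/x4/x4).

SLOTS = 7
LANES_PER_SLOT = 16


def max_gpus(slots: int, split: int) -> int:
    """Number of GPUs if each slot is split into `split` equal links."""
    return slots * split


def lanes_per_gpu(split: int, slot_lanes: int = LANES_PER_SLOT) -> int:
    """PCIe lanes each GPU gets after bifurcation."""
    return slot_lanes // split


if __name__ == "__main__":
    for split in (1, 2, 4):
        print(f"split x{split}: {max_gpus(SLOTS, split)} GPUs at x{lanes_per_gpu(split)} each")
```

Each doubling of the GPU count halves the lanes per card, which is the trade-off to weigh against power and cooling.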
I think it should work as-is with the components listed, but if you disagree please let me know!
To be fair, you need 2x 4090 to match the VRAM capacity of an RTX 6000 Ada. There is also the rest of the system you need to factor into the cost. When running 10-16x 4090s, you may also need to upgrade your electrical wiring to support that load, you may need to spend more on air conditioning, etc.
I'm not necessarily saying that it's obviously better in terms of total cost, just that there are more factors to consider in a system of this size.
If inference is the only thing that is important to someone building this system, then used 3090s in x8 or even x4 bifurcation are probably the way to go. Things become more complicated if you want to add the ability to train/do other ML stuff, as you will really want to try to hit PCIe 4.0 x16 on every single card.
Well, yes. Then again, it can also be the most rewarding, purposeful thing in the life of a female (and male) human, an experience of pain but sheer beauty.
Interesting - I guess there is some sort of balance there, given the radical biochemistry that is seen with Mitochondria. “The more [mitochondria] the merrier” …?
The more the merrier, yes. I’m an amateur road cyclist and most of my training is spent in the lower heart-rate “zones” trying to train my mitochondria. The theory is that for endurance sports the key variable is your mitochondria’s capacity to use oxygen and fuel to produce ATP.
Further, if the mitochondria are being asked to make more ATP than they can aerobically, then they will skip the final respiratory step and respire without oxygen (anaerobically). This causes a build-up of lactate in the cells that is not tolerated above a certain level, I believe due to it raising acidity levels in the cell.
You’ll often hear athletes and coaches talk about lactate threshold and Functional Threshold Power (FTP). This is all to do with mitochondria function.
If I'm rowing for 20-30 minutes in a heart-rate range of 150-160, that should fall into your parameters, right? This is a very interesting fact - I have been sedentary for a couple of years and I'm fighting a kind of fatigue. Maybe this is a way to work against the symptoms. Do you know of a way to tell if the effects are taking hold?
Mechanism design is broadly the study (with a practical as opposed to theoretical emphasis) of the way that incentives shape human behavior.
For better or worse Mark was/is able to see some deep minimal structure that allows what used to be a web page and is now a mobile app to elicit responses that bear an uncanny resemblance to the way human beings behave and interact in a setting unmediated by either a priest or a protocol. On the properties he runs people act a hell of a lot like they do in a bar or any other place where sapiens mix and match.
I’m not sure that turbocharging spinal-reflex humanity via computer networks is going all that well, which is one of the main reasons I parted ways with the endeavor once the true scope for mechanical advantage became clear, but he clearly sees things about what motivates people that Freud was throwing darts at.
I might have been one of the few true assassins he sent after people like Vic Gundotra or Evan Spiegel, and certainly he knows how to delegate the mechanics of leaving would-be adversaries on the scrap heap of history, but he knew who to send the hitters after and when.