
wuschel · 2025-02-04T11:14:58 1738667698

Similar activity was observed in Poland for a long time. The country was split roughly in half between the ruling party that strove to dismantle the democratic institutions and a weak, fragmented left that was unable to win enough votes to regain power. Loyalists were put in power, including the president of the state.

Poland is still repairing the damage done to the court system, media landscape, and other state organisations and organs.

The lesson, whether from the Weimar Republic, Hungary, Poland, et cetera, is: A change in power is not a problem unless the party in power is out to destroy everything that enables other players to keep power in check and allows for a future peaceful political transition.

The U.S. has a different political culture and different norms than E.U. states. We will see how things turn out.

wuschel · 2024-11-26T11:51:17 1732621877

> I'm in a custody battle to allow shared custody of my daughter.

Been there. I wish you all the strength to make the right choices for your daughter's sake <3

wuschel · 2024-11-23T11:31:11 1732361471

Is this a peer-reviewed paper? It does not seem to be. At first glance, the researchgate URI and the way the title was formulated made me think it would be.

wuschel · 2024-10-10T15:17:16 1728573436

There is more to the story than I can tell here, unfortunately, but at least I can write this:

During my work for Bentley Motors I was at one of the Geneva Motor Shows in the 2010s. During my stay at the fair, a new (1500 USD) Tata car was introduced. I visited the Tata stand with a friend to look at their new car, which was quite the contrast in product philosophy, design and target group to the Bentley models. Thanking our host at the Tata stand for our personal tour, I invited him to let me return the favour and show him the Bentley models and stand.

To my great surprise, later that day Ratan Tata came to the Bentley stand with what appeared to be his family (some male and female family members) - and I was able to show him around. We could not talk much due to the bodyguards and press, but he seemed distinct in demeanor from his sons and entourage. Apart from the colourful and diverse customer group, I met Piech and other industrial magnates of our time, but Ratan managed to retain a humble and human aura. I sympathized with him.

bongoman42 · 2024-10-10T16:31:28 1728577888

Yes, I had the chance to meet him once because he invested in our company and paid a visit. He walked around the office and met folks, and was warm and approachable, even more so than our own executive team. Massively underrated guy.

linksnapzz · 2024-10-10T16:04:42 1728576282

Definitely a contrast with Piech; who had once been described as "just a monocle and persian cat away from being the villain in a James Bond movie".

ryzvonusef · 2024-10-10T19:43:28 1728589408

Ratan Tata never married or had kids, so those weren't his sons, maybe other relatives or colleagues/employees.

wuschel · 2024-10-10T22:41:19 1728600079

You are definitely on point. I naturally made this assumption and thought it to be true until I wrote my comment today.

wuschel · 2024-07-23T15:39:21 1721749161

OK, I am curious now: What kind of hardware would I need to run such a model for a couple of users with decent performance?

Where could I get a mapping of token / time vs hardware?

danieldk · 2024-07-23T18:57:40 1721761060

You can run the 4-bit GPTQ/AWQ quantized Llama 405B somewhat reasonably on 4x H100 or A100. You will be somewhat limited in how many tokens you can have in flight between requests and you cannot create CUDA graphs for larger batch sizes. You can run 405B well on 8x H100 and A100, either with the mixed BFloat16/FP8 checkpoint that Meta provided or GPTQ/AWQ-quantized models. Note though that the A100 does not have native support for FP8, but FP8 quantized weights can be used through the GPTQ-Marlin FP8 kernel.

Here are some TGI 405B benchmarks that I did with the different quantized models:

https://x.com/danieldekok/status/1815814357298577718

The 405B model is very useful outside direct use in inference though. E.g. for generating synthetic data for training smaller models:

https://huggingface.co/blog/synthetic-data-save-costs

risho · 2024-07-23T21:06:19 1721768779

how much vram do you need for 4-bit llama 405?

zargon · 2024-07-24T01:22:35 1721784155

405 billion * 4 bits = approximately 200 GB. Plus extra for the amount of context you want.
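The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate for the weights themselves, in decimal gigabytes; the function name `weight_memory_gb` is made up for illustration, and the context (KV cache) overhead mentioned above is deliberately left out because it depends on batch size and sequence length.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    total_bytes = n_params * bits_per_param / 8  # 8 bits per byte
    return total_bytes / 1e9


# 405 billion parameters at 4 bits each, as in the comment above.
print(weight_memory_gb(405e9, 4))   # 202.5

# Same model at 16 bits per parameter, for comparison.
print(weight_memory_gb(405e9, 16))  # 810.0
```

Anything you want for context comes on top of the 4-bit figure, so treat ~200 GB as a floor rather than a budget.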

angoragoats · 2024-07-23T16:45:24 1721753124

Unsure if anyone has specific hardware benchmarks for the 405b model yet, since it's so new, but elsewhere in this thread I outlined a build that'd probably be capable of running a quantized version of Llama 3.1 405b for roughly $10k.

The $10k figure is likely roughly the minimum amount of money/hardware that you'd need to run the model at acceptable speeds, as anything less requires you to compromise heavily on GPU cores (e.g. Tesla P40s also have 24GB of VRAM, for half the price or less, but are much slower than 3090s), or run on the CPU entirely, which I don't think will be viable for this model even with gobs of RAM and CPU cores, just due to its sheer size.

bick_nyers · 2024-07-23T18:51:39 1721760699

Energy costs are an important factor here too. While Quadro cards are much more expensive upfront (higher $/VRAM), they are cheaper over time (lower Watts/Token). Offsetting the energy expense of a 3090/4090/5090 build via solar complicates this calculation but generally speaking can be a "reasonable" way of justifying this much hardware running in a homelab.

I would be curious to see relative failure rates over time of consumer vs Quadro cards as well.

angoragoats · 2024-07-23T19:42:29 1721763749

Agree 100% that energy costs are important. The example system in my other post would consume somewhere around 300W at idle, 24/7, which is 219 kWh per month, and that's assuming you aren't using the machine at all.
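For anyone who wants to redo the idle-power figure with their own numbers, here is a minimal sketch. It assumes an average month of 8760 / 12 = 730 hours, which is what reproduces the 219 kWh above. The function names are made up, and the electricity price is a parameter you have to supply yourself; no particular rate is implied.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def monthly_kwh(watts: float) -> float:
    """Energy used in an average month by a constant load, in kWh."""
    return watts / 1000 * HOURS_PER_YEAR / 12


def monthly_cost(watts: float, price_per_kwh: float) -> float:
    """Cost of that energy; price_per_kwh is whatever your utility charges."""
    return monthly_kwh(watts) * price_per_kwh


# 300 W at idle, running 24/7, as in the comment above.
print(round(monthly_kwh(300)))  # 219
```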

I don't have any actual figures to back this up, but my gut tells me that the fact that enterprise GPUs are an order of magnitude (at least) more expensive than, say, a 3090 means that their payback period has got to be pretty long. I also wonder whether setting the max power on a 3090 to a lower than default value (as I suggest in my other post) has a significant effect on the average W/token.

bick_nyers · 2024-07-24T16:01:15 1721836875

Agreed, but there are other costs associated with supporting 10-16x GPUs that may not necessarily arise with, say, 6 GPUs: having to go from single socket (or Threadripper) to dual socket, PCIe bifurcation, PLX risers, etc.

Not necessarily saying that Quadros are cheaper, just that there's more to the calculation when trying to run 405B size models at home

angoragoats · 2024-07-24T19:41:54 1721850114

The system I outlined in my other post [0] has ten GPUs and does not require dual socket CPUs as far as I'm aware. It could likely scale easily to 14 GPUs as well (assuming you have sufficient power), with an x8/x8 bifurcation adapter installed in each PCIe slot. This is pushing the limits of the PCIe subsystem I'm sure, but you could also likely scale up to 28 GPUs, again assuming sufficient power, by simply bifurcating at x4/x4/x4/x4 vs x8/x8.

I think it should work as-is with the components listed, but if you disagree please let me know!

[0] https://news.ycombinator.com/item?id=41047689

lostmsu · 2024-07-23T23:17:35 1721776655

I don't think this is correct. 5 years power usage of 4090 is $2600 giving TCO of ~$4300. RTX 6000 Ada starts at $6k for the card itself.

https://gpuprices.us
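A rough way to check the numbers in this comment, as a sketch only: the $2600, ~$4300 and $6k values are taken from the comment, the ~$1700 card price is merely what those two figures imply, and the helper names are hypothetical. `energy_cost_usd` is there so you can plug in your own average draw and electricity rate; neither is stated in the thread.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def energy_cost_usd(avg_watts: float, years: float, usd_per_kwh: float) -> float:
    """Electricity cost of a constant average draw over a number of years."""
    kwh = avg_watts / 1000 * HOURS_PER_YEAR * years
    return kwh * usd_per_kwh


def tco_usd(card_price_usd: float, energy_usd: float) -> float:
    """Total cost of ownership for the card alone: purchase plus energy."""
    return card_price_usd + energy_usd


# Figures quoted in the comment above.
ENERGY_5Y_4090 = 2600      # 5 years of power for a 4090
TCO_4090 = 4300            # approximate 5-year TCO of a 4090
RTX_6000_ADA_PRICE = 6000  # starting price, card only

# Card price implied by the two 4090 figures (not stated directly).
implied_card_price = TCO_4090 - ENERGY_5Y_4090
print(implied_card_price)  # 1700

# Even before adding energy, the RTX 6000 Ada costs more than the 4090's TCO.
print(tco_usd(implied_card_price, ENERGY_5Y_4090) < RTX_6000_ADA_PRICE)  # True
```

This only covers a single card; VRAM parity, the rest of the system, wiring and cooling are separate line items.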

bick_nyers · 2024-07-24T17:00:34 1721840434

To be fair, you need 2x 4090 to match the VRAM capacity of an RTX 6000 Ada. There is also the rest of the system you need to factor into the cost. When running 10-16x 4090s, you may also need to upgrade your electrical wiring to support that load, you may need to spend more on air conditioning, etc.

I'm not necessarily saying that it's obviously better in terms of total cost, just that there are more factors to consider in a system of this size.

If inference is the only thing that is important to someone building this system, then used 3090s in x8 or even x4 bifurcation is probably the way to go. Things become more complicated if you want to add the ability to train/do other ML stuff, as you will really want to try to hit PCIE 4.0 x16 on every single card.

lostmsu · 2024-07-24T17:11:33 1721841093

With 2x 4090 you will have 2x speed of RTX 6000 A. So same energy per token.

Will need more space, true.

angoragoats · 2024-07-24T19:42:48 1721850168

Yeah, after digging more into RTX 6000 Ada cards, I don't see any way they'd be more economical even over many years, no matter how you slice it.

wuschel · 2024-07-08T07:37:41 1720424261

Well, yes. Then again, it can also be the most rewarding, purposeful thing in the life of a female (and male) human, an experience of pain but sheer beauty.

wuschel · 2024-07-01T11:03:38 1719831818

Seconded! Any URI to your PhD?

rovr138 · 2024-07-01T12:16:25 1719836185

Check https://www.researchgate.net/publication/356873749_Extractin...

wuschel · 2024-06-24T16:54:43 1719248083

Interesting - I guess there is some sort of balance there, given the radical biochemistry that is seen with mitochondria. “The more [mitochondria] the merrier” …?

mrcartmeneses · 2024-06-24T20:24:56 1719260696

The more the merrier, yes. I’m an amateur road cyclist and most of my training is spent in the lower heart-rate “zones” trying to train my mitochondria. The theory is that for endurance sports the key variable is your mitochondria’s capacity to use oxygen and fuel to produce ATP.

Further, if the mitochondria are being asked to make more ATP than they can aerobically, then they will skip the final respiratory step and respire without oxygen (anaerobically). This causes a buildup of lactate in the cells that is not tolerated above a certain level, I believe due to it raising acidity levels in the cell.

You’ll often hear athletes and coaches talk about lactate threshold and Functional Threshold Power (FTP). This is all to do with mitochondrial function.

freilanzer · 2024-06-25T16:01:19 1719331279

If I'm rowing for 20-30 minutes in a heart rate range of 150-160, that should fall into your parameters, right? This is a very interesting fact - I have been sedentary for a couple of years and I'm fighting a kind of fatigue. Maybe this is a way to work against the symptoms. Do you know of a way to tell if the effects are taking hold?

mrcartmeneses · 2024-06-29T17:42:10 1719682930

You’ll know! I’ve been doing a lot of “base” training over the last few months and I feel a lot faster. And my Strava data agrees.

wuschel · 2024-06-24T10:06:48 1719223608

PM me if you will. I was around the ground zero of lab-grown meat in the middle of the last decade, and I am curious what is/was happening there.

wuschel · 2024-05-29T08:16:39 1716970599

> mental model mapping IRL human mechanism design into a high-fidelity digital analogy

Could you please elaborate on what you mean by that?

benreesman · 2024-05-29T09:00:22 1716973222

Mechanism design is broadly the study (with a practical as opposed to theoretical emphasis) of the way that incentives shape human behavior.

For better or worse Mark was/is able to see some deep minimal structure that allows what used to be a web page and is now a mobile app to elicit responses that bear an uncanny resemblance to the way human beings behave and interact in a setting unmediated by either a priest or a protocol. On the properties he runs people act a hell of a lot like they do in a bar or any other place where sapiens mix and match.

I’m not sure that turbocharging spinal-reflex humanity via computer networks is going all that well, which is one of the main reasons I parted ways with the endeavor once the true scope for mechanical advantage became clear, but he clearly sees things about what motivates people that Freud was throwing darts at.

I might have been one of the few true assassins he sent after people like Vic Gundotra or Evan Spiegel, and certainly he knows how to delegate the mechanics of leaving would-be adversaries on the scrap heap of history, but he knew who to send the hitters after and when.