
ekelsen · 2024-10-19T18:46:06 1729363566

Probably not -- very few whale skeletons on display are from recently deceased whales. Just a random lookup -- the one in the London Natural History Museum is from 1891. Seems likely that when it was new it also leaked some oil?

ekelsen · 2024-08-30T01:18:25 1724980705

Yeah. Did they test the bottle they found? Were there others? Seems crazy to not even charge and have a trial.

ekelsen · 2024-07-16T03:21:21 1721100081

A major component of many CUDA programs these days involves NCCL and high bandwidth intra-node communication.

Does NCCL just work? If not, what would be involved in getting it to work?

ekelsen · 2024-07-15T19:44:40 1721072680

If they had to reverse engineer any compiled code to do this, I think that would be against licenses they had to agree to?

At least grounds for suing and starting an extensive discovery process and possibly a costly injunction...

msond · 2024-07-15T20:07:36 1721074056

We have not reverse engineered any compiled code in the process of developing SCALE.

It was clean-room implemented purely from the API surface and by trial-and-error with open CUDA code.

RockRobotRock · 2024-07-15T20:19:23 1721074763

Isn't that exactly what a "clean room" approach avoids?

ekelsen · 2024-07-15T21:08:17 1721077697

oh definitely. But if I was NVIDIA I'd want to verify that in court after discovery rather than relying on their claim on a website.

RockRobotRock · 2024-07-15T23:14:44 1721085284

good point

ekelsen · 2024-07-16T03:15:28 1721099728

FWIW, I think this is really great work and I wish only the best for SCALE. Super impressed.

ekelsen · 2024-06-28T14:49:54 1719586194

That's one take...

ekelsen · 2024-05-20T17:43:44 1716227024

Eggplant has a lot of nicotine in it (relative to most plants other than tobacco). Maybe that has something to do with it?

zikduruqe · 2024-05-20T18:18:58 1716229138

I have no clue. I read it in an old book that I read maybe 40 years ago. And for the life of me, can't remember what book it was... it wasn't the Foxfire series, but it was something old Appalachian home remedy related.

Not a lot of references that I have found.

https://www.peoplespharmacy.com/articles/eggplant-banished-w...

ekelsen · 2024-04-08T03:09:14 1712545754

The halving could come from an intended use in a Newton-Raphson iteration for square root refinement.

See for example https://math.mit.edu/~stevenj/18.335/newton-sqrt.pdf

The initial guess is the approximate square root, but it needs to be halved as part of the calculation.
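
A minimal sketch of what I mean, assuming the standard Newton-Raphson update for sqrt(a), x_next = 0.5 * (x + a / x). The function name `newton_sqrt` and the `steps` parameter are just illustrative, not taken from the code under discussion. Written this way, the halved guess shows up as its own term:

```python
def newton_sqrt(a, x0, steps=4):
    """Refine an approximate square root x0 of a using Newton-Raphson."""
    x = x0
    for _ in range(steps):
        half_x = 0.5 * x            # the halved current guess
        x = half_x + a / (2.0 * x)  # algebraically the same as 0.5 * (x + a / x)
    return x
```

If the approximation is only ever fed into a step like this, storing it already halved saves a multiply, which might explain the halving.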

ekelsen · 2024-03-11T15:32:11 1710171131

Some analysis of how and/or why it is able to be 3x faster despite no hardware metric being 3x better would make this actually useful and insightful instead of advertising.

ekelsen · 2024-02-20T03:53:44 1708401224

I wrote an article about these affecting LLM training at https://www.adept.ai/blog/sherlock-sdc

walterbell · 2024-02-20T05:55:02 1708408502

Thanks, does your blog have a working RSS feed?

ekelsen · on Dec 19, 2023

AMD's attempted responses go all the way back to "Close to Metal" (https://en.wikipedia.org/wiki/Close_to_Metal) in 2007, when CUDA first debuted. They've had nearly 20 years to fix the situation and have failed to do so. Maybe some third-party player like Lamini AI will do what they couldn't and get acquired for it.

shihab · on Dec 20, 2023

The thing about modern AI is that the operations involved (e.g. dense matmuls) are a lot simpler and more GPU-friendly than what you'd find in a typical HPC application. This means you can get pretty close to peak hardware performance using high-level languages like Python or OpenAI's Triton. I think it's unlikely that the push to improve ROCm's standard libraries will come from an AI-focused startup.

cavisne · on Dec 20, 2023

Even with AMD working hard on specific benchmarks for marketing purposes, they could not get close to peak hardware performance on their brand new chips.

https://www.semianalysis.com/p/amd-mi300-performance-faster-...