Hacker News | ekelsen's comments

Probably not -- very few whale skeletons on display are from recently deceased whales. Just a random lookup -- the one in the London Natural History Museum is from 1891. Seems likely that when it was new it also leaked some oil?


Yeah. Did they test the bottle they found? Were there others? Seems crazy to not even charge and have a trial.


A major component of many CUDA programs these days involves NCCL and high bandwidth intra-node communication.

Does NCCL just work? If not, what would be involved in getting it to work?


If they had to reverse engineer any compiled code to do this, I think that would be against licenses they had to agree to?

At least grounds for suing and starting an extensive discovery process and possibly a costly injunction...


We have not reverse engineered any compiled code in the process of developing SCALE.

It was clean-room implemented purely from the API surface and by trial-and-error with open CUDA code.


Isn't that exactly what a "clean room" approach avoids?


oh definitely. But if I was NVIDIA I'd want to verify that in court after discovery rather than relying on their claim on a website.


good point


FWIW, I think this is really great work and I wish only the best for SCALE. Super impressed.


That's one take...


Eggplant has a lot of nicotine in it (relative to most plants other than tobacco). Maybe that has something to do with it?


I have no clue. I read it in an old book that I read maybe 40 years ago. And for the life of me, can't remember what book it was... it wasn't the Foxfire series, but it was something old Appalachian home remedy related.

Not a lot of references that I have found.

https://www.peoplespharmacy.com/articles/eggplant-banished-w...


The halving could come from an intended use in a Newton-Raphson iteration for square root refinement.

See for example https://math.mit.edu/~stevenj/18.335/newton-sqrt.pdf

The initial guess is the approximate square root, but it needs to be halved as part of the calculation.
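
As a rough illustration of where the halving shows up, here is a minimal sketch of the standard Newton-Raphson square root update, x_next = x/2 + a/(2x). The function name `newton_sqrt` and its parameters are my own for illustration, not taken from the code under discussion, and this is only one plausible reading of why a halved value would be convenient.

```python
def newton_sqrt(a, x0, iterations=5):
    """Refine an approximate square root x0 of a (assumes a > 0 and x0 > 0)."""
    x = x0
    for _ in range(iterations):
        # Newton-Raphson step for f(x) = x*x - a:
        #   x_next = x - (x*x - a) / (2*x) = x/2 + a/(2*x)
        # The current guess enters the update already halved.
        x = 0.5 * x + a / (2.0 * x)
    return x


if __name__ == "__main__":
    # Start from a crude guess and refine it.
    print(newton_sqrt(2.0, 1.5))
```

If the approximation step already returns x/2, the first refinement needs only one divide and one add, which may be why the halving is folded in early.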


Some analysis of how and/or why it is able to be 3x faster despite no hardware metric being 3x better would make this actually useful and insightful instead of advertising.


I wrote an article about these affecting LLM training at https://www.adept.ai/blog/sherlock-sdc


Thanks, does your blog have a working RSS feed?


AMD's attempted responses go all the way back to 2007, when CUDA first debuted, with "Close to Metal" (https://en.wikipedia.org/wiki/Close_to_Metal). They've had nearly 20 years to fix the situation and have failed to do so. Maybe some third party player like Lamini AI will do what they couldn't and get acquired for it.


The thing about modern AI is that the operations involved (e.g. dense matmuls) are a lot simpler and more GPU-friendly than what you'd find in typical HPC applications. This means you can get pretty close to peak hardware performance using high-level languages like Python or OpenAI's Triton. I think it's unlikely that the push to improve ROCm's standard libraries will come from an AI-focused startup.


Even with AMD working hard on specific benchmarks for marketing purposes, they could not get close to peak hardware performance on their brand new chips.

https://www.semianalysis.com/p/amd-mi300-performance-faster-...

