> Hundreds of proposed amendments to the United States Constitution are introduced during each session of the United States Congress. From 1789 through January 3, 2019, approximately 11,770 measures have been proposed to amend the United States Constitution.
> Collectively, members of the House and Senate typically propose around 200 amendments during each two-year term of Congress.
For an amendment to pass, it requires a 2/3 supermajority in both the Senate and the House of Representatives.
... that and it needs to be ratified by 3/4 of the state legislatures!
See the story of the https://en.wikipedia.org/wiki/Equal_Rights_Amendment which led a lot of people to think the nation is so polarized that we could never get another constitutional amendment. (Some parents of my friends were hoping in the 1980s to get one to ban abortion and... yeah right!)
HuggingFace is currently replicating DeepSeek, and others will follow. That will remove all CCP censorship from the model. DeepSeek has a huge impact because it takes only $6 million to train from scratch.
Valuations of private unicorns like OpenAI and Anthropic must be in free fall. DeepSeek spent $6 million on older H800 hardware to develop an open-source model that overtakes ChatGPT.
AI gets better, but profit margins sink with strong competition.
> DeepSeek spent $6 million on older H800 hardware to develop an open-source model that overtakes ChatGPT.
DeepSeek claims that's what they spent. They're under a trade embargo, and if they had access to any more than that it would have been obtained illegally.
They might be telling the truth, but let's wait until someone else replicates it before we fully accept it.
I remember a year ago hoping that, a decade from now, I'd be able to run GPT-4-class models on my own hardware. The reality seems to be far more exciting.
All of the western AI companies trained on illegally obtained data; they barely even bother to deny it. This is an industry where lies are normalised. (Not to contradict your point about this specific number.)
It's legally a grey area. It might even be fair use. Facts themselves are not protected by copyright. If there's no unauthorized reproduction/copying then it's not a copyright issue. (Maybe it's a violation of terms of service, of course.)
We don't know what LLMs encode because we don't know what the model weights represent.
On the second point, it depends on how the models were made to reproduce text verbatim. If I copy-paste someone's article into MS Word, I technically made Word reproduce the text verbatim; obviously that's not Word's fault. If I asked an LLM explicitly to list the entire Bee Movie script, it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.
> If I copy-paste someone's article into MS Word, I technically made Word reproduce the text verbatim; obviously that's not Word's fault. If I asked an LLM explicitly to list the entire Bee Movie script, it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.
But that clearly means that the LLM already has the Bee Movie script inside it (somehow), which would be a copyright violation. If MS Word came with an "open movie script" button that let you pick a movie and get the script for it, that would clearly be a copyright violation. Of course, if the user inputs something then that's different - that's not the software shipping whatever it is.
> If I asked an LLM explicitly to list the entire Bee Movie script, it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.
Huh? The "request" part doesn't matter. What you describe is exactly like if someone ships me a hard drive with a file containing "the entire Bee Movie script" that they were not authorized to copy: it's copyright infringement before and after I request the disk to read out the blocks with the file.
I mean, it is IP law, this stuff was all invented to help big corps support their business models. So, it is impossible to predict what any of it means until we see who is willing to pay more to get their desired laws enforced. We’ll have to wait for more precedent to be purchased before us little people can figure out what the laws are.
Copies are made in the formation of the training corpus and in the memory of the computers during training so there's definitely a copyright issue. Could be fair use though.
No, the DMCA amended the law to give search engines (and automated caches and user generated content sites) safe harbor from infringement if they follow the takedown protocol.
PRC companies breaking US export control laws is legal (for PRC companies). Maybe they're trying to avoid US entity listing; lots of PRC companies keep mum about growing capabilities for that reason. But the mere fact that DeepSeek is publicizing this means they're unlikely to care about the political heat that is coming and its ramifications. If anything, getting on the US entity list probably locks employees with DeepSeek on their resumes into the PRC.
Depending on how the law is written this may be legal even under US law.
For instance, if the law bans US companies from exporting/selling certain chips to Chinese companies and nothing more, then it is unclear to me whether a Chinese company would be doing anything illegal under US law by buying such chips, as it would be on the American seller to refuse.
Anyway, this sort of thing usually takes place through intermediaries in third countries, so it is difficult to track, but obviously it would be stupid to brag about it if that happened.
Which allies? The ones the current US president is threatening in all sorts of manner?
I actually hope he doubles down. I would love for the EU to rely less on the US. It would also reduce the reach of the silly embargoes that benefit no one but the US.
Hard to think they plan to; PRC strategic companies that get competitive get entity listed anyway. And the CEO seems mission-driven toward AGI: if the US is going to limit hardware inevitably, then there's nothing to do but take the gloves off and try to dunk on the competition. At this point the US can take DeepSeek off app stores, but what's the point except to look petty? Either way, more technical people have pointed out that some of the R1 optimizations _only_ make sense if DeepSeek was constrained to older hardware, i.e. engineering at the PTX level to circumvent H800 limitations and perform more like H100s.
Throwing this model out also gives US allies' sovereign AI a launchpad... reducing US dependency is step 1 to not being US allies.
If they sell software and build devices in China and then people from the US or our allies have to break our laws to import it, it seems like an us problem.
That's 8 (not 4), on an NVIDIA platform board to start with.
You can't buy them as "GPU"s and integrate them into your system. NVIDIA sells you the platform (GPUs + platform board, which includes the switches and all the support infrastructure), and you integrate that behemoth of a board into your server as a single unit.
So that open server and the wrapped ones at the back are more telling than they look.
I believe that NVIDIA is overvalued, but if DeepSeek really is as great as has been said, then it'll be even greater when scaled up to OpenAI sizes, and when you get more out you have more reason to pay, so this should, if it pans out, lead to more demand for GPUs -- basically Jevons paradox.
If the top-tier premium GPUs aren't the difference-maker they were thought to be then that will hurt NVIDIA's margins, even if they make some of it up on volume.
It is a possibility, but my understanding of what OpenAI has said is that GPT-5 is delayed because of the apparent promise of RL-trained things like o1, etc., and that they've simply decided to train those instead of training a bigger base model on better data, and I think this is plausible.
If we expect the compute demand for GPT-5 to be 100x that of GPT-4, and GPT-4 was trained in months on 10k H100s, then you would need years with 100k H100s, or perhaps months again with 100k GB200s.
See, there is your answer. The issue is that GPU compute is still way too low for GPT-5 if they continue parameter scaling as they used to.
GPT-3 took months on 10k A100s. 10k H100s would have done it in a fraction of the time. Blackwell could train GPT-4 in 10 days with the same number of GPUs as Hopper, which took months.
Don't forget GPT-3 is just 2.5 years old. Training is obviously waiting for the next step up in training speed from large clusters. Don't be fooled: the 2x Blackwell-vs-Hopper figure is only chip vs. chip. 10k Blackwell GPUs, including all the networking speedup, are easily 10x or more faster than the same number of Hopper GPUs. So building a 1-million-GPU Blackwell cluster means 100x more training compute compared to a 100k Hopper cluster.
Nobody starts a model training run if it takes years to finish... too much risk in that.
The Transformer was introduced in 2017 and ChatGPT came out in 2022. Why? Because they would have needed millions of Volta GPUs instead of thousands of Ampere GPUs to train it.
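A rough back-of-envelope sketch of that scaling argument, in Python. The baseline of "months on 10k GPUs", the 100x compute target, and the ~10x per-GPU speedup are just the illustrative numbers from the comments above, not measured figures:

```python
# Back-of-envelope training-time scaling. All inputs are illustrative
# assumptions taken from the discussion above, not measured figures.

def training_time_months(relative_compute, num_gpus, per_gpu_speedup,
                         baseline_months=3.0, baseline_gpus=10_000):
    """Scale a baseline run (e.g. 'a few months on 10k H100s') to a new setup."""
    baseline_gpu_months = baseline_months * baseline_gpus
    required_gpu_months = baseline_gpu_months * relative_compute
    effective_gpus = num_gpus * per_gpu_speedup
    return required_gpu_months / effective_gpus

# Hypothetical next-gen model needing 100x the baseline compute:
print(training_time_months(100, num_gpus=100_000, per_gpu_speedup=1))   # ~30 months on same-generation GPUs
print(training_time_months(100, num_gpus=100_000, per_gpu_speedup=10))  # ~3 months if each GPU is ~10x faster
```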
But surely it can be scaled up? Or is this compression thing something that makes the approach good only for small models? (I haven't read the DeepSeek papers; can't allocate the time.)
Have you read about this specific model we're talking about?
My understanding is that the whole point of R1 is that it was surprisingly effective to train on synthetic data AND to reinforce on the output rather than the whole chain of thought. Which does not require so much human-curated data and is a big part of where the efficiency gain came from.
If, as some companies claim, these models truly possess emergent reasoning, their ability to handle imperfect data should serve as a proof of that capability.
For Oracle (another Stargate recipient) it was reversion to the mean. For Nvidia, it's a big loss - I imagine they might have predicated their revenue projections on the continued need for compute - and now that's in question.
This is not exactly right: they said they spent $6M on training V3; there aren't numbers out there for the training of R1. My feeling is that it will be cheaper than o1, but it's hard to tell how much cheaper. I'd guess that overall DeepSeek spent far less than OpenAI to release the model, because the R&D part was probably cheaper too, but we don't have the numbers yet. Anyway, we can assume that DeepSeek and Alibaba will try to get the most out of their current GPUs either way.
The bigger correction will be in tech stocks that are overly exposed to datacenter investments made to accommodate ever-rising AI demand. MSFT, AMZN, META - they are all exposed.
It's kind of silly. It's not like MSFT and the other hyperscalers don't need the capacity build-out for other reasons too. This should be an easy pivot if DeepSeek turns out to be as good as promised.
Of course they are overhyped, but in spite of this Altman is always asking for more money. And we know that financially they are just burning money. So when someone finally brings a cheap but good model to the masses, that is where the money should go. (This will also help all the small AI startups.)
Consider that the Chinese might be misrepresenting their costs. A newsletter was implying that they might do it to undermine the justification for the sanctions.
Agree that the AI bubble should pop though and the earlier, the better.
They express their cost in terms of GPU hours, then convert that to USD based on market GPU rental rates, so it's not affected by subsidies. It's possible, however, that they lied about GPU hours, but if that were the case an expert should be able to show it by working out how many FLOPs are needed to train on the number of tokens they say they used versus the FLOPs of the GPUs they say they used.
Total training FLOPs can be deduced from the model architecture (which they can't hide, since they released the weights) and how many tokens they trained on. With total training FLOPs and GPU hours you can calculate MFU. And the MFU of their DeepSeek-V3 training run comes out around 40%, which sounds right; both Google and Meta have reported higher MFU, so the GPU hours should be correct. The only thing they could have lied about is how many tokens they trained the model on. DeepSeek reported 14T, which is also similar to what Meta did, so nothing crazy here.
tl;dr: all the numbers check out, and the gains come from the model architecture innovations they made.
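For anyone who wants to sanity-check that kind of claim themselves, here is a minimal sketch of the arithmetic. It uses the standard ~6 x active parameters x tokens approximation for training FLOPs; the parameter count, token count, GPU hours, and peak per-GPU throughput below are illustrative assumptions (roughly the publicly reported DeepSeek-V3 figures), not authoritative values:

```python
# Minimal sketch: estimate MFU from model size, token count, and reported GPU hours.
# All input numbers are assumptions for illustration; swap in your own if they differ.

def estimate_mfu(active_params, tokens, gpu_hours, peak_flops_per_gpu):
    """MFU = achieved FLOPS / peak FLOPS, using the ~6*N*D training-FLOPs rule of thumb."""
    total_train_flops = 6 * active_params * tokens       # forward + backward, dense approximation
    gpu_seconds = gpu_hours * 3600
    achieved_flops_per_gpu = total_train_flops / gpu_seconds
    return achieved_flops_per_gpu / peak_flops_per_gpu

mfu = estimate_mfu(
    active_params=37e9,         # activated parameters per token (MoE, not the full parameter count)
    tokens=14.8e12,             # ~14T training tokens
    gpu_hours=2.788e6,          # reported H800 GPU hours for the training run
    peak_flops_per_gpu=989e12,  # assumed peak throughput (Hopper-class dense BF16); pick the number you believe
)
print(f"implied MFU: {mfu:.0%}")
```

The 6*N*D rule ignores attention FLOPs, and the right peak-throughput figure is debatable (BF16 vs. FP8), so estimates anywhere in the 30-40% range fall out of the same arithmetic; the point is just that the reported GPU hours are physically plausible.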
I did a quick search for "llama" and didn't find anywhere they outright state they just fine-tuned some llama weights.
Is it possible that they based their model architecture on the Llama architecture, rather than fine-tuning already-trained Llama weights? In that case, they'd still have to do "bottom-up" training.
Much easier to identify the incentives of the people who just lost a lot of money betting on the idea that it was their money that was going to make artificial intelligence intelligent.
Everyone’s already begun trying this recipe in-house. Either it works with much less compute, or it doesn’t.
For instance, HKUST just did an experiment where small, weak base models trained with DeepSeek's method beat stronger small base models trained with much more costly RL methods. Already this seems like it is enough to upend the low-end model niche, things like Haiku and 4o-mini.
Be really skeptical about why the people who should be making tons of money from realizing it was all a mirage, and that they can now get the real thing even cheaper, would spend so much effort shouting about this and undercutting their own profitability.
That's arguable, though. I mean, it's much cheaper and reasonably competitive, which is almost the same thing, but IMHO DeepSeek seems to get stuck in random loops and hallucinate more frequently than o1.
The issue here is not that DeepSeek exists as a competitor to GPT, Claude, Gemini,...
The issue is that DeepSeek have shown that you don't need that much raw computing power to run an AI, which means that companies including OpenAI may focus more on efficiency than on throwing more GPUs at the problem, which is not good news for those in the business of making GPUs. At least according to the market.
One of the questions about this is that of the US’s human capital, i.e. does the US (still) have enough capable tech people in order to make that happen?
Lol, yes. The US is still very much at the forefront of this stuff. DeepSeek have presented some neat optimizations, but there have been many such papers and optimizations get implemented quickly once someone has proven them out.
> The US is still very much at the forefront of this stuff
Doesn't look like it, because some of the biggest US tech companies active today (including Meta and Alphabet) couldn't come up with what this much smaller Chinese company has. Which raises the question: what is it that companies like Meta and Alphabet do with the hundreds of billions of dollars they have already invested in this space?
Best guess is that they were all caught up in the arms race to make a better model, at whatever cost. And if you work in this space you were probably getting thrown fistfuls of money to join in on it. I read somewhere on reddit that anyone trying to push for efficiency at these places was getting ignored or pushed aside. DeepSeek had an incentive to focus on efficiency because of the chip embargo. So I don't think this is necessarily a knock on US AI capabilities. It is just that the incentives were different, and when stock prices were going to the moon regardless of how much capex was being spent, it was easy for everyone to just go along with it.
With that said, I think all of these companies are capable of learning from this and implementing these efficiency improvements. And I think the arms race is still on. The goal is to achieve a superhuman level of intelligence, and they have a ways to go to get there. It is possible that these new efficiency improvements might even help them take the next step, as they can now do a lot more with a lot less.
I see no reason to believe they couldn't have done so. Rather, this is the typical pattern we see across industry: the west focuses on working out what the next big thing is, and China is in a fast-follow-and-optimize mode.
> You can ban the company but are you going to ban any US company from using the open model and running it on their own hardware [1]?
Just for the people who might not have been around the last time, this has precedent :) US government (and others) have been trying to outlaw (open source) cryptography, for various reasons, for decades at this point: https://en.wikipedia.org/wiki/Crypto_Wars
The vast majority of what the US government has tried to ban was the export of cryptography tools. However, as your own link makes clear, they stopped doing that in 2000.
Furthermore, what was restricted was not "open source cryptography"; it was cryptography that they could not break. The only way that open source comes into it is that that is what made it abundantly clear that the cat was out of the bag and there was no going back.
Please try to at least attempt to consider nuance. Do you seriously think that would happen? What is your point here? Do you think people in favor of restricting one thing are in favor of restricting everything?
People are trying to stir up "we shouldn't use Chinese AI because our data is going to be stolen" discussions. But after the TikTok debacle, no serious person is willing to bite. It's just a big coping strategy for everyone who's been saying how western AI is years ahead.
> Please try to at least attempt to consider nuance. Do you seriously think that would happen? What is your point here? Do you think people in favor of restricting one thing are in favor of restricting everything?
The restriction on TikTok was blatantly because it's a Chinese product outcompeting American products, everything else was the thinnest of smokescreens. Yes, I think people in favour of it are in favour of slapping whatever tariffs or bans they can get away with on everything that China makes.
Journos rarely check out these UFO grifters, but when somebody does it all falls down. Take, for example, Luis Elizondo. He never worked for AATIP as he claims. Another UFO nut, Senator Harry Reid, just got him permission to hang around. The only thing that seems relatively sure is that Luis Elizondo was in ROC, maybe never passed.
You're equating completely different things. Executive orders are how a president works.
The German ambassador talks about using government power to go after your domestic political enemies, or using lawsuits, threats of criminal prosecution, and license revocation to prevent media from speaking negatively about Trump.
Correct me if I'm wrong, but I don't see this being viable even if you reach your target efficiency.
The problem with hydrogen is the storage cost. Improving wire-to-wire efficiency can help only so much. Have you calculated the electricity cost with those efficiency rates when you include the cost of storage? "Overall cost of renewable hydrogen in 2030 varies from €2.80–15.65/kgH2", improving with scale. https://www.sciencedirect.com/science/article/pii/S036031992...
Quick and dirty math, may contain errors:
Lightcell target is 0.5 kWh/L. Hydrogen weighs 0.09 kg/L, so each kg of hydrogen yields roughly 0.5/0.09 ≈ 5.6 kWh of output.
-> hydrogen cost alone: ~€0.5/kWh at large scale, ~€2.8/kWh at small scale.
Average electricity cost in the EU has been €0.289 per kWh.
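A minimal sketch of that arithmetic in Python, using only the figures quoted above (the Lightcell target, the assumed hydrogen density, and the 2030 renewable-hydrogen cost range); small rounding differences are expected:

```python
# Quick-and-dirty reproduction of the arithmetic above.
# Assumptions: Lightcell target of 0.5 kWh of output per litre of hydrogen,
# hydrogen at 0.09 kg/L, renewable hydrogen at 2.80-15.65 EUR/kg (2030 range).

KWH_PER_LITRE = 0.5           # target electrical output per litre of stored H2
KG_PER_LITRE = 0.09           # assumed hydrogen mass per litre of storage
EU_GRID_EUR_PER_KWH = 0.289   # average EU household electricity price quoted above

kwh_per_kg = KWH_PER_LITRE / KG_PER_LITRE  # ~5.6 kWh of output per kg of H2

for label, eur_per_kg in [("large scale", 2.80), ("small scale", 15.65)]:
    eur_per_kwh = eur_per_kg / kwh_per_kg
    print(f"{label}: ~{eur_per_kwh:.2f} EUR/kWh vs. grid at {EU_GRID_EUR_PER_KWH} EUR/kWh")
```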
> Average electricity cost in the EU has been €0.289 per kWh.
I'm curious where you're getting this from, and also what other Europeans on HN currently pay?
I'm in Spain with Octopus (via Spock's collective bargaining), and my effective price for December ended up being 0.131 EUR/kWh, while you claim a price that is 3x what I currently pay. Just wondering if I'm an outlier with the price Spock managed to get us.
Edit:
> The EU average price in the first half of 2024 — a weighted average using the most recent (2022) consumption data for electricity by household consumers — was €0.2889 per KWh.
Guessing that's your source :) Seems that's specific for home usage though, while your comment seems to be in a different context. Not sure electricity is cheaper/more expensive in industrial contexts.
I’m with Octopus in the UK (so not EU any more), on the Agile plan so it changes depending on wholesale prices. My average last month was £0.2061/kWh. Fixed tariffs are closer to £0.25/kWh.
That's the levelized cost over the lifetime. Hydrogen storage is expensive to both build and maintain.
The issues include hydrogen embrittlement, constant leakage, and safety problems. Containers don't last: H2 is the smallest molecule, so it works its way into the container material, wears it out, and leaks away. Casing and seal damage is constant. Pressure vessel storage loses a bit under 1% per day to leakage; liquid hydrogen storage loses about 1-3% per day. Salt cavern storage loses much less, but has the problem of H2S generation by micro-organisms.
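To put those leakage rates in perspective, a tiny sketch of the compound losses, assuming the rough per-day rates above stay constant:

```python
# Compound effect of constant daily leakage, using the rough rates quoted above
# (assumed constant for simplicity).

def remaining_fraction(daily_loss, days):
    return (1 - daily_loss) ** days

for label, rate in [("pressure vessel (~1%/day)", 0.01),
                    ("liquid H2 (~3%/day)", 0.03)]:
    print(f"{label}: {remaining_fraction(rate, 30):.0%} left after 30 days, "
          f"{remaining_fraction(rate, 90):.0%} after 90 days")
```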
I don't see how you can compute that cost if you don't know anything about the amount of energy that goes into and out of the container, and how often that happens.
Jan 19 is a Sunday. Trump becomes president on Jan 20. Biden understands that it's Trump's problem now. Filing stuff in the morning just to complicate any decisions Trump may take is just silly.
https://en.wikipedia.org/wiki/List_of_proposed_amendments_to...