
This comment provides context.

https://en.wikipedia.org/wiki/List_of_proposed_amendments_to...

> Hundreds of proposed amendments to the United States Constitution are introduced during each session of the United States Congress. From 1789 through January 3, 2019, approximately 11,770 measures have been proposed to amend the United States Constitution.

> Collectively, members of the House and Senate typically propose around 200 amendments during each two-year term of Congress.

For an amendment to pass, it requires a 2/3 supermajority in both the Senate and the House of Representatives.


... that and it needs to be ratified by 3/4 of the state legislatures!

See the story of the https://en.wikipedia.org/wiki/Equal_Rights_Amendment which led a lot of people to think the nation is so polarized that we could never get another constitutional amendment. (Some parents of my friends were hoping in the 1980s to get one to ban abortion and... yeah right!)

But then there was the amazing story of the 27th Amendment: https://constitutioncenter.org/the-constitution/amendments/a...


Reuters changed the original title.

You can see the original in the url "/chinese-ai-startup-deepseek-overtakes-chatgpt-apple-app-store-2025-01-27/"

News sites try multiple headlines and settle on the one that gets the most clicks.


DeepSeek open-sourced the model.

HuggingFace is currently replicating DeepSeek, and others will follow. That will remove all CCP censorship in the model. DeepSeek has a huge impact because it takes only $6 million to train from scratch.


Yes, I'm quite happy they gave so many details in the paper; it's the first SOTA paper with actual science in it!

This thread is about Americans installing the DeepSeek app though, which is quite worrisome.



Nvidia -13% on the Frankfurt stock market just now.

Valuations of private unicorns like OpenAI and Anthropic must be in free fall. DeepSeek spent $6 million on old H800 hardware to develop an open source model that overtakes ChatGPT. AI gets better, but profit margins sink with strong competition.

Chinese AI startup DeepSeek overtakes ChatGPT on Apple App Store https://news.ycombinator.com/item?id=42839656

Edit: Nvidia now -15% in Frankfurt.


> DeepSeek spent $6 million on old H800 hardware to develop an open source model that overtakes ChatGPT.

DeepSeek claims that's what they spent. They're under a trade embargo, and if they had access to anything more than that, it would have been obtained illegally.

They might be telling the truth, but let's wait until someone else replicates it before we fully accept it.


Huggingface is currently replicating it.

Replications of small models indicate that they aren't lying to any significant degree. The architecture is cheap to train.

Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Model RL Revolution https://xyzlabs.substack.com/p/berkeley-researchers-replicat...


Whoah, that's incredible!

I remember a year ago hoping that a decade from now it would be possible to run GPT-4-class models on my own hardware. The reality seems to be far more exciting.


I first sneered at the idea of LLM-generated LLM training sets, but is this what might be driving the big efficiency leap?

Asking as someone who honestly has only superficially followed the developments since the end of 2023 or so.


You call R1 a small model? It's a 671-billion parameter model.

There are multiple variations of the model starting from 1.5B parameters.

Those are distillations of the model.

Have you used those? In my experience even the 70B distillation is far worse than what you can expect from o1 / the R1 available on the web.

No, I haven't. I've used Perplexity's R1 but I don't know how many parameters it has. It's quite good, although too slow.

All of the western AI companies trained on illegally obtained data; they barely even bother to deny it. This is an industry where lies are normalised. (Not to contradict your point about this specific number.)

It's legally a grey area. It might even be fair use. Facts themselves are not protected by copyright. If there's no unauthorized reproduction/copying then it's not a copyright issue. (Maybe it's a violation of terms of service, of course.)

> Facts themselves are not protected by copyright.

But don't LLMs encode language, not facts?

> If there's no unauthorized reproduction/copying then it's not a copyright issue.

I'm pretty sure copyright holders have gotten the models to regurgitate their copyrighted works verbatim, or nearly so.


We don't know what LLMs encode because we don't know what the model weights represent.

On the second point, it depends how the models were made to reproduce text verbatim. If I copy-paste someone's article into MS Word, I technically made Word reproduce the text verbatim; obviously that's not Word's fault. If I asked an LLM explicitly to list the entire Bee Movie script it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.


> If I copy-paste someone's article into MS Word, I technically made Word reproduce the text verbatim; obviously that's not Word's fault. If I asked an LLM explicitly to list the entire Bee Movie script it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.

But that clearly means that the LLM already has the Bee Movie script inside it (somehow), which would be a copyright violation. If MS Word came with an "open movie script" button that let you pick a movie and get the script for it, that would clearly be a copyright violation. Of course, if the user inputs something then that's different: that's not the software shipping whatever it is.


That's not a fair comparison. The user in the Word example already had access to the infringing content to copy it, and then paste it into Word.

But it has to have that copy, verbatim, to produce it, as you acknowledge.

If Dropbox was hosting and serving IP from Paramount, Paramount would be able to submit a DMCA request to get that data removed.

Not only can you not submit a DMCA request to ChatGPT, they can't actually obey one.


> If I asked an LLM explicitly to list the entire Bee Movie script it would probably do it, which means it was trained on it, but that's through a direct and clear request to copy the original verbatim.

Huh? The "request" part doesn't matter. What you describe is exactly like if someone ships me a hard drive with a file containing "the entire Bee Movie script" that they were not authorized to copy: it's copyright infringement before and after I request the disk to read out the blocks with the file.


I mean, it is IP law; this stuff was all invented to help big corps support their business models. So it is impossible to predict what any of it means until we see who is willing to pay more to get their desired laws enforced. We'll have to wait for more precedent to be purchased before we little people can figure out what the laws are.

Copies are made in the formation of the training corpus and in the memory of the computers during training so there's definitely a copyright issue. Could be fair use though.

Is there also a copyright issue with search engines?

No, the DMCA amended the law to give search engines (and automated caches and user generated content sites) safe harbor from infringement if they follow the takedown protocol.

> been obtained illegally.

PRC companies breaking US export control laws is legal (for PRC companies). Maybe they're trying to avoid US entity listing; lots of PRC companies keep mum about growing capabilities to do so. But the mere fact that DeepSeek is publicizing this means they're unlikely to care about the political heat that is coming and the ramifications. If anything, getting on the US entity list probably locks employees with DeepSeek on their resume into the PRC.


Depending on how the law is written this may be legal even under US law.

For instance, if the law bans US companies from exporting/selling some chips to Chinese companies and that's it, then it is unclear to me whether a Chinese company would be doing anything illegal under US law by buying such chips, as it would be up to the American seller to refuse.

Anyway, this sort of thing usually takes place through intermediaries in third countries, so it is difficult to track, but obviously it would be stupid to brag about it if that happened.


> PRC companies breaking US export control laws is legal

So long as they don't plan to do any business with the US or any of its allies, I guess.


Which allies? The ones the current US president is threatening in all sorts of ways?

I actually hope he doubles down. I would love for the EU to rely less on the US. It would also reduce the reach of the silly embargoes that benefit no one but the US.


The USA does not have allies. It has hostages.

Destabilizing world trade and international relations isn't something that anyone not named Trump or Putin should be hoping for.

Depends on how you think this would all play out.

Hard to think they plan to; PRC strategic companies that get competitive get entity listed anyway. And the CEO seems mission-driven for AGI: if the US is going to limit hardware inevitably, then there's nothing to do but take the gloves off and try to dunk on the competition. At this point the US can take DeepSeek off app stores, but what's the point except to look petty? Either way, more technical people have pointed out that some of the R1 optimizations _only_ make sense if DeepSeek was constrained to older hardware, i.e. engineering at the PTX level to circumvent H800 limitations so they perform more like H100s.

Throwing this model out also gives US allies' sovereign AI a launchpad... reducing US dependency is step 1 to not being US allies.


> Hard to think they plan to

They already are. You can make a paid account and use their API from most countries around the world. This is what doing business looks like.


This may not be that much of a moat, as Trump seems committed to turning current US allies into former allies.

If they sell software and build devices in China and then people from the US or our allies have to break our laws to import it, it seems like an "us" problem.

There are already some (limited) reproductions that suggest they're not completely lying (i.e. that there are indeed perf benefits).


four GPUs are very convincing indeed! :D

That's 8 (not 4), on an NVIDIA platform board to start with.

You can't buy them as "GPU"s and integrate them into your system. NVIDIA sells you the platform (GPUs + platform board, which includes switches and all the support infra), and you integrate that behemoth of a board into your server as a single unit.

So that open server and the wrapped ones at the back are more telling than they look.


You missed the black unwrapped boxes in the background....

It's a very strange result.

I believe that NVIDIA is overvalued, but if DeepSeek really is as great as has been said, then it'll be even greater when scaled up to OpenAI sizes, and when you get more out you have more reason to pay, so this should, if it pans out, lead to more demand for GPUs: basically Jevons paradox.


If the top-tier premium GPUs aren't the difference-maker they were thought to be then that will hurt NVIDIA's margins, even if they make some of it up on volume.

You need to be prepared for the reality that naive scaling no longer works for LLMs.

Simple question: where is GPT-5?


It is a possibility, but my understanding of what OpenAI has said is that GPT-5 is delayed because of the apparent promise of RL-trained things like o1, etc., and that they've simply decided to train those instead of training a bigger base model on better data. I think this is plausible.

OpenAI has an incentive to make people believe that the scaling laws are still alive, to justify their enormous capex if nothing else.

I wouldn't give what they say too much credence, and will only believe the results I see.


Yes, I think I agree that it seems unlikely that the spending they're doing can be recouped.

But it can still make sense for a state, even if it doesn't make sense for investors.


If we expect that the compute demand for GPT-5 is 100x that of GPT-4, and GPT-4 was trained in months on 10k H100s, then you would need years with 100k H100s, or perhaps months again with 100k GB200s.

See, there is your answer. The issue is that the compute of GPUs is still way too low for GPT-5 if they continue parameter scaling as they used to.

GPT-3 took months on 10k A100s. 10k H100s would have done it in a fraction of the time. Blackwell could train GPT-4 in 10 days with the same number of GPUs that took Hopper months.

Don't forget GPT-3 is just 2.5 years old. Training is obviously waiting for the next step up in large-cluster training speed. Don't be fooled: the 2x Blackwell vs. Hopper figure is only chip vs. chip. 10k Blackwell, including all the networking speedup, is easily 10x or more faster than the same number of Hopper. So building a 1 million Blackwell cluster means 100x more training compute compared to a 100k Hopper cluster.

Nobody starts a model training run if it takes years to finish... too much risk in that.

The Transformer model was introduced in 2017 and ChatGPT came out in 2022. Why? Because they would have needed millions of Volta GPUs instead of thousands of Ampere GPUs to train it.
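
A rough back-of-the-envelope sketch of that scaling argument, in Python. The GPU counts, the 100x compute multiplier, and the ~10x effective Blackwell-vs-Hopper speedup are the illustrative figures from this thread, not measured numbers:

    # Back-of-the-envelope sketch using the illustrative numbers above.
    # Assumption: training time scales linearly with total compute demand and
    # inversely with (number of GPUs x per-GPU effective speedup vs. H100).

    def months_needed(compute_multiplier, n_gpus, speedup_vs_h100,
                      baseline_gpus=10_000, baseline_months=3):
        """Months needed relative to a baseline run of `baseline_gpus` H100s for `baseline_months`."""
        baseline_gpu_months = baseline_gpus * baseline_months
        required_gpu_months = baseline_gpu_months * compute_multiplier
        return required_gpu_months / (n_gpus * speedup_vs_h100)

    print(months_needed(100, 100_000, 1))   # 100k H100s               -> ~30 months ("years")
    print(months_needed(100, 100_000, 10))  # 100k GB200s, assumed 10x -> ~3 months

With those assumptions you land right back at the conclusion above: roughly years on 100k H100s, or months again on a Blackwell-class cluster.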


There is a theory, based on DeepSeek's distillation process, that hints that o1 is really a distillation of a bigger GPT (GPT-5?).

Some consider this to be spurious/conspiracy.


There is a big model from NVIDIA that I assume is for this purpose, i.e. Megatron 530b, so it doesn't sound too unreasonable.

Edit: I assumed that the model was a distillation; that is apparently not true.


The architecture doesn't keep yielding better results, so Jevons paradox doesn't apply.

But surely it can be scaled up? Or is this compression thing something that makes the approach good only for small models? (I haven't read the DeepSeek papers; can't allocate time to them.)

Anyhow, if you can deliver more with less, this is huge good news for AI industry.

After some readjustment we can expect AI companies to start using the new method to deliver more. Science fiction might happen sooner than expected.

Buy the dip.


The limit is high quality data, not compute.

RL doesn't need that much static data, it needs a lot of "good" tasks/challenges and computation.

Right, and LLMs will not be able to generate their own high quality training data.

There are no perpetual motion machines.


> LLMs will not be able to generate their own high quality training data.

Humans certainly did. We did not inherit our physics and poetry books from some aliens.


Humans and LLMs are different things.

LLMs cannot reason; many people seem to believe that they can.


I can't prove that we did but I don't know that we /didn't/.

LLMs are not humans, nowhere near.

Have you read about this specific model we're talking about?

My understanding is that the whole point of R1 is that it was surprisingly effective to train on synthetic data AND to reinforce on the output rather than the whole chain of thought, which does not require so much human-curated data and is a big part of where the efficiency gain came from.


They already do. All the current leading edge models are heavily trained on synthetic data. It's called textbook learning.

> The limit is high quality data

If, as some companies claim, these models truly possess emergent reasoning, their ability to handle imperfect data should serve as a proof of that capability.


For Oracle (another Stargate recipient) it was reversion to the mean. For Nvidia, it's a big loss: I imagine they might have predicated their revenue on the continued need for compute, and now that's in question.

This is not exactly right. They said they spent $6M on training V3; there aren't numbers out there related to the training of R1. I feel it will be cheaper than o1, but it's hard to tell how much cheaper. I can guess that overall DeepSeek spent way less than OpenAI to release the model, because I have the feeling that the R&D part was cheaper too, but we don't have the numbers yet. Anyway, we can assume that DeepSeek and Alibaba will try to get the most out of their current GPUs.

The bigger correction will be in tech stocks that are overly exposed to datacenter investments meant to accommodate ever-rising AI demand. MSFT, AMZN, META: they are all exposed.

MSFT is down 7%

It's kind of silly. It's not like MSFT and the other hyperscalers don't need the capacity build-out for other reasons too. This should be an easy pivot if DeepSeek turns out to be as good as promised.

They were massively overhyped though, it feels more like a correction (and a partial one at that) than a fall.

Of course they are overhyped, but in spite of this Altman is always asking for more money. And we know that financially they are just burning money. So when someone finally brings a cheap but good model to the masses, this is where the money should go. (This will also help all the small AI startups.)

Not a good day for those who decided to hold 3x Nvidia ETPs - down 40% earlier.

Consider that the Chinese might be misrepresenting their costs. A newsletter was implying that they might do so to undermine the justification for sanctions.

Agree that the AI bubble should pop though and the earlier, the better.


Their model is open and they published a paper describing it: https://arxiv.org/pdf/2412.19437 They can't be far off, or it would be noticed.

Even if they are heavily government-subsidized for energy and hardware, I don't see how the cost of training in the US would be more than double.


They express their cost in terms of GPU hours, then convert that to USD based on market GPU rental rates, so it's not affected by subsidies. It's possible, however, that they lied about GPU hours, but if that were the case an expert should be able to show it by working out how many FLOPs are needed to train based on the number of tokens they say they used vs. the FLOPs of the GPUs they say they used.

Total training FLOPs can be deduced from the model architecture (which they can't hide since they released the weights) and how many tokens they trained on. With total training FLOPs and GPU hours you can calculate MFU. The MFU of their DeepSeek-V3 training run is around 40%, which sounds right. Both Google and Meta reported higher MFU. So the GPU hours should be correct. The only thing they could have lied about is how many tokens they trained the model on. DeepSeek reported 14T, which is also similar to what Meta did, so nothing crazy here.

tl;dr all the numbers check out, and the gains come from the model architecture innovations they made.
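
If anyone wants to sanity-check that, here is a minimal sketch of the arithmetic. The parameter, token, and GPU-hour figures are the ones DeepSeek reported for V3; the per-GPU peak throughput and the rental rate are my own assumptions, so treat the output as a ballpark only:

    # Rough MFU / cost sanity check for the reported DeepSeek-V3 training run.
    active_params = 37e9        # activated parameters per token (MoE), as reported
    tokens        = 14.8e12     # training tokens, as reported
    gpu_hours     = 2.788e6     # H800 GPU-hours, as reported

    total_flops   = 6 * active_params * tokens         # standard ~6*N*D estimate
    per_gpu_flops = total_flops / (gpu_hours * 3600)    # achieved FLOP/s per GPU

    peak_flops  = 1.0e15        # ASSUMED per-GPU peak (roughly Hopper-class dense FP8)
    rental_rate = 2.0           # ASSUMED USD per GPU-hour

    print(f"achieved ~{per_gpu_flops/1e12:.0f} TFLOP/s per GPU, MFU ~{per_gpu_flops/peak_flops:.0%}")
    print(f"implied training cost ~${gpu_hours * rental_rate / 1e6:.1f}M")

Depending on what you assume for the peak throughput, this lands somewhere in the 30-40% MFU range, and the implied cost comes out around the $6M figure being discussed.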


Doesn't that say they based it on Llama? So not really bottoms-up training, since the cost of Llama is surely not part of their quote.

I did a quick search for "llama" and didn't find anywhere that they outright state they just fine-tuned some Llama weights.

Is it possible that they based their model architecture on the Llama model architecture, rather than just fine-tuning already-trained Llama weights? In that case, they'd still have to do "bottoms-up" training.


People on the internet can lie. Especially when such a lie could cause the Nasdaq to dip multiple percentage points.

Not saying they are lying, but there are incentives.


Much easier to identify the incentives of the people who just lost a lot of money betting on the idea that it was their money that was going to make artificial intelligence intelligent.

Everyone’s already begun trying this recipe in-house. Either it works with much less compute, or it doesn’t.

For instance, HKUST just did an experiment where small, weak base models trained with DeepSeek's method beat stronger small base models trained with much more costly RL methods. Already this seems like it is enough to upend the niche market for low-end models, things like Haiku and 4o-mini.

Be really skeptical about why the people who should be making tons of money by realizing it was all a mirage, and that they can now get the real stuff for even cheaper, would spend so much effort shouting about this in order to undercut their own profitability.


Huggingface is reproducing it live on their blog…

Let's wait for reproduction first


> overtakes ChatGPT

That's arguable, though. I mean, it's much cheaper and reasonably competitive, which is almost the same, but IMHO DeepSeek seems to get stuck in random loops and hallucinate more frequently than o1.


Let's see whether the open or the closed ecosystem wins in AI.

Not sure if it's as hot as it looks. The valuation of those private AI companies is stupid. This looks like Chinese propaganda: say we don't need that hardware, but it looks like they need it. https://wccftech.com/chinese-ai-lab-deepseek-has-50000-nvidi...

Taking out the popcorn early today.

[flagged]


The issue here is not that DeepSeek exists as a competitor to GPT, Claude, Gemini,...

The issue is that DeepSeek have shown that you don't need that much raw computing power to run an AI, which means that companies including OpenAI may focus more on efficiency than on throwing more GPUs at the problem, which is not good news for those in the business of making GPUs. At least according to the market.


One of the questions about this is that of the US's human capital, i.e. does the US (still) have enough capable tech people to make that happen?

Lol, yes. The US is still very much at the forefront of this stuff. DeepSeek have presented some neat optimizations, but there have been many such papers and optimizations get implemented quickly once someone has proven them out.

> The US is still very much at the forefront of this stuff

Doesn't look like it, because some of the biggest US tech companies now active (including Meta and Alphabet) couldn't come up with what this much smaller Chinese company has. Which begs the question: what is it that companies like Meta, Alphabet and the like do with the (already) hundreds of billions of dollars that they have invested in this space?


Best guess is that they were all caught up in the arms race to try and make a better model, at whatever cost. And if you worked in this space you were probably getting thrown fistfuls of money to join in on it. I read somewhere on Reddit that anyone trying to push for efficiency at these places was getting ignored or pushed aside. DeepSeek had an incentive to focus on efficiency because of the chip embargo. So I don't think this is necessarily a knock on US AI capabilities. It is just that the incentives were different, and when stock prices are going to the moon regardless of how much capex is getting spent, it is easy for everyone to just go along with it.

With that said, I think all of these companies are capable of learning from this and implementing these efficiency improvements. And I think the arms race is still on. The goal is to achieve a superhuman level of intelligence, and they have a ways to go to get there. It is possible that these new efficiency improvements might even help them take the next step, as they can now do a lot more with a lot less.


I see no reason to believe they couldn't have done so. Rather, this is the typical pattern we see across industry: the west focuses on working out what the next big thing is, and China is in a fast-follow-and-optimize mode.

You can ban the company but are you going to ban any US company from using the open model and running it on their own hardware [1]?

The cat is out of the bag and there is no going back.

[1] https://apxml.com/posts/gpu-requirements-deepseek-r1


> You can ban the company but are you going to ban any US company from using the open model and running it on their own hardware [1]?

Just for the people who might not have been around the last time, this has precedent :) The US government (and others) has been trying to outlaw (open source) cryptography, for various reasons, for decades at this point: https://en.wikipedia.org/wiki/Crypto_Wars


That statement is, at best, misleading.

The vast majority of what the US government has tried to ban was export of cryptography tools. However, as your own link makes clear, they stopped doing that in 2000.

Furthermore, what was restricted was not "open source cryptography"; it was cryptography that they could not break. The only way that open source comes into it is that that is what made it abundantly clear that the cat was out of the bag and there was no going back.

Hm. Kind of like this situation.


Please at least attempt to consider nuance. Do you seriously think that would happen? What is your point here? Do you think people in favor of restricting one thing are in favor of restricting everything?

People are trying to stir up “we shouldn’t use Chinese AI because our data is going to be stolen” discussions. But after the TikTok debacle, no serious person is willing to bite. It’s just a big coping strategy for everyone who’s been saying how Western AI is years ahead.

> Please try to at least attempt to consider nuance. Do you seriously think that would happen? What is your point here? Do you think people in favor of restricting one thing are in favor of restricting everything?

The restriction on TikTok was blatantly because it's a Chinese product outcompeting American products; everything else was the thinnest of smokescreens. Yes, I think people in favour of it are in favour of slapping whatever tariffs or bans they can get away with on everything that China makes.


The basics of representative democracy with different words.

Not "worked" but "he says".

Journos rarely check out these UFO grifters, but when somebody does it all falls down. Take, for example, Luis Elizondo. He never worked for AATIP as he claims. Another UFO nut, Senator Harry Reid, just got him permission to hang around. The only thing that seems relatively sure is that Luis Elizondo was in ROC, maybe never passed.


You're equating completely different things. Executive orders are how a president works.

The German ambassador is talking about using government power to go against your domestic political enemies, or using lawsuits, threats of criminal prosecution, and license revocation to prevent media from speaking negatively about Trump.


Correct me if I'm wrong, but I don't see this being viable even if you reach your target efficiency.

The problem with hydrogen is the storage cost. Improving wire-to-wire efficiency can help only so much. Have you calculated the electricity cost with those efficiency rates when you include the cost of storage? "Overall cost of renewable hydrogen in 2030 varies from €2.80–15.65/kgH2", and it improves with scale. https://www.sciencedirect.com/science/article/pii/S036031992...

Quick and dirty math, may contain errors:

Lightcell target is 0.5 kWh/L. Hydrogen weighs 0.09 kg/L.

-> storage cost alone: ~€0.5/kWh at large scale, ~€2.5/kWh at small scale.

Average electricity cost in the EU has been €0.289 per kWh.


> Average electricity cost in the EU has been €0.289 per kWh.

I'm curious where you're getting this from, and also what other Europeans on HN currently pay?

I'm in Spain with Octopus (via Spock's collective bargaining), and my effective price for December ended up being 0.131 EUR/kWh, while you claim a price that is 3x what I currently pay. Just wondering if I'm an outlier with the price Spock managed to get us.

Edit:

> The EU average price in the first half of 2024 — a weighted average using the most recent (2022) consumption data for electricity by household consumers — was €0.2889 per KWh.

https://ec.europa.eu/eurostat/statistics-explained/index.php...

Guessing that's your source :) Seems that's specific to home usage though, while your comment seems to be in a different context. Not sure whether electricity is cheaper or more expensive in industrial contexts.


Electricity prices for non-household consumers https://ec.europa.eu/eurostat/statistics-explained/index.php...

> The EU average price in the first half of 2024 was €0.1867 per KWh


I’m with Octopus in the UK (so not EU any more), on the Agile plan so it changes depending on wholesale prices. My average last month was £0.2061/kWh. Fixed tariffs are closer to £0.25/kWh.

Correct me if I'm wrong but I think this is apples and oranges: storage can be reused, while electricity is consumed.

That's the levelized cost over the lifetime. Hydrogen storage is expensive to both build and maintain.

The issues include hydrogen embrittlement, constant leakage, and safety. Containers don't last: H2 is the smallest molecule; it gets into the containers, wears them out, and leaks away. Casing and seal damage is constant. Pressure vessel storage loses a little below 1% per day to leakage. Liquid hydrogen storage is about 1-3% leakage per day. Salt cavern storage loses much less, but it has the problem of H2S generation by micro-organisms.
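
To get a feel for what those daily leakage rates mean over a storage cycle, a tiny compounding sketch (the rates are the ones above; the durations are arbitrary examples):

    # How daily leakage / boil-off compounds over a storage period.
    def remaining_fraction(daily_loss, days):
        return (1 - daily_loss) ** days

    for daily_loss in (0.01, 0.03):      # 1%/day and 3%/day, as mentioned above
        for days in (30, 90):            # example storage durations
            left = remaining_fraction(daily_loss, days)
            print(f"{daily_loss:.0%}/day over {days} days -> {left:.0%} of the H2 left")

So at 1%/day you still have about three quarters of the hydrogen after a month, but at 3%/day you lose most of it over a season, which is part of why longer-term storage gravitates toward salt caverns despite their own problems.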


> That's the levelized cost over the lifetime.

I don't see how you can compute that cost if you don't know anything about the amount of energy that goes into and out of the container, and how often that happens.


The article I'm quoting gives a range.

Larger for small-scale and longer-term storage; smaller for others.


It seems completely reasonable and sane.

Jan 19 is a Sunday. Trump becomes president on Jan 20. Biden understands that it's Trump's problem now. Filing stuff in the morning just to complicate any decisions Trump may take is just silly.

