In January 2025, a Hangzhou-based company with 160 employees did something no one in Silicon Valley thought possible: it trained an artificial intelligence model comparable to OpenAI's GPT-4, DeepSeek-V3, for about $6 million. OpenAI spent over $100 million to train GPT-4. DeepSeek did it with less than a tenth of the computing power Meta used for Llama 3.1.
The result? Nvidia lost $600 billion in market value in a single day, the largest single-day loss in value for any company in the history of the American stock market. Analysts called the episode the "Sputnik moment" for artificial intelligence.
But how did a Chinese startup manage to do this? And what does it change for those working with technology?
Who is DeepSeek (and where did the money come from)
DeepSeek did not come out of nowhere. It is the offspring of High-Flyer, a Chinese hedge fund founded in 2016 by Liang Wenfeng, a graduate of Zhejiang University who began operating in the financial market during the 2008 crisis. High-Flyer specialized in algorithmic trading using deep learning, and by 2021 was already operating 100% with AI.
Liang realized early on that he needed heavy computational power. In 2019, he invested 200 million yuan (~$28 million) in his first computing cluster, with 1,100 GPUs. In 2021, before the US restricted the sale of advanced chips to China, he bought about 10,000 Nvidia A100 GPUs.
In July 2023, High-Flyer created DeepSeek as an independent company, focused exclusively on AI research — with no connection to the fund's financial business. The company operates with a lean team: just 160 people, recruited from the best Chinese universities. Many come from areas outside traditional computer science, which broadens the knowledge base of the models.
$6 million vs $100 million: how is it possible?
The number that shocked the market: DeepSeek claims to have trained its V3 model for about $6 million. OpenAI's GPT-4 cost over $100 million. That roughly 17x difference is not magic; it's engineering.
DeepSeek used a technique called Mixture of Experts (MoE), where the model does not activate all its parameters for each query. Think of it this way: instead of turning on all the lights in a building to find a room, you only light up the right corridor. This drastically reduces computational costs without losing quality in the responses.
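The mechanism can be sketched in a few lines. Everything below is a random-weight toy, not DeepSeek's actual architecture; the point is only the routing step, where each token activates a fixed top-k subset of experts rather than the whole network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, far smaller than any real model.
d_model, n_experts, top_k = 16, 8, 2
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))  # learned in a real model

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    scores = x @ router                    # affinity of this token to each expert
    top = np.argsort(scores)[-top_k:]      # only k of n_experts get activated
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top)), top

x = rng.normal(size=d_model)
y, active = moe_forward(x)
print(f"activated {len(active)}/{n_experts} experts")  # prints "activated 2/8 experts"
```

Here only 2 of 8 equally sized experts run per token, so each forward pass touches roughly a quarter of the expert parameters, which is the source of the cost reduction.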
Additionally, the company trained its models on export-compliant Nvidia chips such as the H800: deliberately less powerful versions of the hardware used in the US, created precisely to comply with American sanctions. They used fewer units and extracted more from each one. The Fire-Flyer 2 cluster, built in 2021 with a budget of 1 billion yuan, operated with a utilization rate above 96%.
For the R1 model specifically, DeepSeek combined MoE with proprietary parallelism optimizations that allowed training giant models on GPUs that, on paper, should not have been able to handle the task.
Performance: DeepSeek R1 vs GPT-4 vs Claude
The benchmarks tell a surprising story. DeepSeek-R1, launched in January 2025, delivers results comparable to OpenAI's GPT-4 and o1 in reasoning tasks. The model was published under the MIT license, meaning the weights are open and free to use, including commercially.
Here is the comparison that matters for those deciding which AI to use:
| Criterion | DeepSeek R1 | GPT-4 (OpenAI) | Claude 3.5 (Anthropic) |
|---|---|---|---|
| Training cost | ~$6M | ~$100M | Not disclosed |
| License | MIT (open) | Proprietary | Proprietary |
| API cost (per 1M input tokens) | $0.14 (cache hit) | $30 (GPT-4) | $3 (Sonnet) |
| Logical reasoning | Comparable to o1 | Baseline | Strong |
| Open-source | Yes (open weights) | No | No |
The price difference in the API is brutal. For a company that processes millions of tokens per day — chatbots, document analysis, automation — DeepSeek could represent savings of 90% or more compared to GPT-4.
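The savings are easy to sanity-check from the input-token prices in the table. The daily volume below is a hypothetical workload, and output-token prices, which differ between providers, are ignored:

```python
# Input-token prices per 1M tokens, taken from the comparison table.
deepseek_price = 0.14   # DeepSeek R1, cache hit
gpt4_price = 30.00      # GPT-4

tokens_per_day_millions = 5  # hypothetical workload: 5M input tokens/day

monthly_deepseek = deepseek_price * tokens_per_day_millions * 30
monthly_gpt4 = gpt4_price * tokens_per_day_millions * 30
savings = 1 - monthly_deepseek / monthly_gpt4

print(f"GPT-4: ${monthly_gpt4:,.0f}/mo  DeepSeek: ${monthly_deepseek:,.0f}/mo")
print(f"savings: {savings:.1%}")  # -> 99.5%
```

On cache hits the input-side saving is well above 90%; real bills depend on cache rates and output tokens, so the blended figure lands lower.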
The market earthquake: $600 billion evaporated
When the DeepSeek-R1 was launched and the cost numbers became public, Wall Street went into panic mode. The logic was simple: if it's possible to train top-tier models for 17x less, the thesis that AI requires billion-dollar investment in hardware falls flat. And who sells that hardware? Nvidia.
On January 27, 2025, Nvidia lost $600 billion in market value in a single trading session. For perspective: that's more than Argentina's GDP. It was the largest capitalization drop for a single company in American market history.
The message was clear: the arms race for GPUs may not be the only path to advanced AI. Algorithmic efficiency may be worth more than brute computational force.
Analysts and international media described the moment as an "AI Sputnik" — referencing the shock the US experienced in 1957 when the Soviet Union launched the first satellite. This time, the shock came from China.
What DeepSeek changes for developers
DeepSeek opens doors that were previously locked by cost. Some practical ways to use it:
1. Cheap API for production. The DeepSeek API charges from $0.14 per million tokens (with cache). For startups spending thousands per month on OpenAI's API, partially migrating to DeepSeek could cut the bill by 80-90%. The endpoint is compatible with OpenAI's format, so the technical migration is straightforward.
2. Run locally. Since the model weights are open (MIT license), you can download smaller versions of DeepSeek and run them on your own infrastructure. Tools like Ollama and LM Studio already support DeepSeek models. For those dealing with sensitive data — fintechs, healthtechs — this solves the problem of sending data to external servers.
3. Fine-tuning without spending a fortune. With open models, it's possible to adjust DeepSeek for specific tasks: legal text, invoice analysis, customer service with local context. The cost of fine-tuning an open model is a fraction of what you pay to customize proprietary models.
4. Free chatbot. The chatbot at chat.deepseek.com is free to use, with no apparent message limit. For freelancers and small businesses paying for ChatGPT Plus ($20/month), trying DeepSeek as an alternative costs nothing.
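Because the endpoint follows OpenAI's chat-completions format, migration mostly means swapping the base URL and model name. A minimal sketch of the shared request shape; the `deepseek-chat` model name and endpoint URL reflect DeepSeek's public documentation, and no network call is made here:

```python
import json

# Same request shape, different base URL and model name.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"
DEEPSEEK_URL = "https://api.deepseek.com/chat/completions"  # OpenAI-compatible

def build_request(model, user_message):
    """Build an OpenAI-format chat-completion body; valid for both providers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

gpt_body = build_request("gpt-4", "Summarize this contract.")
ds_body = build_request("deepseek-chat", "Summarize this contract.")

# Only the model field differs; the rest of the payload is identical.
assert gpt_body["messages"] == ds_body["messages"]
print(json.dumps(ds_body, indent=2))
```

In practice, the official OpenAI SDKs can be pointed at DeepSeek by overriding the client's base URL, which is why the migration is described as straightforward.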
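For the local route, a rough command sketch via Ollama; the tag `deepseek-r1:7b` is one of several distilled sizes published in the Ollama library, so check which variant fits your hardware:

```shell
# Pull a small distilled DeepSeek-R1 variant (tag per the Ollama library)
ollama pull deepseek-r1:7b

# Chat interactively in the terminal
ollama run deepseek-r1:7b

# Or query the local HTTP API that Ollama exposes on port 11434
curl http://localhost:11434/api/generate \
  -d '{"model": "deepseek-r1:7b", "prompt": "Hello"}'
```

The HTTP API keeps everything on your own machine, which is what makes the setup attractive for the sensitive-data cases mentioned above.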
The risks nobody wants to discuss
DeepSeek is not perfect, and those who adopt it need to go in with eyes wide open.
Censorship. The model follows Chinese government restrictions. Ask about Tiananmen, Taiwan, or Xinjiang, and the responses will be evasive or blocked. For general commercial applications this may not matter, but for editorial or educational products, it's a real limitation.
Data in China. If you use the API hosted by DeepSeek, your data passes through Chinese servers. For companies with compliance requirements, running the model locally is the way out — and the fact that it's open-weight makes this feasible.
Sustainability of the business model. DeepSeek is funded by a hedge fund. It doesn't charge high subscription fees and has no venture capital investor pressure. That's great for now, but it raises the question: how long will the price stay this low? The bet is that the company is playing the long game, prioritizing adoption over immediate profit.
Geopolitics. With the US-China trade war intensifying, it's not impossible that Western governments will create restrictions on the use of Chinese models. Italy has already temporarily blocked DeepSeek's chatbot. Companies that rely exclusively on DeepSeek would be exposed to this regulatory risk.
What's ahead
DeepSeek has shown that the race for AI is not won with money alone. The combination of creative engineering, a lean team, and access to hardware — even limited hardware — produced results that Silicon Valley did not see coming.
The lesson is clear: the cost of using advanced AI has dropped drastically, and high-quality open models now exist. Whoever is still stuck on the idea that you need to pay a premium for good AI is leaving money on the table.
The question is no longer "Is Chinese AI good enough?". It is: "How long are you going to pay 10x more for equivalent results?"
This topic was featured in China to Watch, the daily newsletter on what's happening in China before it becomes news elsewhere. Subscribe at chinato.watch.