CHINA TO WATCH

Chinese open-source AI: how to use DeepSeek, Qwen, and GLM for free

Phelipe Xavier · 8 min read · February 26, 2026

While a large part of the Brazilian market still relies exclusively on GPT and Claude, China is distributing powerful AI models with open-source licenses—and you can run all of them for free, on your own computer, right now. No credit card, no waitlist, no reliance on foreign servers.

This guide is for developers and startups that want to understand what the main free Chinese open-source AI models are, how to install them locally, and when it makes sense to use each one.

The three open-source giants of China

Three Chinese labs dominate the open model landscape in 2025: DeepSeek, Alibaba Cloud (Qwen), and Zhipu AI (GLM). Each has different approaches and strengths.

DeepSeek R1

DeepSeek R1 is a reasoning model with 671 billion parameters (Mixture-of-Experts architecture), developed by Hangzhou-based startup DeepSeek, founded in 2023. The model was released under the MIT license—the most permissive possible—allowing commercial use, modification, and distillation to train other models.

The latest version, DeepSeek-R1-0528, showed significant improvements in reasoning, mathematics, and programming, with performance approaching models like o3 and Gemini 2.5 Pro, according to benchmarks published by DeepSeek itself.

In addition to the full 671B model, there are smaller distilled versions—from 1.5B to 70B parameters—that run on more modest hardware. The 8B parameter version (DeepSeek-R1-0528-Qwen3-8B) is the most accessible entry point.

Qwen 2.5 (Alibaba Cloud)

Qwen 2.5 is Alibaba Cloud's family of models, with versions ranging from 0.5B to 72 billion parameters. The 72B model has 80 layers, a context of 128K tokens, and native support for more than 29 languages—including Portuguese.

The technical highlights include improvements in code generation, mathematics, instruction following, and structured output (JSON). The license is Apache 2.0, also permissive for commercial use.

For those working with structured data, tables, or needing long responses (up to 8K tokens of output), Qwen 2.5 is probably the best option among the Chinese open models.

GLM-4 (Zhipu AI)

GLM-4-9B is the open model from Zhipu AI, a spin-off from Tsinghua University. With 9 billion parameters, it supports context up to 128K tokens and offers advanced features like web browsing, code execution, and function calling.

In benchmarks, the GLM-4-9B-Chat outperformed the Llama-3-8B-Instruct in almost all metrics: 72.4 on MMLU (versus 68.4 from Llama), 50.6 on MATH (versus 30.0), and 71.8 on HumanEval (versus 62.2). In function calling on the Berkeley Function Calling Leaderboard, it achieved 81.00 overall accuracy—almost identical to GPT-4 Turbo (81.24).

The big plus of GLM-4 is its support for 26 languages and exceptional performance on multilingual tasks, where it beat Llama-3 on all 6 tested datasets (M-MMLU, FLORES, MGSM, XWinograd, XStoryCloze, and XCOPA).
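Function calling works through the standard OpenAI-style tools schema. A minimal sketch of what such a request looks like, assuming a local GLM-4 served on Ollama's OpenAI-compatible endpoint (the `get_shipping_time` tool is hypothetical, purely for illustration):

```python
import json

# Hypothetical tool definition — illustrative, not a real API
tools = [{
    "type": "function",
    "function": {
        "name": "get_shipping_time",
        "description": "Look up the shipping time for a Brazilian state",
        "parameters": {
            "type": "object",
            "properties": {
                "state": {"type": "string", "description": "Two-letter code, e.g. SP"}
            },
            "required": ["state"],
        },
    },
}]

# Request body in the OpenAI-compatible format; POST it to
# http://localhost:11434/v1/chat/completions (assuming Ollama is serving glm4:9b
# and the endpoint supports the tools field)
payload = {
    "model": "glm4:9b",
    "messages": [{"role": "user", "content": "How long does delivery to SP take?"}],
    "tools": tools,
}
print(payload["tools"][0]["function"]["name"])  # get_shipping_time
```

If the model decides a tool is needed, the response carries a `tool_calls` entry with the function name and its JSON arguments; your code executes the function and sends the result back in a follow-up message.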

How to run for free on your computer

There are two main tools for running AI models locally: Ollama and LM Studio. Both are free and work on macOS, Windows, and Linux.

Method 1: Ollama (command line)

Ollama is the fastest way to get started. After installing it from ollama.com, just open the terminal and run:

# DeepSeek R1 (8B version, ~5GB of RAM)
ollama run deepseek-r1

# Full DeepSeek R1 (671B, requires ~400GB of RAM)
ollama run deepseek-r1:671b

# Intermediate distilled versions
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
ollama run deepseek-r1:70b

# Qwen 2.5 (versions from 0.5B to 72B)
ollama run qwen2.5
ollama run qwen2.5:14b
ollama run qwen2.5:72b

# GLM-4
ollama run glm4:9b

The download happens automatically on the first run. To update a downloaded model:

ollama pull deepseek-r1

Method 2: LM Studio (graphical interface)

If you prefer a visual interface, LM Studio allows you to search for, download, and chat with models without touching the terminal. It works like this:

  1. Download and install LM Studio
  2. In the "Discover" tab, search for "deepseek-r1", "qwen2.5", or "glm-4"
  3. Choose the version compatible with your RAM (the app shows the requirement for each model)
  4. Click "Download" and then go to the "Chat" tab

LM Studio also exposes a local API compatible with the OpenAI format, allowing you to integrate Chinese models into any application that already uses the OpenAI API—just switch the endpoint.

Minimum hardware requirements

Model                      | Parameters | Minimum RAM | Ideal for
DeepSeek R1 (distilled 8B) | 8B         | 6 GB        | Laptops, quick tests
Qwen 2.5 (14B)             | 14B        | 10 GB       | Workstations, daily use
GLM-4-9B                   | 9B         | 8 GB        | Multilingual tasks, function calling
DeepSeek R1 (32B)          | 32B        | 24 GB       | Dedicated GPUs, light production
Qwen 2.5 (72B)             | 72B        | 48 GB       | Servers, high quality
DeepSeek R1 (671B)         | 671B       | ~400 GB     | Clusters, research
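The numbers in the table follow a simple rule of thumb: weights take roughly parameters × bits-per-weight ÷ 8 bytes, plus overhead for the KV cache and runtime buffers. A rough estimator (the 20% overhead factor is an assumption, not an exact figure):

```python
def approx_ram_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough RAM to load a model: parameters (in billions) * bits/8 bytes each,
    plus ~20% overhead for KV cache and buffers. A rule of thumb, not exact."""
    return params_b * bits / 8 * overhead

# 8B model at the 4-bit quantization Ollama ships by default:
print(round(approx_ram_gb(8), 1))   # 4.8 — consistent with the 6 GB row above
print(round(approx_ram_gb(72), 1))  # 43.2 — consistent with the 48 GB row
```

The estimate explains why quantization matters so much: the same 8B model at 8 bits would need roughly twice the memory.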

Price comparison: DeepSeek vs OpenAI

If running locally is not feasible, cloud APIs are the alternative. The price difference between Chinese and Western models is brutal.

Provider / Model           | Input (per 1M tokens) | Output (per 1M tokens) | Source
DeepSeek V3.2 (cache hit)  | US$ 0.028             | US$ 0.42               | api-docs.deepseek.com, Feb/2026
DeepSeek V3.2 (cache miss) | US$ 0.28              | US$ 0.42               | api-docs.deepseek.com, Feb/2026
GPT-5 mini (OpenAI)        | US$ 0.25              | US$ 2.00               | openai.com/api/pricing, Feb/2026
GPT-5.2 (OpenAI)           | US$ 1.75              | US$ 14.00              | openai.com/api/pricing, Feb/2026
GPT-5.2 Pro (OpenAI)       | US$ 21.00             | US$ 168.00             | openai.com/api/pricing, Feb/2026

In plain terms: DeepSeek V3.2's output costs US$ 0.42 per million tokens, while GPT-5.2 charges US$ 14.00 for the same volume—a 33x difference. Even compared to GPT-5 mini, DeepSeek is still almost 5x cheaper on output.

For a startup processing 10 million tokens of output per month, the bill would look like this:

  • DeepSeek: US$ 4.20/month (~R$ 25)
  • GPT-5 mini: US$ 20/month (~R$ 120)
  • GPT-5.2: US$ 140/month (~R$ 840)

Not to mention that DeepSeek offers automatic caching that reduces input costs by 10x when there are repetitive prompts—a common occurrence in chatbots and data pipelines.

Which model for what use case?

There is no universal "best" model. The choice depends on what you are building:

Use case                      | Recommended model        | Why
Complex reasoning, mathematics | DeepSeek R1 (32B+)      | Designed for chain-of-thought; benchmarks close to o3
Code generation               | DeepSeek R1 or Qwen 2.5  | Both excellent on HumanEval; Qwen better at structured JSON
Multilingual chatbot (PT-BR)  | GLM-4-9B or Qwen 2.5     | Native support for Portuguese; GLM leads in multilingual benchmarks
Function calling / agents     | GLM-4-9B                 | 81% on the Berkeley Function Calling Leaderboard, almost identical to GPT-4 Turbo
Tight budget (API)            | DeepSeek V3.2            | Up to 33x cheaper than GPT-5.2
Running on a laptop           | DeepSeek R1 8B           | Smallest distilled model; runs with 6 GB of RAM
Structured data / tables      | Qwen 2.5                 | Specific improvements in table understanding and JSON output

Fine-tuning: customizing Chinese models

One of the biggest advantages of open-source models is the possibility of fine-tuning—adapting the model to your specific data. Here's the basic path:

Step 1: Choose the base model

For fine-tuning, prefer smaller models (7B-14B). The computational cost scales with the number of parameters, and distilled models already come with good baseline capacity.

Step 2: Prepare your data

The standard format is JSONL with instruction/response pairs:

{"messages": [{"role": "user", "content": "What is the delivery time for SP?"}, {"role": "assistant", "content": "The standard delivery time for Sao Paulo is 2 business days."}]}

Step 3: Use fine-tuning tools

The most accessible options:

  • Unsloth: Python library that reduces memory usage by up to 60% during training. Supports DeepSeek and Qwen natively.
  • Axolotl: fine-tuning framework that abstracts complexity. Configuration via YAML.
  • LLaMA Factory: web interface for code-free fine-tuning. Supports LoRA, QLoRA, and full fine-tuning for all models mentioned here.

Step 4: Train with LoRA

LoRA (Low-Rank Adaptation) allows efficient fine-tuning without modifying all the model's weights. With a 24GB GPU (like RTX 4090), you can fine-tune models up to 14B parameters using QLoRA (LoRA quantized to 4 bits).

The investment? A used RTX 4090 in Brazil costs around R$ 8,000-10,000. In the cloud, an A100 on AWS or GCP costs between US$ 1-3/hour. A typical fine-tuning run on a few hundred examples takes 1-4 hours.
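Why is LoRA so much cheaper than full fine-tuning? Instead of updating a full d_out × d_in weight matrix, it trains two small matrices B (d_out × r) and A (r × d_in) whose product approximates the update. The arithmetic, with a typical attention-projection shape as an assumed example:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one LoRA adapter: B (d_out x r) @ A (r x d_in)
    replaces a full d_out x d_in update, so only r*(d_in + d_out) weights train."""
    return r * (d_in + d_out)

# Example: one 4096x4096 projection matrix at rank 16 (shapes assumed for illustration)
full = 4096 * 4096                  # 16,777,216 params if trained fully
lora = lora_params(4096, 4096, 16)  # 131,072 params with LoRA
print(f"trainable fraction: {lora / full:.4%}")  # trainable fraction: 0.7812%
```

Summed across all adapted layers, the trainable fraction typically stays under 1% of the model, which is what lets a single 24GB GPU handle a 14B model with QLoRA.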

Practical integration: replacing the OpenAI API

If you already have a project using the OpenAI API, migrating to Chinese models is surprisingly simple. Ollama exposes a local API in the OpenAI format:

# Start the model
ollama serve

# In another terminal, use it as if it were the OpenAI API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Explain recursion in Python"}]
  }'

In your Python code, the change is two lines:

from openai import OpenAI

# Before (OpenAI)
# client = OpenAI(api_key="sk-...")

# After (Ollama local)
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)

The same logic works with LM Studio (default port 1234) and with the DeepSeek cloud API (endpoint: api.deepseek.com).

Performance tips: getting the most out of local models

Running models locally requires some adjustments for a smooth experience. Some practical tips:

Quantization: 4-bit quantized models (Q4_K_M) offer the best balance between quality and memory consumption. Ollama downloads quantized versions by default, but in LM Studio, you can choose between Q4, Q5, and Q8—each level uses more RAM but delivers more accurate responses.

GPU offloading: If you have a GPU with enough VRAM, Ollama automatically transfers model layers to the GPU. On a Mac with Apple Silicon (M1/M2/M3/M4), unified memory allows running larger models than a PC with a dedicated GPU with less VRAM could handle.

Context: Reducing the context window size saves memory. If you don't need the 128K token context of Qwen 2.5, configure it to 4K or 8K tokens—the model will run faster and use less RAM.
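In Ollama, the context window is set with the num_ctx parameter in a Modelfile. A minimal sketch (the qwen2.5 tag matches the commands above; qwen2.5-8k is just an assumed name for the new variant):

```
FROM qwen2.5
PARAMETER num_ctx 8192
```

Save this as Modelfile, register it with ollama create qwen2.5-8k -f Modelfile, and ollama run qwen2.5-8k then starts with the smaller window.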

Batch processing: For batch tasks (text classification, data extraction), use Ollama's local API with parallel requests. A simple Python script with asyncio can process hundreds of documents per hour even on modest hardware.
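A sketch of that pattern, pointed at Ollama's local OpenAI-compatible endpoint (assumes Ollama is serving on localhost:11434 and the openai package is installed; the sentiment-classification prompt is just an example task):

```python
import asyncio

def build_prompt(doc: str) -> str:
    """Example classification prompt; swap in whatever task you need."""
    return f"Classify the sentiment (positive/negative):\n{doc}"

async def classify_all(docs: list[str], concurrency: int = 4) -> list[str]:
    """Run classification over many documents with bounded parallelism."""
    from openai import AsyncOpenAI  # pip install openai
    client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests

    async def classify(doc: str) -> str:
        async with sem:
            resp = await client.chat.completions.create(
                model="qwen2.5",
                messages=[{"role": "user", "content": build_prompt(doc)}],
            )
            return resp.choices[0].message.content

    return await asyncio.gather(*(classify(d) for d in docs))

# Usage (requires a running `ollama serve`):
#   labels = asyncio.run(classify_all(open("docs.txt").read().splitlines()))
```

The semaphore matters: local models process a limited number of requests at once, so flooding the endpoint only adds queueing without adding throughput.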

What this means for the developer ecosystem

The availability of Chinese open-source models changes the game for developers in three ways:

  1. Data sovereignty: running models locally means no data leaves your server. For regulated sectors (health, finance, legal), this eliminates a huge barrier.
  2. Affordable cost: with APIs up to 33x cheaper and the option to run locally for free, projects that were financially unfeasible now fit any startup's budget.
  3. Geopolitical independence: relying on a single American provider is a risk. Having Chinese alternatives—with permissive licenses—creates strategic redundancy.

The AI landscape in 2025 is no longer an American duopoly. China is playing the open-source game aggressively, and developers who learn to take advantage of it will have a real competitive edge.

Next steps

If you want to get started now:

  1. Install Ollama and run ollama run deepseek-r1
  2. Try LM Studio if you prefer a graphical interface
  3. Test the DeepSeek API (US$ 0.28/1M input tokens) for production projects
  4. Follow China to Watch to stay up to date with the latest updates on these models

Chinese open-source AI is no longer a niche curiosity—it's infrastructure. And it's available to any developer who wants to use it.
