deepseek-chat and deepseek-reasoner will be deprecated on 2026-07-24 15:59 UTC. This guide walks through migrating to deepseek-v4-flash and deepseek-v4-pro.
The three-month migration window, which runs from the V4 release to the July 24 deprecation of the legacy model names, is closing.
After this date, any request that uses the deepseek-chat or deepseek-reasoner model name will fail. Until then, they map to V4-Flash in non-thinking and thinking mode, respectively.
Replace the old model names with the new ones according to this mapping:
| Old name (deprecated) | New name | Capability mapping |
|---|---|---|
| deepseek-chat | deepseek-v4-flash | Non-thinking mode (default) |
| deepseek-reasoner | deepseek-v4-flash (with thinking enabled) | Thinking mode (add extra_body) |
| deepseek-chat (high-load) | deepseek-v4-pro | Upgrade to Pro for flagship performance |
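When the model name comes from configuration, the mapping table can be turned into a small lookup helper. A minimal sketch: migrate_model and its high_load flag are hypothetical names, and the returned dict is meant to be unpacked into client.chat.completions.create(...) from the OpenAI Python SDK, whose extra_body argument passes the non-standard thinking field through to the API.

```python
from typing import Any


def migrate_model(old_name: str, high_load: bool = False) -> dict[str, Any]:
    """Map a legacy DeepSeek model name to the V4 request fields that replace it."""
    if old_name == "deepseek-chat":
        # Non-thinking mode is the V4 default; heavy workloads can move up to Pro.
        return {"model": "deepseek-v4-pro" if high_load else "deepseek-v4-flash"}
    if old_name == "deepseek-reasoner":
        # Thinking is no longer implied by the model name, so enable it explicitly.
        return {
            "model": "deepseek-v4-flash",
            "extra_body": {"thinking": {"type": "enabled"}},
        }
    raise ValueError(f"not a legacy DeepSeek model name: {old_name!r}")


# Usage with an OpenAI client pointed at https://api.deepseek.com:
# client.chat.completions.create(
#     messages=[{"role": "user", "content": "Hello"}],
#     **migrate_model("deepseek-reasoner"),
# )
```

Raising on unknown names is deliberate: a typo surfaces at startup instead of as a failed API call.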
The core difference between Flash and Pro: Flash is the lightweight model (284B/13B) with a concurrency limit of 2500; Pro is the flagship (1.6T/49B) with a concurrency limit of 500 but stronger coding and reasoning. Use Flash for everyday Q&A and Pro for agentic coding and long-chain reasoning.
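The concurrency limits matter most for batch jobs. A minimal client-side sketch, assuming each limit counts simultaneous in-flight requests; run_bounded and its optional limit override are hypothetical, and each job stands for any coroutine function that issues one request (for example via the SDK's AsyncOpenAI client).

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

# Per-model concurrency limits quoted in this guide.
CONCURRENCY_LIMITS = {"deepseek-v4-flash": 2500, "deepseek-v4-pro": 500}


async def run_bounded(
    model: str,
    jobs: list[Callable[[], Awaitable[T]]],
    limit: Optional[int] = None,
) -> list[T]:
    """Run request coroutines while keeping at most `limit` of them in flight."""
    sem = asyncio.Semaphore(limit or CONCURRENCY_LIMITS[model])

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # Waits here once the limit is reached.
            return await job()

    # gather preserves the order of `jobs` in the returned list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Passing a limit below the published ceiling leaves headroom for other processes that share the same account.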
Good news: migration requires no new domain and no changes to your network layer.
| Interface format | Base URL | Notes |
|---|---|---|
| OpenAI-compatible | https://api.deepseek.com | The OpenAI SDK appends /chat/completions |
| Anthropic-compatible | https://api.deepseek.com/anthropic | The Anthropic SDK appends /v1/messages |
In other words, migrating means changing the model parameter and, if you are coming from deepseek-reasoner, adding the thinking-mode parameters. Everything else stays the same.
Minimal change: only the model parameter.
Before
```python
# Before migration
from openai import OpenAI

client = OpenAI(
    api_key="<DeepSeek API Key>",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # ❌ old name
    messages=[{"role": "user", "content": "Hello"}],
)
```
After
```python
# After migration
from openai import OpenAI

client = OpenAI(
    api_key="<DeepSeek API Key>",
    base_url="https://api.deepseek.com",  # unchanged
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # ✅ new name
    messages=[{"role": "user", "content": "Hello"}],
)
```
Before, in Claude Code's settings.json (if you previously reached DeepSeek through a third-party relay):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
    "ANTHROPIC_API_KEY": "<your key>"
  }
}
```
After, Claude Code uses V4 directly. Set ANTHROPIC_MODEL to deepseek-v4-flash or deepseek-v4-pro:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
    "ANTHROPIC_API_KEY": "<your key>",
    "ANTHROPIC_MODEL": "deepseek-v4-flash"
  }
}
```
The old deepseek-reasoner always ran in thinking mode. On deepseek-v4-flash you must enable thinking explicitly, and reasoning_effort controls how much effort the model spends thinking.
V4 thinking mode does not support temperature, top_p, presence_penalty, or frequency_penalty. Setting them does not raise an error, but they have no effect on the output.
```python
from openai import OpenAI

client = OpenAI(
    api_key="<DeepSeek API Key>",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    reasoning_effort="high",  # high / max
    extra_body={"thinking": {"type": "enabled"}},
)

# The reasoning trace is in reasoning_content; the final answer is in content.
reasoning = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```
The corresponding fields for the Anthropic-compatible endpoint (other request fields omitted):

```json
{
  "model": "deepseek-v4-flash",
  "thinking": {"type": "enabled"},
  "output_config": {"effort": "high"}
}
```
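Because thinking mode silently ignores temperature, top_p, presence_penalty, and frequency_penalty, leftover values from a deepseek-reasoner setup can suggest tuning that no longer happens. A minimal sketch that strips them from the keyword arguments before a thinking-mode call; drop_ignored_sampling_params is a hypothetical helper, and it assumes thinking is switched on through extra_body as in the OpenAI SDK example.

```python
from typing import Any

# Parameters that V4 thinking mode accepts without error but ignores.
IGNORED_IN_THINKING = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def drop_ignored_sampling_params(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request kwargs without params that thinking mode ignores."""
    extra_body = kwargs.get("extra_body") or {}
    thinking = extra_body.get("thinking") or {}
    if thinking.get("type") != "enabled":
        return dict(kwargs)  # Non-thinking requests keep their sampling params.
    return {k: v for k, v in kwargs.items() if k not in IGNORED_IN_THINKING}
```

Returning a copy keeps the caller's original dict untouched, so the same base config can still serve non-thinking requests.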
V4 has one multi-turn subtlety: in turns without tool calls, you do not need to pass the previous reasoning_content back (the API ignores it); in turns with tool calls, you must pass it back in full, or the request fails with a 400 error.
```python
# Recommended: append the whole message object; reasoning_content is included automatically
messages.append(response.choices[0].message)
# instead of manually copying the content / reasoning_content / tool_calls fields
```
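If you store history as plain dicts (for example, to persist it as JSON) instead of appending the SDK object, the tool-call rule has to be applied by hand. A minimal sketch: to_history_entry is a hypothetical helper that takes the assistant message as a dict and keeps reasoning_content only when the turn contains tool calls, dropping it otherwise to keep payloads small.

```python
from typing import Any


def to_history_entry(message: dict[str, Any]) -> dict[str, Any]:
    """Build the assistant turn to send back with the next request."""
    entry: dict[str, Any] = {"role": "assistant", "content": message.get("content")}
    if message.get("tool_calls"):
        # With tool calls, V4 needs the full reasoning_content back (else HTTP 400).
        entry["tool_calls"] = message["tool_calls"]
        entry["reasoning_content"] = message.get("reasoning_content")
    # Without tool calls, the API ignores reasoning_content, so it is left out.
    return entry
```

With the OpenAI SDK, the message object should convert to such a dict through its model_dump() method, since the SDK's response types are Pydantic models.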
V4 raises V3's 128K-token context ceiling to 1M tokens (roughly the length of Romance of the Three Kingdoms). This substantially changes long-document analysis and whole-repository parsing, which can often skip chunking entirely.
| Metric | DeepSeek V3 | DeepSeek V4-Flash | DeepSeek V4-Pro |
|---|---|---|---|
| Context window | 128K | 1M | 1M |
| Max output | 8K | 384K | 384K |
| MRCR retrieval accuracy | ~50% | 83.5% | 83.5% |
For 1M-context application tests, see the long-context page.
V4 pricing follows a structure similar to V3's, with input tokens billed at separate rates for cache hits and cache misses.
| Model | Input (cache hit) | Input (cache miss) | Output |
|---|---|---|---|
| deepseek-v4-flash | $0.0028 / MTok | $0.14 / MTok | $0.28 / MTok |
| deepseek-v4-pro | $0.003625 / MTok | $0.435 / MTok | $0.87 / MTok |
Source: DeepSeek official API docs, 2026-07-05. Cache hit applies to repeated prefix requests (multi-turn, agent loops). Flash concurrency limit 2500; Pro 500.
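Because cached input is so much cheaper, the hit/miss split dominates the bill for agent loops that re-send a long prefix. A minimal cost estimator using the prices from the table; request_cost_usd is a hypothetical helper, and the token counts are assumed to come from the response's usage data (check the API docs for the exact field names).

```python
# USD per million tokens, copied from the pricing table
# (DeepSeek official API docs, 2026-07-05); re-check before relying on them.
PRICES_PER_MTOK = {
    "deepseek-v4-flash": {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87},
}


def request_cost_usd(model: str, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    p = PRICES_PER_MTOK[model]
    total = (
        hit_tokens * p["cache_hit"]
        + miss_tokens * p["cache_miss"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# An agent-loop turn that re-sends a 900K-token cached prefix,
# adds 100K new input tokens, and generates 50K output tokens:
flash = request_cost_usd("deepseek-v4-flash", 900_000, 100_000, 50_000)
pro = request_cost_usd("deepseek-v4-pro", 900_000, 100_000, 50_000)
print(f"Flash ${flash:.4f}, Pro ${pro:.4f}")
```

In this example, the 900K cached tokens cost less than the 100K uncached ones on both models.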
Check off each item to ensure migration completes before July 24.
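One way to confirm that nothing still calls the legacy names before July 24 is to scan the codebase for them. A minimal sketch, assuming the model names appear as plain strings in source and config files; find_legacy_model_names and the suffix list are hypothetical and should be adapted to your project.

```python
import pathlib
import re

# Matches the two legacy model names that stop working on 2026-07-24.
LEGACY_NAME = re.compile(r"\bdeepseek-(?:chat|reasoner)\b")

# File types to scan (an assumption; extend to match your project).
SUFFIXES = {".py", ".js", ".ts", ".json", ".yaml", ".yml", ".toml"}


def find_legacy_model_names(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every line that uses a legacy name."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LEGACY_NAME.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_legacy_model_names(".") from the repository root and printing each tuple gives a grep-style report; an empty list means no legacy names remain in the scanned file types.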
No. V4 uses your existing DeepSeek API key. If you already use deepseek-chat, your balance, key, and access carry over directly.
Only the model parameter changes; base_url, the SDK, and your network setup stay the same. The switch takes effect immediately, with no deployment window. If problems arise, you can temporarily roll back to deepseek-chat until July 24.
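Since the switch is a one-parameter change, reading the model name from configuration turns a rollback into an environment change rather than a code change. A minimal sketch: DEEPSEEK_MODEL is a hypothetical environment variable, and configured_model refuses a legacy name once the 2026-07-24 15:59 UTC cutoff has passed, so a forgotten rollback fails fast at startup instead of on every request.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

DEFAULT_MODEL = "deepseek-v4-flash"
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}
LEGACY_CUTOFF = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)


def configured_model(
    env: Mapping[str, str] = os.environ,
    now: Optional[datetime] = None,
) -> str:
    """Read the model name from the hypothetical DEEPSEEK_MODEL variable."""
    name = env.get("DEEPSEEK_MODEL", DEFAULT_MODEL)
    now = now or datetime.now(timezone.utc)
    if name in LEGACY_MODELS and now >= LEGACY_CUTOFF:
        # Legacy names are rejected by the API after the cutoff, so stop early.
        raise RuntimeError(f"{name} was retired on {LEGACY_CUTOFF.isoformat()}")
    return name
```

Rolling back is then DEEPSEEK_MODEL=deepseek-chat, and rolling forward again is unsetting the variable.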
Local deployments are unaffected by the deprecation of model names on DeepSeek's official API: the model name you use depends entirely on your inference server. This guide covers the DeepSeek official API only.
V3.2 (Speciale) is a 2025 interim release and is not on this deprecation list, but DeepSeek has already stopped accepting new V3.2 users. We recommend migrating to V4-Flash or V4-Pro soon to get the 1M context window and thinking mode.
Flash (284B/13B) is the lightweight option: a concurrency limit of 2500 and a low price, suited to everyday Q&A and batch tasks. Pro (1.6T/49B) is the flagship: a concurrency limit of 500 but stronger coding and reasoning, suited to agentic coding and long-chain reasoning. When in doubt, start with Flash; it is enough for most scenarios.