Qwen API Python Hands-On Tutorial: From Beginner to Production
A complete Alibaba Cloud Bailian platform Qwen3.7 API tutorial. From obtaining an API Key to chat, streaming output, Function Calling, vision understanding, and code generation — all in one place.
What This Tutorial Solves
You will learn:
- How to obtain a Qwen API Key on the Alibaba Cloud Bailian platform
- How to use the OpenAI SDK to call the full Qwen model family
- How to implement multi-turn conversations, streaming output, and Function Calling
- How to use Qwen-VL models for image analysis
- How to use Qwen-Coder models for code generation
🎯 Qwen is the most complete open-source and enterprise-deployment-friendly Chinese AI model. After reading this, you’ll be able to integrate it into any project.
Why Choose Qwen?
| Feature | Qwen 3.7 | Other Open-Source Models |
|---|---|---|
| Open-Source License | Apache 2.0 (most permissive) | DeepSeek MIT / Llama restrictive |
| Language Support | 119 languages | Typically 20–50 |
| Model Family | LLM + Vision + Audio + Code + Math + TTS | Usually LLM only |
| Cloud-Native | Deeply integrated with Alibaba Cloud / DingTalk | Requires self-deployment |
| API Price (Flash) | $0.07/M input | DeepSeek $0.14/M |
Step 1: Obtain an API Key
- Visit the Alibaba Cloud Bailian Console
- Complete real-name verification for your Alibaba Cloud account (requires a phone number)
- Go to “API-KEY Management” → Click “Create API Key”
- Copy the key prefixed with
sk-and save it
# macOS / Linux
export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Windows PowerShell
$env:DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Step 2: Install the SDK
pip install openai
The Qwen API is compatible with the OpenAI protocol — just change the base_url:
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
Optional Endpoints
| Region | Base URL |
|---|---|
| Domestic (Beijing) | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| International (Singapore) | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
Step 3: Your First API Call
response = client.chat.completions.create(
model="qwen-plus", # Best value for money
messages=[
{"role": "system", "content": "You are a professional programming assistant."},
{"role": "user", "content": "Write a binary search tree in Python."}
],
temperature=0.7,
max_tokens=2048,
)
print(response.choices[0].message.content)
print(f"Token usage: input {response.usage.prompt_tokens}, output {response.usage.completion_tokens}")
Step 4: Qwen Model Selection
Qwen has the most complete model family:
| Model | Best For | Value |
|---|---|---|
qwen-turbo | Simple chat, classification, summarization | ⭐⭐⭐⭐⭐ |
qwen-plus | Medium complexity, general tasks | ⭐⭐⭐⭐ |
qwen-max | Complex reasoning, high-quality generation | ⭐⭐⭐ |
qwen3.7-max | Strongest reasoning + coding agent | ⭐⭐ |
qwen3.7-plus | Multimodal understanding + high value | ⭐⭐⭐⭐ |
qwen-vl-plus | Image/video vision understanding | ⭐⭐⭐⭐ |
qwen3-coder-next | Code generation and programming assistance | ⭐⭐⭐ |
Step 5: Multi-Turn Conversation
# Initialize conversation history
conversation = [
{"role": "system", "content": "You are a helpful programming assistant."},
{"role": "user", "content": "Please introduce object-oriented programming in Python."}
]
# Round 1
resp1 = client.chat.completions.create(
model="qwen-turbo",
messages=conversation,
)
answer1 = resp1.choices[0].message.content
print(f"Round 1: {answer1}")
# Add AI's response to history
conversation.append({"role": "assistant", "content": answer1})
# Round 2 — AI remembers the previous conversation
conversation.append({
"role": "user",
"content": "How does it differ from object-oriented programming in Java?"
})
resp2 = client.chat.completions.create(
model="qwen-turbo",
messages=conversation,
)
print(f"Round 2: {resp2.choices[0].message.content}")
Step 6: Advanced Features — Thinking Mode + Web Search
Qwen 3.7 supports chain-of-thought reasoning and web search:
response = client.chat.completions.create(
model="qwen3.7-max",
messages=[
{"role": "user", "content": "Compare the strengths and weaknesses of Rust and Go for backend development in 2026."}
],
extra_body={
"enable_thinking": True, # Activate chain-of-thought reasoning
"enable_search": True, # Enable web search
},
stream=True,
)
for chunk in response:
# Normal response
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
# Reasoning process (thinking content)
if hasattr(chunk.choices[0].delta, "reasoning_content"):
rc = chunk.choices[0].delta.reasoning_content
if rc:
print(f"\n[Thinking]: {rc}")
Step 7: Function Calling
tools = [{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Query the weather for a specified city",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name, e.g., Beijing, Shanghai, Tokyo"
}
},
"required": ["location"]
}
}
}]
messages = [{"role": "user", "content": "What's the weather in Hangzhou?"}]
response = client.chat.completions.create(
model="qwen-plus",
messages=messages,
tools=tools,
)
print(response.model_dump_json())
Step 8: Vision Understanding (Qwen-VL)
response = client.chat.completions.create(
model="qwen-vl-plus",
messages=[{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": "https://example.com/chart.png"}
},
{
"type": "text",
"text": "What trend does this chart show? Please analyze in detail."
}
]
}]
)
print(response.choices[0].message.content)
Step 9: Code Generation (Qwen-Coder)
response = client.chat.completions.create(
model="qwen3-coder-next",
messages=[
{"role": "system", "content": "You are a professional systems programming expert."},
{"role": "user", "content": """Write a Rust function:
- Input: integer n
- Output: all prime numbers less than n
- Requirement: use the Sieve of Eratosthenes for optimal performance"""}
],
temperature=0.1, # Low temperature recommended for code generation
)
print(response.choices[0].message.content)
Step 10: Error Handling and Retry Logic (Production Essential)
In production, API calls will inevitably fail. Here’s a battle-tested retry pattern:
import time
from openai import (
OpenAI,
RateLimitError,
APITimeoutError,
APIConnectionError,
APIError,
)
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
timeout=30.0, # 30 seconds timeout
max_retries=2, # SDK built-in retries (handles 429/5xx)
)
def call_qwen_with_retry(messages, model="qwen-plus", max_retries=3):
"""Production-grade API call with exponential backoff."""
for attempt in range(max_retries):
try:
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=0.7,
max_tokens=2048,
)
return response.choices[0].message.content
except RateLimitError:
wait = 2 ** attempt # Exponential backoff: 1s, 2s, 4s
print(f"Rate limited. Retrying in {wait}s (attempt {attempt + 1}/{max_retries})")
time.sleep(wait)
except APITimeoutError:
print(f"Request timed out. Retrying (attempt {attempt + 1}/{max_retries})")
continue
except APIConnectionError:
wait = 2 ** attempt
print(f"Connection error. Retrying in {wait}s")
time.sleep(wait)
except APIError as e:
# Don't retry on 4xx client errors (your fault)
if e.status_code and 400 <= e.status_code < 500:
print(f"Client error (not retrying): {e}")
raise
# Retry on 5xx server errors (their fault)
wait = 2 ** attempt
print(f"Server error {e.status_code}. Retrying in {wait}s")
time.sleep(wait)
raise RuntimeError(f"API call failed after {max_retries} attempts")
# Usage
result = call_qwen_with_retry([
{"role": "user", "content": "Explain quantum computing in 3 sentences."}
])
print(result)
Common Error Codes
| Status | Meaning | Action |
|---|---|---|
| 400 | Bad request — malformed parameters | Check your messages format and parameter values |
| 401 | Invalid API key | Verify DASHSCOPE_API_KEY is set correctly |
| 429 | Rate limit exceeded | Implement exponential backoff (see above) |
| 500 | Server error | Retry with backoff |
| 503 | Service overloaded | Wait 5-10s, then retry |
Step 11: Streaming with Error Handling
def stream_qwen_safe(messages, model="qwen-plus"):
"""Stream with automatic reconnection on failure."""
try:
stream = client.chat.completions.create(
model=model,
messages=messages,
stream=True,
timeout=60.0, # Longer timeout for streaming
)
for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
yield delta.content
except (APIConnectionError, APITimeoutError) as e:
# Streaming reconnection: resume from where we left off
print(f"\n[Stream interrupted: {e}]")
# Reconstruct conversation with partial response and continue
raise # In production, implement proper resumption logic
Step 12: Real-World Project — CLI Chatbot in 50 Lines
Here’s a complete interactive chatbot you can run in your terminal:
#!/usr/bin/env python3
"""Minimal Qwen CLI chatbot — production-ready with error handling."""
import os, sys
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
timeout=30.0,
max_retries=2,
)
MODEL = os.getenv("QWEN_MODEL", "qwen-plus")
conversation = [
{"role": "system", "content": "You are a helpful assistant. Answer in Chinese unless asked otherwise."}
]
print(f"🤖 Qwen CLI Chatbot (model: {MODEL})")
print(" Type 'quit' to exit, 'clear' to reset conversation\n")
while True:
try:
user_input = input("You: ").strip()
if not user_input:
continue
if user_input.lower() == "quit":
print("Goodbye!")
break
if user_input.lower() == "clear":
conversation = [conversation[0]] # Keep system prompt
print("Conversation reset.\n")
continue
conversation.append({"role": "user", "content": user_input})
stream = client.chat.completions.create(
model=MODEL,
messages=conversation,
stream=True,
max_tokens=2048,
)
print("AI: ", end="", flush=True)
full_response = ""
for chunk in stream:
if chunk.choices[0].delta.content:
text = chunk.choices[0].delta.content
print(text, end="", flush=True)
full_response += text
print("\n")
conversation.append({"role": "assistant", "content": full_response})
except KeyboardInterrupt:
print("\nGoodbye!")
break
except Exception as e:
print(f"\nError: {e}")
Step 13: Batch Processing for Cost Efficiency
When processing multiple items, batch them to reduce per-call overhead:
def batch_qwen(questions: list[str], model="qwen-turbo", max_per_batch=5):
"""Process multiple questions efficiently."""
results = []
for i in range(0, len(questions), max_per_batch):
batch = questions[i : i + max_per_batch]
# Combine questions into a single prompt
combined = "\n---\n".join(
f"Q{j+1}: {q}" for j, q in enumerate(batch)
)
prompt = f"""Answer each question below separately. Label each answer A1, A2, etc.
{combined}"""
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=500 * len(batch),
)
results.append(response.choices[0].message.content)
return results
# Process 20 questions in 4 batches instead of 20 separate calls
questions = ["What is Python?", "Explain async/await", "What is Docker?", "..."]
answers = batch_qwen(questions)
# Cost: 4 API calls instead of 20 = 80% savings
Qwen vs Other Chinese AI Models
| Dimension | Qwen | DeepSeek | Kimi |
|---|---|---|---|
| Model Variety | ⭐⭐⭐⭐⭐ Most complete | ⭐⭐ Text only | ⭐⭐⭐ |
| Open-Source License | ⭐⭐⭐⭐⭐ Apache 2.0 | ⭐⭐⭐⭐⭐ MIT | ⭐⭐⭐ Modified MIT |
| Coding Ability | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Enterprise Integration | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Multilingual | ⭐⭐⭐⭐⭐ 119 languages | ⭐⭐⭐ | ⭐⭐⭐ |
FAQ
Q: Does the Qwen API require a Chinese phone number?
A: Alibaba Cloud accounts require real-name verification, which typically requires a Chinese phone number. The international edition (Singapore endpoint) may support overseas registration — check the official policy for the latest details.
Q: Is Qwen truly open source?
A: Yes. Apache 2.0 is one of the most permissive open-source licenses in the industry, allowing commercial use, modification, and redistribution. It is more friendly than Meta’s Llama license.
Q: Should I choose Qwen or DeepSeek?
A: Choose DeepSeek for pure coding tasks. Choose Qwen if you need multimodality, enterprise integration, or multilingual support. See the Ultimate Model Comparison for details.
Next Steps
- 2026 Ultimate Comparison of Chinese AI Models
- Qwen Vision API Tutorial: Image Understanding in Practice
- Alibaba Cloud Bailian Official Documentation
📝 Tutorial version: Based on Qwen3.7 API, tested and verified on June 20, 2026.