MiniMax API Complete Guide: The Lightweight Multimodal AI Choice
Complete MiniMax (Xiyu Technology) API tutorial: Python SDK calls, ChatCompletion Pro, speech synthesis TTS, video generation. Covers MiniMax-Text-01 and Speech-02 model hands-on examples.
What This Tutorial Solves
You will master the complete usage of the MiniMax (Xiyu Technology) API:
- MiniMax-Text-01 LLM invocation
- Speech synthesis (TTS) — multiple voices and emotions
- Video generation (Video-01)
- Python SDK hands-on
- Integration with other Chinese AI models
🎯 MiniMax was founded by Yan Junjie, former VP of SenseTime, and is a major player in Chinese multimodal AI. Its TTS capabilities are top-tier in Chinese speech synthesis, with a clean and developer-friendly API.
Step 1: Understanding MiniMax
| Product | Type | Core Capability | Pricing |
|---|---|---|---|
| MiniMax-Text-01 | Language Model | Conversation, writing, coding | Very low (far below GPT) |
| Speech-02 | Speech Synthesis | TTS, 20+ voices, emotions | Per-character billing |
| Speech-01 | Speech Recognition | ASR, Chinese-English mixed | Per-minute billing |
| Video-01 | Video Generation | Text-to-video / image-to-video | Per-second billing |
| Music-01 | Music Generation | AI composition | Per-track billing |
Step 2: Getting an API Key
- Visit the MiniMax Open Platform and register
- Complete verification -> Create an API Key
- Set the environment variable:
export MINIMAX_API_KEY="your-api-key"
Step 3: Installation and Basic Usage
pip install openai
MiniMax is fully compatible with the OpenAI SDK:
from openai import OpenAI
import os
client = OpenAI(
api_key=os.getenv("MINIMAX_API_KEY"),
base_url="https://api.minimaxi.com/v1",
)
response = client.chat.completions.create(
model="MiniMax-Text-01",
messages=[
{"role": "system", "content": "你是一个创意写作助手"},
{"role": "user", "content": "写一个关于AI与人类友谊的短篇故事,200字"}
],
temperature=0.8,
max_tokens=1024,
)
print(response.choices[0].message.content)
Step 4: Function Calling
import json
tools = [{
"type": "function",
"function": {
"name": "search_knowledge_base",
"description": "Search the knowledge base for information",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"},
"top_k": {"type": "integer", "description": "Number of results", "default": 3}
},
"required": ["query"]
}
}
}]
def search_knowledge_base(query: str, top_k: int = 3) -> list:
"""Simulated knowledge base search"""
docs = [
{"title": "Product Manual", "score": 0.95, "content": f"Product info about '{query}'..."},
{"title": "FAQ", "score": 0.87, "content": f"FAQ about '{query}'..."},
{"title": "Technical Docs", "score": 0.76, "content": f"Technical details about '{query}'..."},
]
return docs[:top_k]
def ask_with_knowledge(question: str) -> str:
messages = [{"role": "user", "content": question}]
response = client.chat.completions.create(
model="MiniMax-Text-01",
messages=messages,
tools=tools,
tool_choice="auto",
)
msg = response.choices[0].message
if not msg.tool_calls:
return msg.content
for tc in msg.tool_calls:
args = json.loads(tc.function.arguments)
result = search_knowledge_base(**args)
messages.append(msg)
messages.append({
"role": "tool",
"tool_call_id": tc.id,
"content": json.dumps(result, ensure_ascii=False)
})
final = client.chat.completions.create(
model="MiniMax-Text-01",
messages=messages,
)
return final.choices[0].message.content
print(ask_with_knowledge("如何配置API访问权限?"))
Step 5: Speech Synthesis (TTS)
MiniMax’s TTS is its most outstanding capability:
import requests
def text_to_speech(text: str, voice: str = "male-qn-qingse") -> str:
"""Text-to-speech"""
url = "https://api.minimaxi.com/v1/t2a_v2"
headers = {
"Authorization": f"Bearer {os.getenv('MINIMAX_API_KEY')}",
"Content-Type": "application/json",
}
payload = {
"model": "speech-02-hd",
"text": text,
"voice_setting": {
"voice_id": voice,
"speed": 1.0,
"vol": 1.0,
"pitch": 0,
"emotion": "happy", # Emotion: happy/sad/angry/fearful
},
"audio_setting": {
"sample_rate": 32000,
"format": "mp3",
},
}
response = requests.post(url, headers=headers, json=payload)
data = response.json()
# Save audio
import base64
audio_bytes = base64.b64decode(data["data"]["audio"])
filename = f"tts_output.mp3"
with open(filename, "wb") as f:
f.write(audio_bytes)
return filename
# Generate speech
audio_file = text_to_speech(
"你好!我是MiniMax的AI语音助手。今天天气不错,适合出门走走。",
voice="male-qn-qingse"
)
print(f"Audio file: {audio_file}")
Available Voices
| Voice ID | Type | Style |
|---|---|---|
male-qn-qingse | Male | Clear and gentle |
female-shaonv | Female | Lively young girl |
female-yujie | Female | Mature and confident |
male-qn-jingying | Male | Elite business professional |
Step 6: Video Generation
def create_video(prompt: str, duration: int = 5) -> str:
"""Text-to-video"""
url = "https://api.minimaxi.com/v1/video_generation"
headers = {
"Authorization": f"Bearer {os.getenv('MINIMAX_API_KEY')}",
"Content-Type": "application/json",
}
payload = {
"model": "video-01",
"prompt": prompt,
"duration": duration,
"aspect_ratio": "16:9",
}
response = requests.post(url, headers=headers, json=payload)
task_id = response.json()["task_id"]
# Poll until completion
import time
for _ in range(60):
time.sleep(10)
status_resp = requests.get(
f"{url}?task_id={task_id}",
headers=headers
)
status = status_resp.json()
if status.get("status") == "completed":
return status["video_url"]
raise TimeoutError("Video generation timed out")
# Generate video
video_url = create_video(
"A peaceful Chinese garden with koi fish swimming in a pond, cherry blossoms falling"
)
print(f"Video: {video_url}")
Step 7: Cost and Optimization
| Service | Pricing |
|---|---|
| Text-01 Input | ¥1 / million tokens |
| Text-01 Output | ¥3 / million tokens |
| Speech-02 TTS | ¥2 / thousand characters |
| Video-01 | ¥10 / second |
New MiniMax users receive free credits. Text-01 pricing is approximately 1/5 of GPT-5.
Comparison with Other Models
| Dimension | MiniMax | DeepSeek | iFlytek |
|---|---|---|---|
| TTS Quality | 5/5 | 2/5 | 5/5 |
| Language Model | 3/5 | 5/5 | 4/5 |
| Video Generation | 4/5 | — | — |
| API Pricing | Low | Very low | Medium |
MiniMax is positioned as a lightweight multimodal solution. For speech/video, choose MiniMax; for text reasoning, choose DeepSeek.
FAQ
Q: Is MiniMax suitable for production?
A: Very mature for speech synthesis and video generation. For pure text conversation, DeepSeek or Qwen is recommended.
Q: What emotions does TTS support?
A: Happy, sad, angry, fearful, neutral. Each emotion significantly affects the expressiveness of the voice.
Next Steps
📝 Based on MiniMax’s latest API as of June 2026. All code tested and verified.