
Xiaomi MiMo API Getting Started: 1M Ultra-Long Context + Coding Agent Guide

Complete Xiaomi MiMo V2.5 Pro API tutorial: multiple access paths, thinking mode, long-text processing, Agent development. 1M context, Claude Sonnet-level coding ability, priced at 1/5 of GPT-5.5.


What This Tutorial Solves

This tutorial walks through end-to-end use of the Xiaomi MiMo large model:

  • Three API access paths (Alibaba Cloud Bailian / Token Plan / third-party proxies)
  • Correct use of Thinking mode
  • 1M ultra-long context processing (contract analysis, long document RAG)
  • Open-source model local deployment with vLLM
  • Agent framework integration (Claude Code / Codex / OpenCode)

🎯 Xiaomi MiMo is led by “post-95 AI prodigy” Luo Fuli (formerly a core developer of DeepSeek-V2). MiMo-V2-Flash achieves 73.4% on the SWE-Bench coding benchmark, rivaling Claude Sonnet, and is open-sourced under the MIT license.


Step 1: Understanding the MiMo Model Family

Xiaomi began releasing the MiMo series in 2025, positioning it as the AI infrastructure for its “people-car-home” ecosystem:

Core Models

| Model | Architecture | Total Params | Active Params | Context | Key Feature |
| --- | --- | --- | --- | --- | --- |
| MiMo-V2.5-Pro | MoE | 1.02T | 42B | 1M | Latest flagship, deep reasoning |
| MiMo-V2.5 | MoE Multimodal | 310B | 15B | 1M | Text + image + video + audio |
| MiMo-V2-Flash | MoE | 309B | 15B | 256K | MIT open-source, SOTA coding |
| MiMo-V2-Omni | MoE Multimodal | - | - | 256K | Full-modal understanding |
| MiMo-V2.5-TTS | Speech Synthesis | - | - | - | Emotion control, dialects |
| MiMo-V2.5-ASR | Speech Recognition | - | - | - | Chinese-English bilingual, noise-robust |

Key Metrics

  • SWE-Bench Verified: 73.4% for MiMo-V2-Flash (open-source #1, rivaling Claude Sonnet)
  • Inference Speed: Approximately 3x that of DeepSeek-V3.2
  • Long Context Retrieval: 95%+ accuracy at a depth of 800K tokens
  • Pricing: MiMo-V2.5-Pro input at $1.00/M tokens (one-fifth of GPT-5.5's $5.00/M)

Step 2: Obtaining API Access

There are currently three ways to access MiMo:

Method A: Alibaba Cloud Bailian Platform (Mainland China only)

This path serves Xiaomi's models directly through Alibaba Cloud Bailian. It requires an API Key for the China North 2 (Beijing) region.

API Base URL:

https://dashscope.aliyuncs.com/compatible-mode/v1

Getting a Key:

  1. Visit the Alibaba Cloud Bailian Console
  2. Select the China North 2 (Beijing) region
  3. Enable the “Xiaomi MiMo” model service
  4. Create an API Key

Method B: Xiaomi Token Plan Platform (Globally available)

Xiaomi's official subscription platform, launched in 2026, has nodes in three regions:

| Region | OpenAI-Compatible Endpoint |
| --- | --- |
| China (CN) | https://token-plan-cn.xiaomimimo.com/v1 |
| Singapore (SGP) | https://token-plan-sgp.xiaomimimo.com/v1 |
| Europe (AMS) | https://token-plan-ams.xiaomimimo.com/v1 |

Registration: https://platform.xiaomimimo.com/console/plan-manage
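Since every region is listed as an OpenAI-compatible endpoint, switching regions should only require a different base_url. A minimal sketch, assuming a hypothetical helper named token_plan_base_url (it is not part of any SDK) and the endpoints listed above:

```python
# Endpoints copied from the region list above.
TOKEN_PLAN_ENDPOINTS = {
    "cn": "https://token-plan-cn.xiaomimimo.com/v1",
    "sgp": "https://token-plan-sgp.xiaomimimo.com/v1",
    "ams": "https://token-plan-ams.xiaomimimo.com/v1",
}


def token_plan_base_url(region: str) -> str:
    """Return the Token Plan base URL for a region code (hypothetical helper)."""
    key = region.strip().lower()
    if key not in TOKEN_PLAN_ENDPOINTS:
        known = ", ".join(sorted(TOKEN_PLAN_ENDPOINTS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}")
    return TOKEN_PLAN_ENDPOINTS[key]


print(token_plan_base_url("SGP"))
```

The returned string can be passed as base_url when constructing the OpenAI client.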

Method C: Third-Party Platforms

| Platform | Base URL | Model Name |
| --- | --- | --- |
| OpenRouter | https://openrouter.ai/api/v1 | xiaomi/mimo-v2.5-pro |
| WaveSpeedAI | https://llm.wavespeed.ai/v1 | xiaomi/mimo-v2.5 |
| Vercel AI Gateway | Use Vercel SDK | mimo-v2.5-pro |

💡 Platform recommendation: Mainland China users should use Alibaba Cloud Bailian (low latency). Overseas users should use Token Plan or OpenRouter.


Step 3: Basic API Calls

OpenAI-Compatible Format (Universal method)

pip install openai

from openai import OpenAI
import os

# Choose base_url based on your platform
client = OpenAI(
    api_key=os.getenv("MIMO_API_KEY"),
    base_url="https://token-plan-cn.xiaomimimo.com/v1",  # Token Plan CN
)

response = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You are a professional programming assistant"},
        {"role": "user", "content": "Write a binary search algorithm in Python, with comments"}
    ],
    temperature=1.0,  # MiMo recommends 1.0
    max_tokens=2048,
)

print(response.choices[0].message.content)

Alibaba Cloud Bailian Format

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="xiaomi/mimo-v2.5-pro",
    messages=[{"role": "user", "content": "Explain what the Transformer architecture is"}],
    extra_body={"enable_thinking": True},  # Thinking mode passed via extra_body
    stream=True,
)

for chunk in response:
    # Some stream chunks may carry no choices; skip them
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Step 4: Thinking Mode (Critical!)

MiMo’s thinking mode is similar to DeepSeek’s but has some important differences:

Enabling Thinking

response = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {"role": "user", "content": "Prove that √2 is irrational"}
    ],
    max_tokens=4096,
    extra_body={"enable_thinking": True},  # Non-standard parameter, passed via extra_body
    stream=True,
)

reasoning = ""
answer = ""
is_thinking = False
is_answering = False

for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Process reasoning chain (reasoning_content is a non-standard field)
    reasoning_piece = getattr(delta, "reasoning_content", None)
    if reasoning_piece:
        if not is_thinking:
            # Print the label once, not on every chunk
            is_thinking = True
            print("[Thinking] ", end="")
        reasoning += reasoning_piece
        print(reasoning_piece, end="", flush=True)

    # Process final response
    if delta.content:
        if not is_answering:
            is_answering = True
            print("\n\n[Answer] ", end="")
        answer += delta.content  # accumulate so the full answer is available afterwards
        print(delta.content, end="", flush=True)
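Because enable_thinking is not a standard OpenAI parameter, it is easy to pass it at the top level by mistake. The sketch below uses a hypothetical helper, build_chat_kwargs, to keep the flag inside extra_body. That passing False turns thinking off is an assumption inferred from the flag being a boolean.

```python
def build_chat_kwargs(prompt: str, thinking: bool, max_tokens: int = 4096) -> dict:
    """Assemble keyword arguments for client.chat.completions.create (hypothetical helper)."""
    return {
        "model": "mimo-v2.5-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Non-standard flag: it belongs in extra_body, not at the top level.
        "extra_body": {"enable_thinking": thinking},
    }


kwargs = build_chat_kwargs("Prove that √2 is irrational", thinking=False)
print(kwargs["extra_body"])
# With a configured client: client.chat.completions.create(**kwargs)
```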

Thinking Mode Considerations

| Point | Description |
| --- | --- |
| Multi-turn conversations | Each assistant message must retain the reasoning_content field |
| max_tokens | Limits the combined length of chain-of-thought + output |
| Incompatible parameters | reasoning_effort, thinking_budget are not supported |
| Recommended frameworks | Use tools like OpenCode that properly handle reasoning_content |

⚠️ Critical pitfall: In multi-turn conversations, if the previous turn's `reasoning_content` is lost, the API will error. Make sure to keep it intact in the messages array.
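One way to guard against this pitfall is to build every assistant turn through a single helper and check the history before sending it. A minimal sketch: the helper names are hypothetical, and carrying reasoning_content as a key on the assistant message dict is an assumption based on the field name, so confirm the exact shape against your platform's documentation.

```python
def assistant_turn(content: str, reasoning_content: str) -> dict:
    """Build an assistant message that keeps its reasoning chain (hypothetical helper)."""
    return {
        "role": "assistant",
        "content": content,
        "reasoning_content": reasoning_content,
    }


def missing_reasoning(messages: list[dict]) -> list[int]:
    """Return indexes of assistant messages that lack the reasoning_content key."""
    return [
        i for i, m in enumerate(messages)
        if m.get("role") == "assistant" and "reasoning_content" not in m
    ]


history = [
    {"role": "user", "content": "Prove that √2 is irrational"},
    assistant_turn("Assume √2 = p/q in lowest terms ...", "Try proof by contradiction."),
    {"role": "user", "content": "Now do the same for √3"},
]
print(missing_reasoning(history))
```

Running the check right before each request catches a dropped field locally instead of through an API error.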

Step 5: 1M Ultra-Long Context Processing

MiMo-V2.5-Pro supports 1M tokens of context, ideal for processing ultra-long documents:

def analyze_long_document(document_path: str, question: str) -> str:
    """Analyze ultra-long documents (contracts, papers, codebases)"""
    with open(document_path, "r", encoding="utf-8") as f:
        content = f.read()

    # MiMo's 1M context can hold most long documents in a single request;
    # split into segments only if the document exceeds that window

    response = client.chat.completions.create(
        model="mimo-v2.5-pro",
        messages=[
            {
                "role": "system",
                "content": "You are a professional document analysis assistant. Answer questions based on the provided document content."
            },
            {
                "role": "user",
                "content": f"Document content:\n\n{content}\n\nQuestion: {question}"
            }
        ],
        temperature=1.0,
        max_tokens=4096,
    )

    return response.choices[0].message.content

# Example usage
answer = analyze_long_document(
    "contract.txt",
    "Which clauses in this contract are unfavorable to Party B? List them one by one."
)
print(answer)
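For inputs that exceed even a 1M-token window, the document has to be processed in segments. One possible approach is sketched below. The 3-characters-per-token ratio is a rough assumption, not a MiMo tokenizer figure, and split_into_segments is a hypothetical helper.

```python
def split_into_segments(text: str, max_tokens: int, chars_per_token: float = 3.0) -> list[str]:
    """Split text on paragraph boundaries so each segment fits a rough token budget.

    chars_per_token is an assumed average, not a tokenizer measurement.
    A single paragraph larger than the budget is kept whole as its own segment.
    """
    max_chars = int(max_tokens * chars_per_token)
    segments: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in text.split("\n\n"):
        added = len(paragraph) + (2 if current else 0)  # account for the separator
        if current and current_len + added > max_chars:
            # Budget exceeded: close the current segment and start a new one
            segments.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(paragraph)
        current.append(paragraph)
        current_len += added
    if current:
        segments.append("\n\n".join(current))
    return segments


doc = "\n\n".join(f"Clause {i}: " + "x" * 50 for i in range(10))
parts = split_into_segments(doc, max_tokens=50)  # roughly 150 characters per segment
print(len(parts), [len(p) for p in parts])
```

Each segment can then be sent in its own request and the partial answers merged in a final call.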

Long Context Use Cases

| Scenario | Description |
| --- | --- |
| Contract review | Analyze hundreds of pages of contracts at once |
| Codebase understanding | Use the entire project’s code as context |
| Paper analysis | Compare and analyze multiple papers simultaneously |
| Customer service KB | Full product documentation as RAG context |

Step 6: Function Calling (Tool Use)

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the company's internal database",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search keyword"},
                    "limit": {"type": "integer", "description": "Number of results", "default": 5}
                },
                "required": ["query"]
            }
        }
    }
]

def search_database(query: str, limit: int = 5) -> list:
    """Simulated database search (connect to real DB in production)"""
    # Connect to your database here
    results = [
        {"title": f"Document: {query}-related-{i}", "score": 0.95 - i * 0.1}
        for i in range(limit)
    ]
    return results

def agent_query(user_input: str) -> str:
    messages = [{"role": "user", "content": user_input}]

    response = client.chat.completions.create(
        model="mimo-v2.5-pro",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )

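    msg = response.choices[0].message

    if not msg.tool_calls:
        return msg.content

    # The assistant turn that requested the tools must precede the tool results
    messages.append(msg)

    for tool_call in msg.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)

        if func_name == "search_database":
            result = search_database(**func_args)
        else:
            # Avoid an undefined result when the model names an unknown tool
            result = {"error": f"Unknown tool: {func_name}"}

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result, ensure_ascii=False)
        })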

    final = client.chat.completions.create(
        model="mimo-v2.5-pro",
        messages=messages,
    )

    return final.choices[0].message.content

print(agent_query("How were the company's sales figures last quarter?"))
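As the number of tools grows, matching on function names by hand becomes fragile. A possible alternative is a registry that maps tool names to Python callables and turns malformed arguments into an error payload the model can read. This is only a sketch: dispatch_tool_call and the lookup_order tool are hypothetical and not part of the MiMo API.

```python
import json
from typing import Any, Callable


def dispatch_tool_call(name: str, arguments_json: str,
                       registry: dict[str, Callable[..., Any]]) -> str:
    """Run a registered tool and return a JSON string for the tool message content."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        result = func(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON or wrong argument names: report it instead of crashing the loop.
        return json.dumps({"error": f"Invalid arguments for {name}: {exc}"})
    return json.dumps(result, ensure_ascii=False)


def lookup_order(order_id: str) -> dict:
    """Hypothetical example tool."""
    return {"order_id": order_id, "status": "shipped"}


registry = {"lookup_order": lookup_order}
print(dispatch_tool_call("lookup_order", '{"order_id": "A1"}', registry))
print(dispatch_tool_call("lookup_order", "{not json", registry))
```

The returned string can be used directly as the content of a "tool" message.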

Step 7: Local Deployment of Open-Source Models

MiMo-V2-Flash is fully open-sourced under the MIT license and can be used commercially for free:

Using vLLM

pip install vllm

# Start the service
vllm serve "XiaomiMiMo/MiMo-V2-Flash" --port 8000

# Client-side call
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="XiaomiMiMo/MiMo-V2-Flash",
    messages=[{"role": "user", "content": "What is Rust's ownership system?"}],
)

print(response.choices[0].message.content)

Open-Source Resources

| Resource | URL |
| --- | --- |
| GitHub Organization | https://github.com/XiaomiMiMo |
| HuggingFace | https://huggingface.co/XiaomiMiMo |
| MiMo-V2-Flash (MIT) | GitHub / HuggingFace |
| vLLM Support | One-click deployment |

Step 8: Integration with AI Coding Tools

Claude Code (via MiMo2API)

The open-source project MiMo2API provides an Anthropic Messages API compatibility layer:

git clone https://github.com/Fly143/MiMo2API.git
cd MiMo2API
chmod +x deploy.sh
./deploy.sh

Once started, Claude Code can use MiMo as its backend.

Codex CLI

Edit ~/.codex/config.toml:

model_provider = "xiaomi"
model = "mimo-v2.5-pro"

[model_providers.xiaomi]
name = "Xiaomi MiMo"
base_url = "https://token-plan-cn.xiaomimimo.com/v1"
env_key = "MIMO_API_KEY"

Cline / OpenClaw

Select Xiaomi -> MiMo models in the model settings.


Parameter Quick Reference

| Parameter | MiMo Range | Default | Notes |
| --- | --- | --- | --- |
| temperature | 0 ~ 1.5 | 1.0 | Recommended 1.0; too low weakens reasoning |
| top_p | 0.01 ~ 1.0 | 0.95 | Nucleus sampling |
| max_tokens | - | - | Counts both chain-of-thought + output |
| presence_penalty | -2 ~ 2 | 0 | MiMo-specific |
| stop | Up to 4 sequences | - | Stop words |
| tools | Supported | - | Only tool_choice: auto |
| enable_thinking | bool | true | Pass via extra_body |

Unsupported Parameters

top_k, reasoning_effort, parallel_tool_calls, seed, logprobs, n, audio
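The ranges and the unsupported list in this reference can be checked client-side before a request is sent. A minimal sketch, with check_params as a hypothetical helper and the limits taken directly from the reference above:

```python
UNSUPPORTED = {"top_k", "reasoning_effort", "parallel_tool_calls",
               "seed", "logprobs", "n", "audio"}
RANGES = {
    "temperature": (0.0, 1.5),
    "top_p": (0.01, 1.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    for name in sorted(UNSUPPORTED.intersection(params)):
        problems.append(f"{name} is not supported")
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low} ~ {high}")
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    if params.get("tool_choice", "auto") != "auto":
        problems.append("only tool_choice='auto' is supported")
    return problems


print(check_params({"temperature": 2.0, "seed": 42, "top_p": 0.95}))
```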


Pricing (June 2026)

Pay-as-You-Go

| Billing Item | V2.5-Pro | V2.5 Standard | V2-Flash |
| --- | --- | --- | --- |
| Input (cache miss) | $1.00/M | $0.14/M | ~$0.10/M |
| Output | $3.00/M | $0.28/M | ~$0.40/M |
| Input (cache hit) | $0.20/M | $0.0028/M | $0.02/M |
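To see how the cache-hit rate changes the bill, the rates above can be combined into a per-request estimate. The sketch below hard-codes the V2.5-Pro column. estimate_cost_usd is a hypothetical helper, and the 80% cache-hit figure in the usage lines is purely illustrative.

```python
# V2.5-Pro pay-as-you-go rates in USD per million tokens, from the pricing above.
INPUT_MISS = 1.00
INPUT_HIT = 0.20
OUTPUT = 3.00


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cache_hit_ratio: float = 0.0) -> float:
    """Estimate one request's cost given the share of input tokens served from cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = hit_tokens * INPUT_HIT + miss_tokens * INPUT_MISS + output_tokens * OUTPUT
    return total / 1_000_000


print(f"{estimate_cost_usd(500_000, 4_000):.3f}")
print(f"{estimate_cost_usd(500_000, 4_000, cache_hit_ratio=0.8):.3f}")
```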

Token Plan Subscription

| Plan | China Price | International Price | Credits/Month |
| --- | --- | --- | --- |
| Lite | ¥39 | $6 | ~4.1B points |
| Standard | ¥99 | $16 | ~11B points |
| Pro | ¥329 | $50 | ~38B points |
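A quick way to compare the plans is credits per dollar. The sketch below uses the international prices and approximate monthly credit figures listed above. How credits map to tokens is not stated here, so the result is only a relative comparison.

```python
# (international price in USD, approximate monthly credits in billions), from the plans above.
PLANS = {
    "Lite": (6, 4.1),
    "Standard": (16, 11),
    "Pro": (50, 38),
}

for name, (price_usd, credits_billion) in PLANS.items():
    per_dollar = credits_billion / price_usd
    print(f"{name}: ~{per_dollar:.2f}B credits per USD")

# Plan with the most credits per dollar at the listed prices
best_value = max(PLANS, key=lambda p: PLANS[p][1] / PLANS[p][0])
print("Best value:", best_value)
```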

Price Comparison with Major Models

| Model | Input ($/M) | Output ($/M) | Context |
| --- | --- | --- | --- |
| MiMo V2.5 Pro | $1.00 | $3.00 | 1M |
| DeepSeek V4-Pro | $0.44 | $0.87 | 128K |
| GPT-5.5 | $5.00 | $30.00 | 200K |
| Claude Opus 4.7 | $3.00 | $15.00 | 200K |

Of the models listed, only MiMo V2.5 Pro offers a 1M context window, so its price advantage over GPT-5.5 and Claude Opus 4.7 is most pronounced in long-context scenarios.
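The comparison can be made concrete by filtering the listed models by context window and then pricing a single request. This sketch uses only the figures listed above. It reads 1M as 1,000,000 tokens and 128K/200K as 128,000/200,000, and it assumes the reply counts against the context window. The helper names are hypothetical.

```python
# (input $/M, output $/M, context window in tokens), from the comparison above.
MODELS = {
    "MiMo V2.5 Pro": (1.00, 3.00, 1_000_000),
    "DeepSeek V4-Pro": (0.44, 0.87, 128_000),
    "GPT-5.5": (5.00, 30.00, 200_000),
    "Claude Opus 4.7": (3.00, 15.00, 200_000),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed rates (no caching)."""
    in_rate, out_rate, _ = MODELS[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def models_that_fit(input_tokens: int, output_tokens: int) -> list[str]:
    """Models whose context window can hold the prompt plus the reply (assumed rule)."""
    return [m for m, (_, _, ctx) in MODELS.items()
            if input_tokens + output_tokens <= ctx]


print(models_that_fit(150_000, 4_000))
print(models_that_fit(800_000, 4_000))
print(f"{request_cost('MiMo V2.5 Pro', 800_000, 4_000):.3f}")
```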


FAQ

Q: MiMo vs DeepSeek — how to choose?

A: Choose MiMo for long-context tasks (1M vs 128K), DeepSeek for deep reasoning and math. Both are top-tier for coding.

Q: Can overseas users call the API directly?

A: Yes. Use Token Plan’s Singapore or Europe nodes, or go through platforms like OpenRouter.

Q: Can the open-source model be used commercially?

A: MiMo-V2-Flash is under the MIT license and is fully available for commercial use. Check the specific licenses for other models.

Q: How many additional tokens does thinking mode consume?

A: The thinking process counts toward the max_tokens limit but is not billed separately (only final output is charged).


Next Steps

📝 Tutorial version notes: Based on Xiaomi MiMo’s latest API version as of June 2026. All code tested and verified.
