Kimi Beginner 2026/6/19

Kimi API Getting Started: 5-Minute Integration with 256K Ultra-Long Context

Complete Moonshot AI Kimi K2.6 API integration guide. From obtaining an API Key to multimodal vision recognition, Function Calling, and Agent cluster usage, with code examples for every step.

KimiAPIPythonBeginnerLong Context

What This Tutorial Covers

You will learn:

How to register a Moonshot AI Kimi account and obtain an API Key
How to call the Kimi K2.6 API in Python
How to process ultra-long documents (up to 256K context)
How to use multimodal vision to analyze images
How to configure Function Calling for tool invocation

🎯 After this tutorial, you’ll be able to process book-length documents with Kimi.

Why Choose Kimi?

Kimi (Moonshot AI) stands out in long-document processing:

Feature	Kimi K2.6	GPT-5	Claude Opus 4
Context window	256K tokens	256K	200K
Coding (SWE-Bench)	58.6%	55.2%	52.1%
API input price	$0.60/M tokens	$3.00/M	$15.00/M
Max parallel agents	300 sub-agents	Limited	Limited
Open-source	⚠️ Modified MIT	❌	❌

Kimi K2.6 surpassed GPT-5.4 on SWE-Bench Pro, making it one of the strongest open-source coding models.

Step 1: Get an API Key

Open platform.moonshot.cn (International users: platform.moonshot.ai)
Register an account (supports international phone numbers)
Go to “API Key Management” and create a new key
Copy the key and save it as an environment variable:

# macOS / Linux
export MOONSHOT_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Windows PowerShell
$env:MOONSHOT_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

💡 New user perk: Free credits on signup (typically 15-30 RMB), enough for development and testing.

Step 2: Install the SDK

The Kimi API is also compatible with the OpenAI SDK:

pip install openai

Step 3: Your First API Call

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
    base_url="https://api.moonshot.cn/v1",  # China endpoint
    # base_url="https://api.moonshot.ai/v1",  # International endpoint
)

response = client.chat.completions.create(
    model="kimi-k2.6",  # Latest flagship model
    messages=[
        {"role": "system", "content": "You are a senior Python engineer."},
        {"role": "user", "content": "Write a quicksort algorithm in Python."}
    ],
    temperature=0.3,
    max_tokens=2048,
)

print(response.choices[0].message.content)

📝 Endpoint notes: China-based users use api.moonshot.cn; international users use api.moonshot.ai. The API format is identical -- only the server location differs.

Step 4: Streaming Output

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Explain Python's GIL (Global Interpreter Lock) in detail."}
    ],
    temperature=0.3,
    max_tokens=2048,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Step 5: Processing Ultra-Long Documents

Kimi’s biggest advantage is its 256K ultra-long context — you can load an entire book or a complete code repository at once:

# Read a very long document
with open("long_report.txt", "r", encoding="utf-8") as f:
    long_document = f.read()

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "system", "content": "You are a professional document analyst. Read the entire content carefully before answering user questions."},
        {"role": "user", "content": f"""Please analyze the following document:

<document>
{long_document}
</document>

Complete the following tasks:
1. Summarize the core content in 3 sentences
2. List 5 key findings
3. Point out any data inconsistencies in the document"""}
    ],
    temperature=0.2,
    max_tokens=4096,
)

print(response.choices[0].message.content)

Long Document Processing Best Practices

Technique	Description
Use structured prompts	Clearly tell the AI what task to perform
Step-by-step questioning	Have the AI summarize first, then dive into specifics
Leverage XML tags	Wrap content in `<document>` tags for improved parsing accuracy
Control output length	Set an appropriate `max_tokens` to avoid excessively long responses

Step 6: Multimodal Vision Recognition

Kimi K2.5+ supports image understanding — it can analyze charts, screenshots, and photos:

import base64

# Read image and convert to base64
with open("architecture_diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2.5",  # Vision tasks require K2.5
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this system architecture diagram and identify potential performance bottlenecks and improvement suggestions."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"}
            }
        ]
    }],
    max_tokens=2048,
)

print(response.choices[0].message.content)

⚠️ Note: Multimodal vision tasks use the kimi-k2.5 model, not kimi-k2.6. Also, content must use the list format (not a plain string).

Step 7: Function Calling

tools = [{
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search for information in the database.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search keyword"},
                "limit": {"type": "integer", "description": "Maximum number of results"}
            },
            "required": ["query"]
        }
    }
}]

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Search for the latest materials on quantum computing for me"}],
    tools=tools,
    tool_choice="auto",
)

# Handle tool calls
msg = response.choices[0].message
if msg.tool_calls:
    for tool_call in msg.tool_calls:
        print(f"Calling function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")

Agent Cluster Mode (Advanced)

Kimi K2.6 supports launching up to 300 sub-agents working in parallel:

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{
        "role": "user",
        "content": """I need a comprehensive analysis of a software project. Please complete the following tasks simultaneously:
1. Analyze code quality
2. Check for security vulnerabilities
3. Evaluate performance bottlenecks
4. Review API design
5. Check test coverage
Please analyze each dimension independently, then provide a summary."""
    }],
    temperature=0.3,
    max_tokens=4096,
)

# Kimi will automatically decompose complex tasks into sub-agents for parallel processing

Step 8: File API — Upload and Process Documents

Kimi’s File API lets you upload documents for persistent processing across sessions:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
    base_url="https://api.moonshot.cn/v1",
)

# Step 1: Upload a file
with open("research_paper.pdf", "rb") as f:
    file_obj = client.files.create(
        file=f,
        purpose="file-extract",  # Extract text from PDF/Word/PPT/Excel
    )

file_id = file_obj.id
print(f"File uploaded: {file_id}")

# Step 2: Check file status
file_status = client.files.retrieve(file_id)
print(f"Status: {file_status.status}, Size: {file_status.bytes} bytes")

# Step 3: Use the file in a conversation
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "system",
            "content": f"file id: {file_id}\n\nRead and analyze the uploaded document carefully."
        },
        {"role": "user", "content": "Summarize the key findings of this research paper."}
    ],
    temperature=0.2,
    max_tokens=2048,
)

print(response.choices[0].message.content)

# Step 4: Clean up (optional — files auto-expire after 7 days)
client.files.delete(file_id)

Supported File Formats

Category	Formats	Max Size
Documents	PDF, DOCX, TXT, MD	100 MB
Spreadsheets	XLSX, CSV	100 MB
Presentations	PPTX	100 MB
Images	PNG, JPG, WEBP	20 MB
Code	.py, .js, .ts, .java, .go, .rs, etc.	10 MB

Step 9: Production Error Handling

import time
from openai import (
    OpenAI,
    RateLimitError,
    APITimeoutError,
    APIConnectionError,
    APIError,
)

def call_kimi_safely(messages, model="kimi-k2.6", max_retries=3):
    """Kimi API call with intelligent retry."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=0.3,
                max_tokens=4096,
                timeout=60.0,
            )

        except RateLimitError:
            wait = min(2 ** attempt * 5, 60)  # Kimi rate limits can be strict
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)

        except APITimeoutError:
            # Long documents may need more time
            print(f"Timeout — Kimi processing large context. Retrying with longer timeout...")
            continue

        except APIConnectionError:
            time.sleep(2 ** attempt)
            continue

        except APIError as e:
            if e.status_code and e.status_code == 413:
                raise RuntimeError(
                    "Content too large. Split your document into smaller chunks "
                    "or use Kimi's File API for large files."
                )
            if e.status_code and 400 <= e.status_code < 500:
                raise  # Client error — don't retry
            time.sleep(2 ** attempt)

    raise RuntimeError(f"Kimi API call failed after {max_retries} attempts")

Step 10: Real-World Document QA Pipeline

A complete pipeline for answering questions about long documents:

class KimiDocumentQA:
    """Production-grade document Q&A using Kimi's 256K context."""

    def __init__(self):
        self.client = OpenAI(
            api_key=os.environ.get("MOONSHOT_API_KEY"),
            base_url="https://api.moonshot.cn/v1",
        )

    def load_document(self, filepath: str) -> str:
        """Load and prepare a document for Kimi."""
        # For large PDFs/DOCXs, use the File API (Step 8)
        if filepath.endswith(('.pdf', '.docx', '.pptx', '.xlsx')):
            with open(filepath, "rb") as f:
                file_obj = self.client.files.create(
                    file=f,
                    purpose="file-extract",
                )
            # Wait for processing
            for _ in range(10):
                status = self.client.files.retrieve(file_obj.id)
                if status.status == "processed":
                    return f"[FILE_ID:{file_obj.id}]"
                time.sleep(1)
            raise TimeoutError("File processing timed out")

        # For text/markdown files, read directly
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()

    def ask(self, document: str, question: str) -> dict:
        """Ask a question about a document and get structured answer."""
        messages = [
            {
                "role": "system",
                "content": (
                    "You are a professional document analyst. "
                    "Always cite specific sections/paragraphs in your answers. "
                    "If uncertain, state your confidence level (High/Medium/Low)."
                ),
            },
            {
                "role": "user",
                "content": f"""<document>{document}</document>

Question: {question}

Answer format:
1. Direct answer (1-2 sentences)
2. Supporting evidence (quote from document)
3. Confidence level (High/Medium/Low)""",
            },
        ]

        response = call_kimi_safely(messages)
        return {
            "answer": response.choices[0].message.content,
            "tokens_used": response.usage.total_tokens,
            "cost_estimate": f"~¥{response.usage.total_tokens * 2 / 1_000_000:.4f}",
        }

# Usage
qa = KimiDocumentQA()
doc = qa.load_document("annual_report.txt")
result = qa.ask(doc, "What was the revenue growth rate in Q3?")
print(f"Answer: {result['answer']}")
print(f"Cost: {result['cost_estimate']}")

Step 11: Context Window Optimization

Kimi’s 256K context is powerful but expensive. Optimize for cost:

def estimate_tokens(text: str) -> int:
    """Rough token count (Chinese: ~1.5 chars/token, English: ~4 chars/token)"""
    chinese_chars = sum(1 for c in text if '一' <= c <= '鿿')
    other_chars = len(text) - chinese_chars
    return int(chinese_chars / 1.5 + other_chars / 4)

def optimize_context(document: str, max_tokens: int = 100_000) -> str:
    """Truncate document to fit within budget while preserving key content."""

    tokens = estimate_tokens(document)
    if tokens <= max_tokens:
        return document

    # Strategy: keep beginning (executive summary) + end (conclusions)
    # Middle sections are often detailed methodology that can be summarized
    ratio = max_tokens / tokens
    keep_chars = int(len(document) * ratio)

    # Keep first 40% (summary/intro) + last 20% (conclusions)
    head_chars = int(keep_chars * 0.67)
    tail_chars = keep_chars - head_chars

    return (
        document[:head_chars]
        + f"\n\n[... {tokens - max_tokens:,} tokens trimmed for budget ...]\n\n"
        + document[-tail_chars:]
    )

# Example: Process a 500K-token document on a 100K-token budget
long_doc = open("massive_report.txt").read()
optimized = optimize_context(long_doc, max_tokens=100_000)
print(f"Original: {estimate_tokens(long_doc):,} tokens → Optimized: {estimate_tokens(optimized):,} tokens")

Strategy	Cost Impact	Quality Impact
Remove boilerplate	-15% cost	None
Use File API for large docs	-30% cost	None
Truncate to relevant sections	-50% cost	Low (if done carefully)
Use kimi-k2-lite for simple queries	-75% cost	Medium (simpler tasks only)

Kimi vs Other Chinese AI Models

Model	Context	Best Use	Price
`kimi-k2.6`	256K	Coding, agents, long document analysis	Standard
`kimi-k2.5`	256K	General multimodal, vision recognition	Standard
`kimi-k2-thinking`	256K	Complex math reasoning	Slightly higher
`kimi-coding/k2p5`	—	Dedicated code generation (no rate limits)	Separate billing

⚠️ Important: Kimi's general API and Kimi Coding API are two independent services. API Keys and endpoints cannot be mixed!

FAQ

Q: What’s the difference between the China and international editions?

A: The API format is completely identical; only the server location differs. China users: api.moonshot.cn; international users: api.moonshot.ai. The international edition may require different payment methods.

Q: Can the full 256K context actually be used?

A: Yes. K2.6 maintains high-precision retrieval even at 256K context. However, we recommend structuring documents (using Markdown, XML tags) for best results.

Q: Should I choose Kimi or DeepSeek?

A: Choose Kimi for long document processing; choose DeepSeek for programming tasks. Most projects can combine both. See the China AI Models Ultimate Comparison for details.

Next Steps

📝 Tutorial Version Notes: Based on Kimi K2.6 API, tested and verified on June 19, 2026.

What This Tutorial Covers

Why Choose Kimi?

Step 1: Get an API Key

Step 2: Install the SDK

Step 3: Your First API Call

Step 4: Streaming Output

Step 5: Processing Ultra-Long Documents

Long Document Processing Best Practices

Step 6: Multimodal Vision Recognition

Step 7: Function Calling

Agent Cluster Mode (Advanced)

Step 8: File API — Upload and Process Documents

Supported File Formats

Step 9: Production Error Handling

Step 10: Real-World Document QA Pipeline

Step 11: Context Window Optimization

Kimi vs Other Chinese AI Models

FAQ

Q: What’s the difference between the China and international editions?

Q: Can the full 256K context actually be used?

Q: Should I choose Kimi or DeepSeek?

Next Steps

相关教程