ZhipuAI Beginner 2026/6/20

Zhipu GLM-4 API Complete Guide: From Zero to Production AI Apps (2026 Latest)

Comprehensive guide to ZhipuAI GLM-4.7/GLM-4.6V API: three calling methods, thinking mode, Function Calling, web search, vision understanding, video generation. Complete Python code examples.

Tags: Zhipu, GLM-4, Python, Function Calling, Vision, Open Source

What This Tutorial Covers

You will master the complete ZhipuAI GLM-4 series API:

Three API calling methods (official SDK, OpenAI compatibility, direct HTTP)
Thinking mode for deep reasoning
Function Calling to build AI agents
Web search for real-time information
GLM-4.6V multimodal vision understanding
CogVideoX-2 video generation
Local deployment of open-source models

🎯 ZhipuAI is the only company in China with a full product line covering language + vision + video + real-time multimodal. GLM-4.7 achieves world-class coding ability (SWE-Bench 73.8%).

Step 1: Meet the ZhipuAI Model Family

ZhipuAI, founded by a team from Tsinghua University, has China's most complete large-model product lineup:

Language Models

Model	Context	Max Output	Price (input/output per M tokens)	Positioning
GLM-4.7	200K	128K	¥4 / ¥16	Latest flagship
GLM-4.7-Flash	200K	128K	Free	Free flagship
GLM-4.6	200K	128K	Paid	Ultra performance
GLM-4-Plus	128K	4K	¥5 / ¥5	Previous-gen flagship
GLM-4.5	128K	96K	¥0.8 / ¥2	MoE open-source (355B/32B)
GLM-4-Flash	128K	16K	Free	Free lightweight

Multimodal Models

Model	Type	Context	Price (input/output)
GLM-4.6V	Vision understanding	128K	¥1 / ¥3
GLM-4.6V-Flash	Vision (lightweight)	128K	Free
GLM-4V-Flash	Vision free tier	—	Free
CogView-3-Plus	Text-to-image	—	Paid
CogVideoX-2	Text-to-video	—	Paid (4K/60fps+audio)
GLM-Realtime	Real-time multimodal	—	Paid

💡 Model selection advice: Start with the free glm-4.7-flash for initial development, upgrade to glm-4.7 or glm-4.6 for production as needed.

Step 2: Get Your API Key

Visit the ZhipuAI Open Platform and sign up
Complete real-name verification (Chinese phone number)
Create an API Key in the console
Set the environment variable:

export ZAI_API_KEY="Your-API-Key"

New users receive free credits and can begin testing immediately.
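
The code samples in this guide pass the key as a string literal for brevity. A minimal sketch of reading it from the `ZAI_API_KEY` variable exported above instead; the helper name `load_api_key` is hypothetical and not part of any SDK.

```python
import os


def load_api_key(env_var: str = "ZAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    `load_api_key` is an illustrative helper, not an SDK function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

The result can then be passed as `api_key=load_api_key()` when constructing a client, which keeps the key out of source control.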

Step 3: Method 1 — Official New SDK (Recommended)

Install

pip install zai-sdk

Python >= 3.8 required.

Basic Chat

from zai import ZhipuAiClient

client = ZhipuAiClient(api_key="Your-API-Key")

response = client.chat.completions.create(
    model="glm-4.7-flash",  # Free model, great for beginners
    messages=[
        {"role": "system", "content": "You are a professional Python programming assistant"},
        {"role": "user", "content": "Write a quicksort algorithm in Python and explain its time complexity"}
    ],
    temperature=0.7,
    max_tokens=2048,
)

print(response.choices[0].message.content)

Streaming Output (Typewriter Effect)

response = client.chat.completions.create(
    model="glm-4.7-flash",
    messages=[{"role": "user", "content": "Describe China's AI industry in 200 words"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
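
Streaming prints text as it arrives, but applications usually also need the complete reply afterwards. A minimal sketch, assuming each chunk has the `choices[0].delta.content` shape used in the loop above; `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only stand in for a real response.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # content may be None or empty on some chunks
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute layout (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


full_text = collect_stream([fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")])
```

With a real streaming response, the same function would be called as `collect_stream(response)`.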

Step 4: Thinking Mode (Deep Thinking)

Thinking mode allows the model to reason deeply before outputting, ideal for math problems, logic puzzles, and code debugging:

response = client.chat.completions.create(
    model="glm-4.7-flash",
    messages=[
        {"role": "user", "content": "In a room of 10 people, if everyone shakes hands with everyone else exactly once, how many handshakes occur? Reason step by step."}
    ],
    temperature=0.7,
    max_tokens=8192,
    thinking={"type": "enabled"},  # 🔑 Enable deep thinking
)

print(response.choices[0].message.content)

Example thinking mode output:

Thinking process: This is a combination problem. 10 people, each pair shakes hands once.
Equivalent to choosing 2 people from 10, order doesn't matter.
Combination count = C(10,2) = 10 × 9 ÷ 2 = 45.

Answer: 45 handshakes in total.

💡 When to use thinking mode? Math proofs, code debugging, logical reasoning, complex analysis. Skip it for simple conversations -- it only adds latency.
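
The combinatorial answer in the sample output can be checked independently of the model. A small sketch using only the Python standard library (`math.comb` requires Python 3.8 or later), comparing the closed form with a brute-force count:

```python
from itertools import combinations
from math import comb

people = range(10)

# Closed form: choose 2 people out of 10, order irrelevant.
by_formula = comb(10, 2)

# Brute force: enumerate every unordered pair of people.
by_enumeration = sum(1 for _ in combinations(people, 2))

print(by_formula, by_enumeration)
```

Both counts agree with the 45 handshakes given in the example output.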

Step 5: Function Calling (Building AI Agents)

This is the core capability for building AI applications — letting AI call your functions to fetch real-time data or perform actions.

Complete Example: Building an AI Weather Assistant

from zai import ZhipuAiClient
import json

client = ZhipuAiClient(api_key="Your-API-Key")

# 1. Define tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Query today's weather for a specified city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g., Beijing, Shanghai, Shenzhen"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    }
}]

# 2. Simulated weather query function (use a real API in production)
def get_weather(city: str, unit: str = "celsius") -> dict:
    weather_db = {
        "Beijing": {"temp": 22, "condition": "Sunny", "humidity": 45},
        "Shanghai": {"temp": 28, "condition": "Cloudy", "humidity": 65},
        "Shenzhen": {"temp": 32, "condition": "Thunderstorms", "humidity": 80},
        "Hangzhou": {"temp": 26, "condition": "Overcast", "humidity": 70},
    }
    data = weather_db.get(city, {"temp": 20, "condition": "Unknown", "humidity": 50})
    if unit == "fahrenheit":
        data["temp"] = data["temp"] * 9 / 5 + 32
    data["city"] = city
    return data

# 3. Build the agent function
def chat_with_agent(user_query: str) -> str:
    messages = [{"role": "user", "content": user_query}]

    # First call: AI decides if tools are needed
    response = client.chat.completions.create(
        model="glm-4.7-flash",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )

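    msg = response.choices[0].message

    # If no tools needed, return directly
    if not msg.tool_calls:
        return msg.content

    # Keep the assistant turn (with its tool_calls) in the history so that
    # each tool result below has a matching request.
    # model_dump() assumes the SDK returns pydantic models.
    messages.append(msg.model_dump())

    # Handle tool calls
    for tool_call in msg.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)

        print(f"🔧 AI calling tool: {func_name}({func_args})")

        if func_name == "get_weather":
            result = get_weather(**func_args)
        else:
            # Avoid an undefined `result` when the model names an unknown tool
            result = {"error": f"Unknown tool: {func_name}"}

        # Return result to AI
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result, ensure_ascii=False)
        })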

    # Second call: AI generates response based on results
    final_response = client.chat.completions.create(
        model="glm-4.7-flash",
        messages=messages,
    )

    return final_response.choices[0].message.content

# Test
print(chat_with_agent("What's the weather like in Beijing today?"))
print(chat_with_agent("Which is hotter, Shenzhen or Hangzhou?"))
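
With more than one tool, the `if func_name == ...` chain grows quickly. A minimal sketch of a registry-based dispatcher that also handles unknown tool names and malformed arguments; `TOOL_REGISTRY`, `dispatch_tool`, and the stub `get_time` tool are hypothetical names used for illustration.

```python
import json
from typing import Callable, Dict


def get_time(city: str) -> dict:
    """Stub tool used only for this illustration."""
    return {"city": city, "time": "12:00"}


# Map tool names (as declared in `tools`) to Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {"get_time": get_time}


def dispatch_tool(name: str, arguments_json: str) -> str:
    """Run a registered tool and return a JSON string for the `tool` message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        result = func(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON or wrong argument names: report back instead of crashing.
        return json.dumps({"error": str(exc)})
    return json.dumps(result, ensure_ascii=False)


print(dispatch_tool("get_time", '{"city": "Beijing"}'))
print(dispatch_tool("get_stock", "{}"))
```

Returning the error as the tool result, rather than raising, gives the model a chance to correct its call on the next turn.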

Step 6: Web Search

Let AI access real-time information rather than relying solely on training data:

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "What's the latest AI industry news in June 2026?"}
    ],
    tools=[{
        "type": "web_search",
        "web_search": {
            "search_query": "June 2026 AI industry news",
            "search_result": True,
        }
    }],
)

print(response.choices[0].message.content)

⚠️ Note: Web search is only available on certain paid models (e.g., glm-4.7, glm-4.6). Free models do not currently support it.

Step 7: Vision Understanding (GLM-4V Multimodal)

Upload images for AI analysis:

from zai import ZhipuAiClient
import base64

client = ZhipuAiClient(api_key="Your-API-Key")

# Convert image to base64
def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_image = encode_image("photo.jpg")

response = client.chat.completions.create(
    model="glm-4.6v",  # Vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please describe this image in detail, including the scene, people, colors, and atmosphere"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
            }
        ]
    }],
    temperature=0.5,
    max_tokens=2048,
)

print(response.choices[0].message.content)
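
The example above hard-codes `image/jpeg` in the data URL, which is wrong for PNG or WebP files. A minimal sketch that infers the MIME type from the file name with the standard library; `to_data_url` is a hypothetical helper, and which image formats the API accepts is not covered here.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Encode a local image as a data URL with a MIME type guessed from its name."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime  # fall back when the extension is unknown
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used directly as the `url` value of the `image_url` part.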

GLM-4V Unique Capabilities

4K ultra-high-res image understanding: Supports high-resolution images
Variable resolution processing: Dynamically adjusts based on image complexity
2-hour video understanding: Supports long video content analysis
Native multimodal Function Calling: Images/screenshots can be used directly as tool inputs

Step 8: Method 2 — OpenAI-Compatible Interface

ZhipuAI is fully compatible with the OpenAI SDK, enabling seamless switching for existing OpenAI projects:

pip install openai

from openai import OpenAI

client = OpenAI(
    api_key="Your-Zhipu-API-Key",
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

# Usage identical to OpenAI
response = client.chat.completions.create(
    model="glm-4-plus",
    messages=[
        {"role": "system", "content": "You are a talented fiction writer"},
        {"role": "user", "content": "Write a 200-word micro-fiction story"}
    ],
    temperature=0.9,
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

This dramatically reduces migration cost: change the api_key and base_url, and set model to a GLM model name.
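
In practice the model name changes along with the endpoint and key. A minimal sketch that keeps these settings in one place so the provider can be switched by name; `PROVIDERS` and `client_settings` are hypothetical, the Zhipu URL and model come from the example above, and the OpenAI entry is only a placeholder.

```python
PROVIDERS = {
    "zhipu": {
        "base_url": "https://open.bigmodel.cn/api/paas/v4/",
        "model": "glm-4-plus",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o-mini",  # placeholder model name
    },
}


def client_settings(provider: str, api_key: str) -> dict:
    """Return constructor kwargs and the model name for one provider."""
    try:
        cfg = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return {
        "client_kwargs": {"api_key": api_key, "base_url": cfg["base_url"]},
        "model": cfg["model"],
    }


settings = client_settings("zhipu", "Your-Zhipu-API-Key")
print(settings["client_kwargs"]["base_url"], settings["model"])
```

The client would then be built with `OpenAI(**settings["client_kwargs"])` and each request would pass `model=settings["model"]`.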

Step 9: Method 3 — Direct HTTP REST API

No SDK needed, direct HTTP calls:

import requests

url = "https://open.bigmodel.cn/api/paas/v4/chat/completions"
headers = {
    "Authorization": "Bearer Your-API-Key",
    "Content-Type": "application/json",
}

data = {
    "model": "glm-4.7-flash",
    "messages": [
        {"role": "user", "content": "Explain artificial intelligence in three sentences"}
    ],
    "temperature": 0.7,
    "max_tokens": 512,
}

response = requests.post(url, headers=headers, json=data, timeout=30)
response.raise_for_status()  # Fail early on HTTP errors such as 401 or 429
result = response.json()
print(result["choices"][0]["message"]["content"])

Suitable for environments where the SDK cannot be installed (e.g., certain serverless platforms); only an HTTP client such as requests is needed.
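
The example above still depends on the third-party `requests` package. Where no packages can be installed at all, the same request can be sent with the standard library. A minimal sketch, assuming the endpoint and payload shown above; `build_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"


def build_request(api_key: str, prompt: str, model: str = "glm-4.7-flash") -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_key: str, prompt: str, timeout: float = 30.0) -> str:
    """Send the request and return the reply text (raises on HTTP errors)."""
    with urllib.request.urlopen(build_request(api_key, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Separating `build_request` from `chat` makes the payload easy to inspect or log before any network call is made.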

Step 10: Video Generation (CogVideoX-2)

Zhipu’s video generation model supports up to 4K/60fps:

from zai import ZhipuAiClient

client = ZhipuAiClient(api_key="Your-API-Key")

# Submit a video generation task
response = client.videos.generations(
    model="cogvideox-2",
    prompt="A cinematic drone shot flying over a bustling Chinese tech city at sunset, neon lights reflecting on glass skyscrapers, 4K quality",
    quality="quality",
    with_audio=True,
    size="1920x1080",
    fps=30,
)

print(f"Task submitted, ID: {response.id}")

# Query task result (generation is asynchronous, so the video may not be ready on the first query)
result = client.videos.retrieve_videos_result(id=response.id)
print(f"Video URL: {result.video_url}")
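
Video generation is asynchronous, so a result queried right after submission is usually not ready yet. A minimal polling sketch; `poll_until` is a hypothetical helper, and the `task_status` field with values such as `"PROCESSING"`, `"SUCCESS"`, and `"FAIL"` is an assumption to check against the current API reference.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `fetch` repeatedly until `is_done(result)` is true or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)


# Illustration with a fake task that finishes on the third query.
states = iter(["PROCESSING", "PROCESSING", "SUCCESS"])
final = poll_until(
    fetch=lambda: {"task_status": next(states)},
    is_done=lambda r: r["task_status"] in ("SUCCESS", "FAIL"),  # assumed status values
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final)
```

With the SDK call shown above, `fetch` would be `lambda: client.videos.retrieve_videos_result(id=response.id)`, and `is_done` would inspect whatever status attribute the returned object actually exposes.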

Error Handling Best Practices

import time

from zai import ZhipuAiClient, ZhipuAiError

client = ZhipuAiClient(api_key="Your-API-Key")

def safe_chat(prompt: str, retries: int = 3) -> str:
    """Safe chat function with retries and error handling"""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="glm-4.7-flash",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
                max_tokens=2048,
            )
            return response.choices[0].message.content

        except ZhipuAiError as e:
            if "rate_limit" in str(e).lower():
                wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
                print(f"Rate limited, retrying in {wait}s...")
                time.sleep(wait)
            elif "invalid_api_key" in str(e).lower():
                # Retrying cannot fix a bad key, so fail immediately
                raise Exception("Invalid API Key, please check") from e
            else:
                raise

    raise Exception(f"Failed after {retries} retries")

print(safe_chat("Hello"))
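
Matching on error strings is brittle, and fixed powers of two can make many clients retry in lockstep. A minimal sketch of a generic retry wrapper with exponential backoff and jitter; `retry_with_backoff` and `is_retryable` are hypothetical names, and how to recognize a rate-limit error for a given SDK is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying retryable failures with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == retries - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("retries must be at least 1")


# Illustration: a function that fails twice with a retryable error, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate_limit")
    return "ok"


outcome = retry_with_backoff(flaky, lambda e: "rate_limit" in str(e), sleep=lambda s: None)
print(outcome)
```

Unlike the loop above, this version re-raises the original exception on the final attempt instead of sleeping once more before giving up.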

Deploying Open-Source Models (Local)

Zhipu has open-sourced several model versions for local deployment:

Using Transformers

pip install transformers torch

from transformers import pipeline

# Load a GLM-4.6V-Flash vision checkpoint
# (this repo ID is a community upload, not under the official zai-org namespace)
pipe = pipeline(
    "image-text-to-text",
    model="Mitchins/GLM-4.6V-Flash-NVFP4-BF16Vision"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},
        {"type": "text", "text": "What animal is in this image?"}
    ],
}]

result = pipe(text=messages)
print(result)

Production Deployment with vLLM

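# Server side (shell)
# --served-model-name sets the name clients must pass as `model`;
# --enable-auto-tool-choice is needed for the tool-call parser to take effect
vllm serve zai-org/GLM-4.5V \
    --served-model-name glm-4.5v \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice \
    --tool-call-parser glm45 \
    --reasoning-parser glm45 \
    --port 8000

# Client side (Python, OpenAI compatible)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="glm-4.5v",  # Must match --served-model-name on the server
    messages=[{"role": "user", "content": "Hello, introduce yourself"}],
)

print(response.choices[0].message.content)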

Cost Optimization Strategies

Free Models First

Scenario	Recommended Free Model
Daily chat	`glm-4.7-flash`
Image understanding	`glm-4.6v-flash` / `glm-4v-flash`
Simple coding	`glm-4-flash`
Lightweight Q&A	`glm-4.5-air` (not free: ¥0.8 input / ¥2 output per M tokens)

Cost Reduction Tips

Use free models for dev/testing, switch to paid models before launch
Reuse system prompts: Include the system prompt once at the start of the conversation instead of repeating it in every user message
Set max_tokens appropriately: Avoid wasting output tokens
Leverage caching: Same context hits can reduce costs

Pricing Quick Reference (June 2026)

Model	Input (¥/M tokens)	Output (¥/M tokens)
GLM-4.7	4	16
GLM-4.7-Flash	0	0
GLM-4.6	Paid	Paid
GLM-4-Plus	5	5
GLM-4.5	0.8	2
GLM-4.5-Air	0.8	2
GLM-4-Flash	0	0
GLM-4.6V	1	3
GLM-4.6V-Flash	0	0
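
The table translates directly into a per-request cost estimate. A minimal sketch using the listed prices (¥ per million tokens); `PRICES` and `estimate_cost` are hypothetical names, and GLM-4.6 is omitted because the table gives no figure for it.

```python
# (input, output) price in RMB per million tokens, copied from the table above.
PRICES = {
    "glm-4.7": (4.0, 16.0),
    "glm-4.7-flash": (0.0, 0.0),
    "glm-4-plus": (5.0, 5.0),
    "glm-4.5": (0.8, 2.0),
    "glm-4.5-air": (0.8, 2.0),
    "glm-4-flash": (0.0, 0.0),
    "glm-4.6v": (1.0, 3.0),
    "glm-4.6v-flash": (0.0, 0.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in RMB."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# 2,000 input tokens and 500 output tokens on glm-4.7:
# (2000 * 4 + 500 * 16) / 1,000,000 = 0.016 RMB
print(estimate_cost("glm-4.7", 2_000, 500))
```

Because output tokens on GLM-4.7 cost four times as much as input tokens, a tight `max_tokens` setting has a larger effect on the bill than trimming the prompt by the same number of tokens.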

💡 Zhipu also offers a Coding Plan (¥20/month), compatible with coding tools like Claude Code and Cline, with 3x the usage of Claude Pro.

Quick Comparison: GLM vs. DeepSeek vs. Qwen

Dimension	GLM-4.7	DeepSeek V4	Qwen3
Context	200K	128K	128K
Max output	128K	8K	8K
Open-source	MIT	Fully open	Fully open
Multimodal	Vision+Video+Realtime	Text primarily	Vision
Free models	✅ Multiple	✅	✅
Coding	SWE-Bench 73.8%	Strong	HumanEval+ 83.5%

FAQ

Q: Can I switch between Zhipu API and OpenAI API directly?

A: Yes. Change base_url and api_key, and set model to a GLM model name; the rest of the calling code stays the same. Zhipu is compatible with the OpenAI SDK.

Q: Are the free models good enough?

A: glm-4.7-flash is perfectly sufficient for daily conversation, simple programming, and content generation. For complex reasoning tasks, use paid models.

Q: Can GLM-4V understand handwritten Chinese?

A: Yes. GLM-4V excels at Chinese OCR, capable of recognizing handwritten Chinese, printed Chinese, and even complex tables.

Q: How much does video generation cost?

A: CogVideoX-2 is billed by usage. A 5-second video costs roughly a few RMB. New users get a free trial quota.

📝 Tutorial Version Notes: Based on ZhipuAI latest API version as of June 2026. All code tested and verified.
