Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now
Another week in the summer of 2025 has begun, and in a continuation of the trend from last week, with it arrives more powerful Chinese open source AI models.
Little-known (at least to us here in the West) Chinese startup Z.ai has introduced two new open source LLMs — GLM-4.5 and GLM-4.5-Air — casting them as go-to solutions for AI reasoning, agentic behavior, and coding.
And according to Z.ai’s blog post, the models perform near the top of the pack of other proprietary LLM leaders in the U.S.
For example, the flagship GLM-4.5 matches or outperforms leading proprietary models like Claude 4 Sonnet, Claude 4 Opus, and Gemini 2.5 Pro on evaluations such as BrowseComp, AIME24, and SWE-bench Verified, while ranking third overall across a dozen competitive tests.
The AI Impact Series Returns to San Francisco – August 5
The next phase of AI is here – are you ready? Join leaders from Block, GSK, and SAP for an exclusive look at how autonomous agents are reshaping enterprise workflows – from real-time decision-making to end-to-end automation.
Secure your spot now – space is limited: https://bit.ly/3GuuPLF
Its lighter-weight sibling, GLM-4.5-Air, also performs within the top six, offering strong results relative to its smaller scale.
Both models feature dual operation modes: a thinking mode for complex reasoning and tool use, and a non-thinking mode for instant response scenarios. They can automatically generate complete PowerPoint presentations from a single title or prompt, making them useful for meeting preparation, education, and internal reporting.
They further offer creative writing, emotionally aware copywriting, and script generation to create branded content for social media and the web. Moreover, z.ai says they support virtual character development and turn-based dialogue systems for customer support, roleplaying, fan engagement, or digital persona storytelling.
While both models support reasoning, coding, and agentic capabilities, GLM-4.5-Air is designed for teams seeking a lighter-weight, more cost-efficient alternative with faster inference and lower resource requirements.
Z.ai also lists several specialized models in the GLM-4.5 family on its API, including GLM-4.5-X and GLM-4.5-AirX for ultra-fast inference, and GLM-4.5-Flash, a free variant optimized for coding and reasoning tasks.
They’re available now to use directly on Z.ai and through the Z.ai application programming interface (API) for developers to connect to third-party apps, and their code is available on HuggingFace and ModelScope. The company also provides multiple integration routes, including support for inference via vLLM and SGLang.
Licensing and API pricing
GLM-4.5 and GLM-4.5-Air are released under the Apache 2.0 license, a permissive and commercially friendly open-source license.
This allows developers and organizations to freely use, modify, self-host, fine-tune, and redistribute the models for both research and commercial purposes.
For those who don’t want to download the model code or weights and self-host or deploy on their own, z.ai’s cloud-based API offers the model for the following prices.
- GLM-4.5:
- $0.60 / $2.20 per 1 million input/output tokens
- GLM-4.5-Air:
- $0.20 / $1.10 per 1M input/output tokens
A CNBC article on the models reported that z.ai would charge only $0.11 / $0.28 per million input/output tokens, which is also supported by a Chinese graphic the company posted on its API documentation for the “Air model.”
However, this appears to be the case only for inputting up to 32,000 tokens and outputting 200 tokens at a single time. (Recall tokens are the numerical designations the LLM uses to represent different semantic concepts and word components, the LLM’s native language, with each token translating to a word or portion of a word).
In fact, the Chinese graphic reveals far more detailed pricing for both models per batches of tokens inputted/outputted. I’ve tried to translate it below:

Another note: since z.ai is based in China, those in the West who are focused on data sovereignty will want to due diligence through internal policies to pursue using the API, as it may be subject to Chinese content restrictions.
Competitive performance on third-party benchmarks, approaching that of leading closed/proprietary LLMs

GLM-4.5 ranks third across 12 industry benchmarks measuring agentic, reasoning, and coding performance—trailing only OpenAI’s GPT-4 and xAI’s Grok 4. GLM-4.5-Air, its more compact sibling, lands in sixth position.
In agentic evaluations, GLM-4.5 matches Claude 4 Sonnet in performance and exceeds Claude 4 Opus in web-based tasks. It achieves a 26.4% accuracy on the BrowseComp benchmark, compared to Claude 4 Opus’s 18.8%. In the reasoning category, it scores competitively on tasks such as MATH 500 (98.2%), AIME24 (91.0%), and GPQA (79.1%).
For coding, GLM-4.5 posts a 64.2% success rate on SWE-bench Verified and 37.5% on Terminal-Bench. In pairwise comparisons, it outperforms Qwen3-Coder with an 80.8% win rate and beats Kimi K2 in 53.9% of tasks. Its agentic coding ability is enhanced by integration with tools like Claude Code, Roo Code, and CodeGeex.
The model also leads in tool-calling reliability, with a success rate of 90.6%, edging out Claude 4 Sonnet and the new-ish Kimi K2.
Part of the wave of open source Chinese LLMs
The release of GLM-4.5 arrives amid a surge of competitive open-source model launches in China, most notably from Alibaba’s Qwen Team.
In the span of a single week, Qwen released four new open-source LLMs, including the reasoning-focused Qwen3-235B-A22B-Thinking-2507, which now tops or matches leading models such as OpenAI’s o4-mini and Google’s Gemini 2.5 Pro on reasoning benchmarks like AIME25, LiveCodeBench, and GPQA.
This week, Alibaba continued the trend with the release of Wan 2.2, a powerful new open source video model.
Alibaba’s new models are, like z.ai, licensed under Apache 2.0, allowing commercial usage, self-hosting, and integration into proprietary systems.
The broad availability and permissive licensing of Alibaba’s offerings and Chinese startup Moonshot before it with its Kimi K2 model reflects an ongoing strategic effort by Chinese AI companies to position open-source infrastructure as a viable alternative to closed U.S.-based models.
It also places pressure on the U.S.-based model provider efforts to compete in open source. Meta has been on a hiring spree after its Llama 4 model family debuted earlier this year to a mixed response from the AI community, including a hefty dose of criticism for what some AI power users saw as benchmark gaming and inconsistent performance.
Meanwhile, OpenAI co-founder and CEO Sam Altman recently announced that OpenAI’s long-awaited and much-hyped frontier open source LLM — its first since before ChatGPT launched in late 2022 — would be delayed from its originally planned July release to an as-yet unspecified later date.
Architecture and training lessons revealed
GLM-4.5 is built with 355 billion total and 32 billion active parameters. Its counterpart, GLM-4.5-Air, offers a lighter-weight design at 106 billion total and 12 billion active parameters.
Both use a Mixture-of-Experts (MoE) architecture, optimized with loss-free balance routing, sigmoid gating, and increased depth for enhanced reasoning.
The self-attention block includes Grouped-Query Attention and a higher number of attention heads. A Multi-Token Prediction (MTP) layer enables speculative decoding during inference.
Pre-training spans 22 trillion tokens split between general-purpose and code/reasoning corpora. Mid-training adds 1.1 trillion tokens from repo-level code data, synthetic reasoning inputs, and long-context/agentic sources.
Z.ai’s post-training process for GLM-4.5 relied upon a reinforcement learning phase powered by its in-house RL infrastructure, slime, which separates data generation and model training processes to optimize throughput on agentic tasks.
Among the techniques they used were mixed-precision rollouts and adaptive curriculum learning.
The former help the model train faster and more efficiently by using lower-precision math when generating data, without sacrificing much accuracy.
Meanwhile, adaptive curriculum learning means the model starts with easier tasks and gradually moves to harder ones, helping it learn more complex tasks gradually over time.
GLM-4.5’s architecture prioritizes computational efficiency. According to CNBC, Z.ai CEO Zhang Peng stated that the model runs on just eight Nvidia H20 GPUs — custom silicon designed for the Chinese market to comply with U.S. export controls. That’s roughly half the hardware requirement of DeepSeek’s comparable models.
Interactive demos
Z.ai highlights full-stack development, slide creation, and interactive artifact generation as demonstration areas on its blog post.
Examples include a Flappy Bird clone, Pokémon Pokédex web app, and slide decks built from structured documents or web queries.

Users can interact with these features on the Z.ai chat platform or through API integration.
Company background and market position
Z.ai was founded in 2019 under the name Zhipu, and has since grown into one of China’s most prominent AI startups, according to CNBC.
The company has raised over $1.5 billion from investors including Alibaba, Tencent, Qiming Venture Partners, and municipal funds from Hangzhou and Chengdu, with additional backing from Aramco-linked Prosperity7 Ventures.
Its GLM-4.5 launch coincides with the World Artificial Intelligence Conference in Shanghai, where multiple Chinese firms showcased advancements. Z.ai was also named in a June OpenAI report highlighting Chinese progress in AI, and has since been added to a U.S. entity list limiting business with American firms.
What it means for enterprise technical decision-makers
For senior AI engineers, data engineers, and AI orchestration leads tasked with building, deploying, or scaling language models in production, the GLM-4.5 family’s release under the Apache 2.0 license presents a meaningful shift in options.
The model offers performance that rivals top proprietary systems across reasoning, coding, and agentic benchmarks — yet comes with full weight access, commercial usage rights, and flexible deployment paths, including cloud, private, or on-prem environments.
For those managing LLM lifecycles — whether leading model fine-tuning, orchestrating multi-stage pipelines, or integrating models with internal tools — GLM-4.5 and GLM-4.5-Air reduce barriers to testing and scaling.
The models support standard OpenAI-style interfaces and tool-calling formats, making it easier to evaluate in sandboxed environments or drop into existing agent frameworks.
GLM-4.5 also supports streaming output, context caching, and structured JSON responses, enabling smoother integration with enterprise systems and real-time interfaces. For teams building autonomous tools, its deep thinking mode provides more precise control over multi-step reasoning behavior.
For teams under budget constraints or those seeking to avoid vendor lock-in, the pricing structure undercuts major proprietary alternatives like DeepSeek and Kimi K2. This matters for organizations where usage volume, long-context tasks, or data sensitivity make open deployment a strategic necessity.
For professionals in AI infrastructure and orchestration, such as those implementing CI/CD pipelines, monitoring models in production, or managing GPU clusters, GLM-4.5’s support for vLLM, SGLang, and mixed-precision inference aligns with current best practices in efficient, scalable model serving. Combined with open-source RL infrastructure (slime) and a modular training stack, the model’s design offers flexibility for tuning or extending in domain-specific environments.
In short, GLM-4.5’s launch gives enterprise teams a viable, high-performing foundation model they can control, adapt, and scale, without being tied to proprietary APIs or pricing structures. It’s a compelling option for teams balancing innovation, performance, and operational constraints.
Source link