GLM-5: When Large Models Learn to 'Write Code' - The Leap from Vibe Coding to Agentic Engineering
🎯 Summary in One Sentence: Zhizhu AI, in collaboration with Tsinghua University, released the 744B-parameter GLM-5 model, which cuts long-context attention cost with DeepSeek Sparse Attention (DSA), raises long-task training efficiency with a fully asynchronous reinforcement learning (Async RL) framework, and applies a multi-stage post-training process, letting large models evolve from 'Vibe Coding' to 'Agentic Engineering' and complete real engineering projects on their own.
Why is this Paper Needed?
Andrej Karpathy proposed an interesting concept in early 2025—Vibe Coding, meaning you just need to describe your requirements in natural language and let AI write code 'by feel'. This is indeed the mainstream experience of AI programming today: you say a sentence, and the model generates a piece of code, with the effectiveness largely depending on luck.
But here's the problem: real software engineering is far more complex than just 'writing code'. A true engineer needs to understand project architecture, debug errors, manage dependencies, and handle cross-module collaboration—none of which can be solved by 'one prompt, one piece of code'. The goal of the GLM-5 paper is to transform the model from a 'code-writing assistant' into an 'engineer capable of independently managing an entire project'.
This is no small goal. To achieve it, the Zhizhu team made significant innovations in model architecture, training processes, and reinforcement learning algorithms. This interpretation will break down these technical details for you.
Core Contributions: Three Key Innovations
Before diving into details, let's clarify the three core contributions of GLM-5:
| Contribution | Problem Solved | Core Idea |
|---|---|---|
| DSA Sparse Attention | Computation cost explosion at 128K context | Dynamically selects important tokens and skips irrelevant ones, saving 1.5-2x compute |
| Asynchronous Reinforcement Learning Framework | GPU idling during long-task RL training | Fully decouples generation from training, enabling pipeline parallelism |
| Multi-Stage Post-Training Process | Balancing reasoning, coding, agent, and other capabilities | SFT → reasoning RL → agent RL → general RL, adding capabilities gradually |
Model Architecture: Subtracting on the MoE Framework
Basic Configuration
GLM-5 adopts a Mixture-of-Experts (MoE) architecture, with a total of 744B parameters, but only about 40B parameters are activated during each inference. This 'large and sparse' design has become an industry consensus—DeepSeek-V3/R1 and Qwen3 have followed a similar route.
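The 'large and sparse' idea can be sketched in a few lines: a gating network scores all experts, but only the top-k actually run, so compute scales with k rather than with the total expert count. This is a minimal illustrative sketch, not GLM-5's actual routing; all names, dimensions, and the top-k value here are assumptions.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Route a token to its top-k experts and mix their outputs by the
    softmax-normalized gate scores (illustrative MoE sketch)."""
    logits = x @ gate_w                        # (d,) @ (d, n_experts) -> (n_experts,)
    top = np.argsort(logits)[-top_k:]          # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # normalize gate weights over the selected experts
    # Only the selected experts execute, so cost scales with top_k, not n_experts.
    return sum(wi * experts[i](x) for i, wi in zip(top, w))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
# Toy experts: each is just a linear map here.
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d))) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
y = moe_forward(rng.normal(size=d), experts, gate_w)
print(y.shape)  # (8,)
```

With 16 toy experts and top_k=2, only 1/8 of the expert parameters touch each token, which is the same principle behind activating ~40B of 744B parameters.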
How Does DSA Work?
The core idea of DSA can be understood with a metaphor: imagine you are looking for information in a library. Standard attention is like flipping through every book in the entire library and then deciding which ones are useful. In contrast, DSA is more like an experienced librarian—it first uses a Lightning Index to quickly scan the titles on the shelves, locking onto a few potentially relevant areas, and then only reads specific paragraphs in those areas in detail.
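The librarian metaphor maps onto a two-stage computation: a cheap index pass scores every key, then full softmax attention runs only over the top-k survivors. The sketch below is illustrative; the index formulation, dimensions, and top-k value are assumptions, not the paper's exact design.

```python
import numpy as np

def sparse_attention(q, K, V, idx_q, idx_K, top_k=4):
    """Two-stage attention sketch: a low-dimensional 'lightning index'
    scans all keys cheaply, then standard attention reads only the
    selected few in detail."""
    # Stage 1: cheap index scores for every key (the 'title scan').
    index_scores = idx_K @ idx_q                   # (n_keys,)
    keep = np.argsort(index_scores)[-top_k:]       # keys worth reading closely
    # Stage 2: full softmax attention, restricted to the selected keys.
    scores = K[keep] @ q / np.sqrt(q.size)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[keep]

rng = np.random.default_rng(1)
n, d, d_idx = 32, 16, 4                            # 32 keys; index vectors are 4-dim, not 16
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
idx_q, idx_K = rng.normal(size=d_idx), rng.normal(size=(n, d_idx))
out = sparse_attention(q, K, V, idx_q, idx_K)
print(out.shape)  # (16,)
```

The savings come from the asymmetry: the index pass is low-dimensional and runs over all keys, while the expensive full-dimension attention touches only top_k of them.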
Training Process: Four-Stage 'Leveling Up'
The training process of GLM-5 is the highlight of this paper, divided into pre-training and post-training stages.
Pre-Training Stage
- Data Scale: 27T tokens, data mix includes web pages, code, academic papers, books, etc.
- Context Expansion: Gradually expands context from 4K to 200K through mid-term training, using RoPE frequency adjustments.
- Annealing Phase: Uses higher quality data for 'fine-tuning' at the end of pre-training.
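The 'RoPE frequency adjustment' mentioned above typically means scaling the rotary base so position encodings rotate more slowly and remain distinguishable far beyond the original window. The sketch below shows that effect under an assumed base scaling; the paper's exact schedule is not given here.

```python
import numpy as np

def rope_freqs(d, base=10000.0):
    """Per-dimension-pair rotation frequencies for RoPE.
    Raising `base` lowers the frequencies, i.e. lengthens the wavelengths,
    so distant positions stay distinguishable."""
    return base ** (-np.arange(0, d, 2) / d)

# Illustrative only: base values are assumptions, not GLM-5's actual settings.
short_ctx = rope_freqs(64, base=10000.0)     # original short window (e.g. 4K)
long_ctx = rope_freqs(64, base=1000000.0)    # extended window during mid-training
print(long_ctx[-1] < short_ctx[-1])          # slowest rotation gets even slower
```

Mid-training on progressively longer sequences then lets the model actually learn to use the stretched position range, rather than relying on the frequency change alone.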
Post-Training Quartet
This is the most distinctive part of the pipeline. GLM-5 underwent four rounds of post-training:
- Supervised Fine-Tuning (SFT) fine-tunes with high-quality instruction data.
- Reasoning Reinforcement Learning (Reasoning RL) conducts RL training on mathematical and code reasoning tasks.
- Agentic Reinforcement Learning (Agentic RL), which is a key innovation.
- General Reinforcement Learning (General RL) conducts RL on a broader range of general tasks.
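The four stages above can be sketched as a simple sequential pipeline: each stage starts from the previous stage's checkpoint, so capabilities are layered on incrementally. The stage names come from the paper; the driver function and data descriptions are illustrative assumptions.

```python
# Stage names from the paper; the training function itself is hypothetical.
STAGES = [
    ("sft",          "high-quality instruction data"),
    ("reasoning_rl", "math and code reasoning tasks"),
    ("agentic_rl",   "multi-step agent tasks"),
    ("general_rl",   "broad general tasks"),
]

def run_post_training(model, train_stage):
    """Run each post-training stage in order; every stage resumes from
    the checkpoint produced by the stage before it."""
    for name, data in STAGES:
        model = train_stage(model, name, data)
    return model

# Toy usage: the 'training' step just records the lineage of checkpoints.
final = run_post_training("base", lambda m, name, data: f"{m}+{name}")
print(final)  # base+sft+reasoning_rl+agentic_rl+general_rl
```

The ordering matters: reasoning RL builds on SFT's instruction following, and agentic RL builds on the reasoning skills, before general RL broadens coverage.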
Asynchronous Reinforcement Learning: Preventing GPU Idling
Traditional RL training is synchronous: collect a batch of data → compute rewards → update the model → collect again. This works fine when each task is short, but agent tasks often require dozens of interaction steps, so the training GPUs sit idle waiting for long rollouts to finish. GLM-5's framework fully decouples generation from training, letting the two proceed as a pipeline in parallel.
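The decoupling can be pictured as a producer-consumer pipeline: rollout workers stream finished trajectories into a queue, and the trainer consumes them concurrently instead of blocking on each batch. This is a minimal single-process sketch; the queue size, worker count, trajectory format, and sentinel protocol are all illustrative assumptions.

```python
import queue
import threading
import time

traj_queue = queue.Queue(maxsize=8)  # buffer between generation and training

def rollout_worker(n_episodes):
    """Producer: runs agent episodes and streams finished trajectories."""
    for i in range(n_episodes):
        time.sleep(0.001)              # stands in for a long multi-step episode
        traj_queue.put(f"traj-{i}")    # trajectory ready for the trainer
    traj_queue.put(None)               # sentinel: no more data

def trainer():
    """Consumer: performs an update whenever a trajectory is available,
    never waiting for a whole synchronous batch to complete."""
    updates = 0
    while (item := traj_queue.get()) is not None:
        updates += 1                   # stands in for a gradient step
    return updates

t = threading.Thread(target=rollout_worker, args=(16,))
t.start()
n_updates = trainer()
t.join()
print(n_updates)  # 16
```

In the real framework both sides run on separate GPU pools, so generation throughput and training throughput are limited by their own hardware rather than by each other.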
In-Depth Interpretation of Experimental Results
Main Benchmark Comparisons
| Benchmark | GLM-5 | DeepSeek-V3.2 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 |
|---|---|---|---|---|---|
| MMLU-Pro | 78.0 | 75.9 | 78.0 | 74.3 | 76.1 |
| GPQA-Diamond | 71.7 | 68.4 | 67.1 | 63.6 | 70.5 |
| BrowseComp | 57.1 | 32.0 | 26.3 | 25.1 | 46.9 |
Conclusion
The GLM-5 paper is rich in information. Setting aside specific numbers, the core message it conveys is: the next battlefield for large models is in 'doing work' rather than just 'answering questions'.
In terms of competition, GLM-5 demonstrates the competitiveness of Chinese AI teams in cutting-edge research on large models.
Paper Information
- Title: GLM-5: from Vibe Coding to Agentic Engineering
- Institution: Zhizhu AI & Tsinghua University
- Link: https://arxiv.org/abs/2602.15763

