Major Innovation in Agent Skills! Anthropic Upgrades Skill Factory with Nuclear-Level Evals System, Developers: Old Skills Revived
Intelligent Gorilla AI Organization | Editor: Xixi
In the field of AI agents, anyone who has used Agent Skills will be familiar with skill-creator, the no-code skill-building tool Anthropic released in 2025.
However, once a skill was built, there was no way to know whether it was actually useful, whether it still worked with newer models, whether it ran accurately, or how effective it was...
On March 3, Anthropic's official blog quietly published a significant update titled "Improving skill-creator: Test, measure, and refine Agent Skills." This upgrade finally matures Claude's "skill factory."
From "seemingly usable" to "testable, measurable, and iterable," it tackles the biggest pain point for skill authors: "how good is the skill I created?"
01 - Review of Agent Skills: A Key Step from General Assistant to Professional Agent
In October 2025, Anthropic officially launched Agent Skills, a modular, reusable "skill package" system: a folder containing SKILL.md instructions, scripts, and resources that Claude loads automatically when needed, significantly improving performance in document generation, data analysis, brand compliance, and other scenarios.
Skills are now available across Claude.ai, Claude Code, and the API, and the open GitHub repository has accumulated over 80,000 stars. The biggest limitation of the early version, however, was that non-technical users could only iterate on intuition, with no way to quantify or verify a skill's effectiveness.
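The skill-package layout described above centers on SKILL.md, whose YAML frontmatter (a `name` and a `description`) tells Claude when to load the skill. A minimal sketch of such a file — the skill name, steps, and referenced `assets/` and `scripts/` paths here are purely illustrative, not from Anthropic's examples:

```markdown
---
name: quarterly-report
description: Generates quarterly financial reports in the company template. Use when the user asks for a quarterly report or revenue summary.
---

# Quarterly Report Skill

1. Read the raw figures from the attached spreadsheet.
2. Fill in the report template at `assets/template.docx`.
3. Run `scripts/validate.py` to check the totals before returning the document.
```

The `description` matters most in practice: it is what Claude matches against the user's request to decide whether to load the skill — which is exactly what the new Description Tuning feature optimizes.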
There are two types of Skills:
1. Capability Enhancement
Skills inject specific techniques and patterns to make tasks the model originally could not do, or could not do reliably, work consistently.
2. Preference Encoding
The model can already perform every step, but must be made to follow the team's specific process exactly.
Five Highlights of This Upgrade:
- Evals (Automated Evaluation): Users describe a test prompt and the expected output, and skill-creator runs the verification automatically.
- Benchmark Mode: Batch run standardized tests, output pass rates, time taken, token consumption, and other hard metrics.
- Multi-Agent Parallel Execution: Independent clean contexts to avoid contamination, significantly increasing testing speed.
- Comparator (Blind Comparison): A/B testing of two skill versions.
- Description Tuning (Trigger Description Optimization): Automatically analyzes sample prompts and suggests modifications to descriptions.
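The "test prompt + expected output" idea behind Evals and Benchmark Mode can be sketched as a plain Python loop. This is a hypothetical illustration, not skill-creator's actual interface: `run_skill` is a stub standing in for "Claude with the skill loaded," and the case list and report fields are made up for the example.

```python
import time

def run_skill(prompt: str) -> str:
    # Stub standing in for a real model call with the skill loaded.
    canned = {
        "Summarize Q3 revenue": "Q3 revenue grew 12% year over year.",
        "Format this as a brand-compliant memo": "MEMO: ...",
    }
    return canned.get(prompt, "")

def run_evals(cases: list[tuple[str, str]]) -> dict:
    # Run every (prompt, expected) pair and report the hard metrics
    # Benchmark Mode surfaces: pass count, pass rate, wall-clock time.
    start = time.perf_counter()
    passed = sum(1 for prompt, expected in cases
                 if expected in run_skill(prompt))
    return {
        "total": len(cases),
        "passed": passed,
        "pass_rate": passed / len(cases),
        "seconds": time.perf_counter() - start,
    }

cases = [
    ("Summarize Q3 revenue", "12%"),
    ("Format this as a brand-compliant memo", "MEMO"),
    ("Translate to French", "Bonjour"),  # deliberately failing case
]
report = run_evals(cases)
print(f"pass rate: {report['passed']}/{report['total']}")  # → pass rate: 2/3
```

A real harness would call the model in a fresh context per case — the point of the Multi-Agent Parallel Execution feature — and a Comparator-style A/B run is just this same loop executed once per skill version, comparing the two reports.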
02 - No Reason Not to Install! This Update Revives Old Skills
Anthropic's update to skill-creator has quickly sparked discussions among AI agent practitioners and developers.
03 - The CI/CD Moment for AI Agents: From Artwork to Engineering Product
Anthropic's upgrade to skill-creator essentially brings software engineering's most mature "test-benchmark-iterate" closed loop to ordinary users and enterprise teams at a low barrier to entry. Agent Skills are no longer write-once, throw-away prompt projects, but "living assets" that can be continuously maintained, kept compatible across model versions, and optimized with data.
In the short term, the biggest beneficiaries are developers and enterprise users who have accumulated a large number of custom skills in Claude Code / Cowork.
From a broader perspective, this update further solidifies Anthropic's "toolchain moat" in the Agent ecosystem.