OpenClaw + Claude Code Powerful Tutorial: Build a Complete Development Team by Yourself!

2/26/2026
14 min read


Today, I am sharing a striking real-world case. (Tutorial attached at the end.)

An independent developer built an AI Agent system using OpenClaw + Codex/CC. What results did he achieve?

AI Agent System Effect

A peak of 94 commits in one day, 7 PRs completed in 30 minutes, and on that same day he also held 3 client meetings without ever opening an editor.

This really happened in January 2026. The author has made the entire system architecture, workflow, and code configuration public. After reading it, I felt this approach is worth learning, so I organized it into this article to share with you.

If you are also using Codex or Claude Code, or are interested in OpenClaw, this article will provide you with a lot of inspiration.

One Person, 94 Commits in One Day

First, let’s look at some data to feel the power of this system:

  • A peak of 94 commits in a single day (averaging 50 per day)
  • 7 PRs completed in 30 minutes
  • Idea-to-launch fast enough to "deliver customer requirements the same day"

The author is using this system to build a real B2B SaaS product. Combined with founder-led direct sales, most feature requests can be completed the same day. How fast is that? When a customer asks for something, they see the result that same day, which converts directly into paying users.

What about the cost? $190 per month (Claude $100 + Codex $90), beginners can start for just $20.

You might ask: Did he just pile up a bunch of AI tools and generate garbage code like crazy?

No. The author's Git history looks like "just hired a development team," but in reality, it’s just him. The key change is: he shifted from "managing Claude Code" to "managing an AI butler, which then manages a group of Claude Codes."

  • Before January: Directly writing code with Codex or Claude Code
  • After January: Using OpenClaw as the orchestration layer to schedule Codex/Claude Code/Gemini

The effect of this transformation is that the system can automatically complete almost all small- to medium-complexity tasks without human intervention.

Why Is Using Codex and Claude Code Alone Not Enough?

At this point, you might think: Codex and Claude Code are already powerful, why add an orchestration layer?

The author’s answer is straightforward: Codex and Claude Code know almost nothing about your business. They only see code and do not understand the complete business picture.

There is a fundamental limitation here: the context window is fixed, so you can only prioritize one kind of context.

You must choose what to fill it with:

  • Fill it with code → No space for business context
  • Fill it with customer history → No space for the codebase

So when using Codex or Claude Code alone, you will encounter these problems:

  • It does not know which customer this feature is for
  • It does not know why the last similar request failed
  • It does not know your product positioning and design principles
  • It can only work based on the current code and your prompt

OpenClaw changes this equation.

It acts as the orchestration layer, sitting between you and all AI tools. Its roles are:

  • Holding all business context (customer data, meeting notes, historical decisions, success/failure cases)
  • Translating business context into precise prompts to feed to specific agents
  • Allowing these agents to focus on what they do best: writing code

To draw an analogy:

  • Codex/Claude Code = Professional chefs, only focused on cooking
  • OpenClaw = Head chef, knows customer preferences, ingredient inventory, menu positioning, and gives precise instructions to each chef

This is why a dual-layer system is needed: it wins through specialization of context, not by switching to a stronger model.

Specific Architecture of the Dual-Layer System: Orchestration Layer + Execution Layer

Let’s take a look at the specific architecture of this system.

Two-Tier System Architecture

Two tiers, each with its own responsibilities:

OpenClaw Architecture Diagram

What can OpenClaw (Orchestration Layer) do?

  • Read all meeting notes from Obsidian (automatically synchronized)
  • Access production database (read-only access) to obtain customer configurations
  • Have administrator API permissions to top up customer credits and lift blocks directly
  • Select appropriate agents based on task type
  • Monitor the progress of all agents, analyze reasons for failures, and adjust prompts for retries
  • Notify the author via Telegram upon completion

What can Agent (Execution Layer) do?

  • Read and write to the codebase
  • Run tests and builds
  • Submit code and create PRs
  • Respond to feedback from code reviews

Key point: the execution-layer Agents never access the production database or see sensitive customer information. They receive only the "minimum context necessary to complete this task."

Security Boundary

This design is clever: the security boundary is clear while ensuring efficiency.
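
The boundary above can be pictured as a tiny context filter: the orchestrator assembles a task brief from whitelisted fields and deliberately drops anything sensitive. A minimal sketch, assuming hypothetical names (`build_task_brief` and its arguments are mine, not the author's):

```shell
#!/usr/bin/env bash
# Sketch of the "minimum necessary context" rule: the orchestrator builds
# a task brief from whitelisted fields only, so sensitive customer data
# it holds never reaches the execution-layer agent.
# (All names here are illustrative, not from the author's system.)

build_task_brief() {
  local task_desc="$1" type_hints="$2" customer_email="$3"
  # The sensitive third argument is accepted but deliberately not forwarded.
  printf 'TASK: %s\nTYPES: %s\n' "$task_desc" "$type_hints"
}

# Prints the brief without the email address
build_task_brief "Add template save/edit" "src/types/template.ts" "ceo@bigcorp.com"
```

The same shape works for any orchestration layer: the filter is the trust boundary, and the agent only ever sees what passes through it.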

Complete Workflow: 8 Steps from Customer Requirement to PR Merge

Now we enter the core part. Using a real case from the author last week, let’s walk through the complete process.

Background: A corporate client called, expressing a desire to reuse their configured settings and share them within the team.

Step 1: Customer Requirement → OpenClaw Understands and Breaks It Down

After the call, the author discussed this requirement with Zoe (his OpenClaw).

The magic here: zero explanation cost. Since all meeting notes are automatically synchronized to Obsidian, Zoe has already read the call content and knows who the client is, their business scenario, and existing configurations.

The author and Zoe broke down the requirement into: creating a template system that allows users to save and edit existing configurations.

Then Zoe did three things:

  • Top up the client — used the administrator API to immediately lift the client's usage restrictions
  • Pull customer configuration — retrieved the client's existing settings from the production database (read-only)
  • Generate prompt and start the agent — packaged all context and fed it to Codex

Step 2: Start the Agent

Zoe created for this task:

  • An independent git worktree (isolated branch environment)
  • A tmux session (to let the Agent run in the background)

```bash
# Create worktree + start agent
git worktree add ../feat-custom-templates -b feat/custom-templates origin/main
cd ../feat-custom-templates && pnpm install

tmux new-session -d -s "codex-templates" \
  -c "/Users/elvis/Documents/GitHub/medialyst-worktrees/feat-custom-templates" \
  "$HOME/.codex-agent/run-agent.sh templates gpt-5.3-codex high"
```

Why use tmux? Because it allows for mid-course intervention.

If the AI goes off track, there’s no need to kill it and start over; simply send commands directly in tmux:

```bash
# Agent is going in the wrong direction
tmux send-keys -t codex-templates "Stop for a moment. Do the API layer first, ignore the UI." Enter

# Agent needs more context
tmux send-keys -t codex-templates "Type definitions are in src/types/template.ts, use that." Enter
```

Meanwhile, the task is recorded in a JSON file:

```json
{
  "id": "feat-custom-templates",
  "tmuxSession": "codex-templates",
  "agent": "codex",
  "description": "Custom email template feature for enterprise clients",
  "repo": "medialyst",
  "worktree": "feat-custom-templates",
  "branch": "feat/custom-templates",
  "startedAt": 1740268800000,
  "status": "running",
  "notifyOnComplete": true
}
```

Step 3: Automatic Monitoring

An automated cron job checks the status of all agents every 10 minutes.

Key point: it does not "ask" the Agent how things are going (that would burn a lot of tokens); it checks objective facts:

  • Is the tmux session still alive?
  • Has a PR been created?
  • What is the CI status?
  • If it failed, does it need to be restarted? (up to 3 retries)

This monitoring script is 100% deterministic and very token-efficient, and it only notifies the author when human intervention is needed. It is actually an improved version of the Ralph Loop, detailed later.

Step 4: Agent Creates PR

The Agent writes the code, commits, pushes, and then creates a PR with `gh pr create --fill`.

Note: at this point, the author does not receive a notification, because a PR by itself does not mean "completion".

The definition of "completion" is:

  • ✅ PR has been created
  • ✅ Branch has been synced with main (no conflicts)
  • ✅ CI passed (lint, type checking, unit tests, E2E tests)
  • ✅ Codex reviewer approved
  • ✅ Claude Code reviewer approved
  • ✅ Gemini reviewer approved
  • ✅ If there are UI changes, screenshots are included

Only when all conditions are met is the task considered truly complete.

Step 5: Automated Code Review

Each PR is reviewed by three Agents:

  • Codex Reviewer — the most reliable reviewer. Good at identifying edge cases; catches logic errors, missing error handling, and race conditions; very low false-positive rate.
  • Gemini Code Assist Reviewer — free and easy to use. Identifies security and scalability issues the other reviewers miss, and provides specific fix suggestions. Worth using.
  • Claude Code Reviewer — basically useless. Overly cautious, always suggests "consider adding..."; most suggestions are over-engineered. Unless a comment is marked "critical", skip it.

All three reviewers comment directly in the PR.

Step 6: Automated Testing

The CI pipeline runs:

  • Lint and TypeScript checks
  • Unit tests
  • E2E tests
  • Playwright tests (run in a preview environment identical to production)

A rule added last week: if a PR changes the UI, a screenshot must be attached in the description; otherwise CI fails outright.

This rule significantly shortens review time — the author can see what changed just by looking at the screenshot, without opening the preview environment.

Step 7: Manual Review

Now the author receives a Telegram notification: "PR #341 is ready for review."

At this point:

  • CI is all green
  • All three AI reviewers have approved
  • The screenshot shows the UI changes
  • All edge cases are recorded in the review comments

The author's review takes only 5-10 minutes. For many PRs, he doesn't even look at the code; he checks the screenshot and merges directly.

Step 8: Merge

The PR is merged. A daily cron job cleans up the isolated worktrees and task records.

The entire process, from customer requirement to deployed code, may take only 1-2 hours, while the author's actual hands-on time may be just 10 minutes.
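
The Step 3 monitoring logic lends itself to a pure decision function. Here is a hedged sketch of what such a check might look like; the function name, inputs, and the surrounding glue are my assumptions, not the author's code (a real script would query `tmux` and `gh` for these facts):

```shell
#!/usr/bin/env bash
# Sketch of the deterministic monitoring check as a pure decision
# function. Inputs would come from `tmux has-session` and `gh pr view`
# in practice; here they are plain arguments so the logic stays visible.

decide_action() {
  local alive="$1" pr_exists="$2" ci_status="$3" retries="$4"
  local max_retries=3
  if [ "$alive" = "yes" ]; then
    echo "wait"              # agent still working: do nothing this cycle
  elif [ "$pr_exists" = "yes" ] && [ "$ci_status" = "success" ]; then
    echo "done"              # hand off to the reviewer pipeline
  elif [ "$retries" -lt "$max_retries" ]; then
    echo "restart"           # dead session or red CI: retry, up to 3 times
  else
    echo "notify-human"      # out of retries: ping the author on Telegram
  fi
}
```

Because every branch is a plain comparison, a 10-minute cron cycle costs zero tokens; the LLM is only involved when a restart needs a rewritten prompt.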

Three Mechanisms to Make the System Smarter

Mechanism 1: Improved Ralph Loop — Not Just Repeating, But Learning

You may have heard of the Ralph Loop: Pull context from memory → Generate output → Evaluate results → Save learning.

However, most implementations have a problem: the prompt used in each loop is the same. What is learned improves future retrieval, but the prompt itself is static.

This system is different.

When the Agent fails, Zoe does not restart with the same prompt. She analyzes the failure reason with the complete business context and then rewrites the prompt:

❌ Bad example (static prompt):

"Implement custom template feature"

✅ Good example (dynamic adjustment):

"Stop. The customer wants X, not Y. Here is their exact quote from the meeting: 'We want to preserve the existing configuration, not create a new one from scratch.' Focus on configuration reuse; do not create a new process."

Zoe can make such adjustments because she has context that the execution-layer Agent does not have:

  • What the customer said in the meeting
  • What the company does
  • Why the last similar request failed
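
One way to picture the retry mechanic: instead of resending the same prompt, the orchestrator appends the diagnosed failure and a business-context snippet to the next attempt. A rough sketch (the function name and prompt format are mine, not the author's):

```shell
#!/usr/bin/env bash
# Sketch of dynamic prompt adjustment on retry: the next prompt is the
# base task plus the failure analysis and relevant business context,
# rather than a verbatim resend. (Format is illustrative.)

next_prompt() {
  local base="$1" failure="$2" context="$3"
  printf '%s\n\nPrevious attempt failed because: %s\nRelevant context: %s\n' \
    "$base" "$failure" "$context"
}

next_prompt "Implement custom template feature" \
  "created a new flow instead of reusing the existing configuration" \
  "customer quote: preserve the existing configuration"
```

In the real system the failure analysis itself comes from the orchestration-layer model; this only shows where that analysis lands in the next prompt.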

Furthermore, Zoe does not wait for you to assign tasks; she actively seeks work:

  • Morning: Scans Sentry → Discovers 4 new errors → Starts 4 Agents to investigate and fix
  • After the meeting: Scans meeting notes → Discovers 3 feature requests mentioned by customers → Starts 3 Codex agents
  • Evening: Scans git log → Starts Claude Code to update changelog and customer documentation

The author returns from a walk to find a message on Telegram: "7 PRs are ready. 3 new features, 4 bug fixes."

Successful patterns will be recorded:

  • "This prompt structure is very effective for the billing feature"
  • "Codex needs to receive type definitions in advance"
  • "Always include the test file path"

The reward signals are: CI passing, all three code reviews passing, and a manual merge. Any failure triggers the loop again.

Over time, the prompts Zoe writes keep getting better, because she remembers what succeeds.

Mechanism 2: Agent Selection Strategy — Different Tasks Require Different Experts

Not all Agents are equally strong. The author's summarized selection strategy:

  • Codex (gpt-5.3-codex) — the main force. Backend logic, complex bugs, multi-file refactoring, and tasks requiring cross-codebase reasoning. Slow but thorough. Handles 90% of tasks.

  • Claude Code (claude-opus-4.5) — the speed player. Frontend work; fewer permission issues, so well suited to git operations. (The author used to rely on it more, but switched after Codex 5.3 came out.)

  • Gemini — the designer. Has design aesthetics. For beautiful UIs, let Gemini generate HTML/CSS specifications first, then hand them to Claude Code for implementation in the component system. Gemini designs, Claude builds.

Zoe automatically selects Agents based on task type and passes outputs between them. Billing system bugs go to Codex, button style fixes go to Claude Code, and new dashboard designs go to Gemini first.
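
The routing described above is essentially a lookup table. A minimal sketch (the task-type labels and the default are my own guesses; presumably the real selection also weighs load and history):

```shell
#!/usr/bin/env bash
# Sketch of task-type → agent routing. Labels are illustrative.

pick_agent() {
  case "$1" in
    backend|bugfix|refactor) echo "codex" ;;        # main force, ~90% of tasks
    frontend|git-ops)        echo "claude-code" ;;  # fast, fewer permission issues
    ui-design)               echo "gemini" ;;       # designs first, Claude Code builds
    *)                       echo "codex" ;;        # default to the workhorse
  esac
}
```

The point of keeping this as a dumb table is the same as with the monitor: routing costs no tokens, and only the routed work itself hits a model.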

Mechanism 3: Where is the Bottleneck? RAM

There is an unexpected limitation: it is not token cost, nor API rate, but memory.

Each Agent requires:

  • Its own worktree
  • Its own node_modules
  • Running builds, type checks, tests

5 Agents running simultaneously = 5 parallel TypeScript compilers + 5 test runners + 5 sets of dependencies loaded into memory.

The author's Mac Mini (16GB RAM) tops out at 4-5 Agents running at once; any more and the machine starts swapping, and you have to pray they don't all build at the same time. So he bought a Mac Studio M4 Max (128GB RAM, $3,500), arriving at the end of March; he said he would share whether it was worth it then.
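
This constraint suggests a simple guard in the orchestrator: cap concurrent agents before launching a new worktree. A sketch under assumed names (in practice the running count would come from something like `tmux ls | wc -l`; here it is passed in so the logic stays visible):

```shell
#!/usr/bin/env bash
# Sketch of a RAM-driven concurrency cap: refuse to start a new agent
# once the running count hits the ceiling. (Names are illustrative;
# the default of 4 reflects the 16GB Mac Mini limit described above.)

can_start_agent() {
  local running="$1" max="${2:-4}"
  if [ "$running" -lt "$max" ]; then echo "yes"; else echo "no"; fi
}
```

Raising the cap after a RAM upgrade is then a one-argument change rather than a code change.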

You Can Also Build: From Zero to Running in Just 10 Minutes

Want to try this system?

The simplest way:

Copy this entire article to OpenClaw and tell it: "Implement an Agent cluster system for my codebase according to this architecture."

Then, it will:

  • Read the architectural design
  • Create scripts
  • Set up the directory structure
  • Configure cron monitoring

Done in 10 minutes.
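
For the cron monitoring piece, the cadence from the article (a 10-minute status check plus a daily cleanup) might look like this in a crontab; the script names and paths are placeholders, not the author's actual setup:

```
# Hypothetical crontab entries matching the cadence described above
*/10 * * * * $HOME/.codex-agent/check-agents.sh      # agent status check
0 3 * * *    $HOME/.codex-agent/cleanup-worktrees.sh # daily worktree/task cleanup
```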

You need to prepare:

  • OpenClaw account
  • API access to Codex and/or Claude Code
  • A git repository
  • (Optional) Obsidian for storing business context

2026: A Million-Dollar Company for One Person

The author said something at the end that I found very inspiring:

"We will see a large number of million-dollar companies run by one person starting in 2026. The leverage is enormous, belonging to those who understand how to build recursively self-improving AI systems."

This is what it looks like:

  • An AI orchestrator as your extension (like Zoe for the author)
  • Delegating work to specialized Agents handling different business functions
  • Engineering, customer support, operations, marketing
  • Each Agent focuses on what it excels at
  • You maintain focus and complete control

The next generation of entrepreneurs will not hire 10 people to do what one person plus a system can do. They will build this way: stay small, move fast, ship daily.

There is too much garbage content generated by AI now. Various hype, all sorts of fancy demos of "task control centers," but nothing truly useful.

The author said he wants to do the opposite: less hype, more documentation of the real building process. Real customers, real revenue, real submissions to production environments, and real failures.

This article ends here.

Core points recap:

  • Dual-layer architecture: orchestration layer holds business context, execution layer focuses on code
  • Complete automation: an 8-step process from requirements to PR, most tasks succeed on the first try
  • Dynamic learning: not repetitive execution, but adjusting strategies based on reasons for failure
  • Cost control: starting at $20/month, heavy usage at $190/month

If you are also exploring practical applications of AI automation, I hope this case can inspire you.

Reference: https://x.com/elvissun/status/2025920521871716562
