Claude Code vs Codex: I Watched a 38-Minute Test, the Difference is Greater Than Expected

2/25/2026
5 min read

First, the conclusion: If you are an independent developer or need to quickly turn ideas into products, choose Claude Code. There’s no need to hesitate.

I usually use Claude Code, while Codex only gets opened occasionally for testing. This preference isn't just trend-following: Claude Code ships updates fast, and its creator Boris Cherny often shares on Twitter how the team uses it in their own real development. It's not a demo; it's real work running in production.

As for Codex? Its capabilities are indeed strong; I have created a few small programs with it. Some in the community say that Codex is more suitable for backend development and security tasks.

Peter Steinberger, creator of Clawdbot, said he spent about 10 days vibe coding the Clawdbot prototype, relying mainly on Claude Code and Codex, and leaning on Codex for the complex, core parts of the coding.

So which of the two is the better AI programming tool? I wasn't sure before.

Until I saw this test.

Blogger Mansel Scheffel ran a hardcore experiment: give both tools the exact same prompt to build an application from scratch and deploy it online. The whole process was recorded in a 38-minute video.

I. Experiment Setup: A Completely Fair Duel

The task is simple yet complete: build a competitive intelligence analysis application called "Rival."

Users input a company URL, and the application automatically fetches information about that company and its competitors, generating a complete competitive analysis report. If done by a consulting firm, it would cost at least $10,000.

Tech stack: Supabase (database + authentication) + Firecrawl (web scraping) + Vercel (deployment)

Rules: Exactly the same prompts, no additional hints, see who can complete it independently.

II. Round One: Planning Phase

Codex immediately asked a dozen questions, including:

  • Who is the target user?
  • What model will be used for analysis?
  • Which authentication method to choose?
  • How to define the UI style?
  • What should the default usage limit be?

What about Claude Code? It didn’t ask a single question.

It just started writing code.

The blogger's evaluation is very accurate: "Codex is like a cautious intern, while Claude Code is like a confident veteran."

III. Round Two: Building Speed

Then comes the long wait.

  • Claude Code: Completed in about 1 hour
  • Codex: Over 2 hours and still ongoing

I noted down the blogger's exact words: "I have been sitting here for 2 hours and 34 minutes, most of the time waiting for Codex."

IV. Round Three: UI Quality Comparison

After both sides were deployed, the blogger opened the interfaces for comparison.

Claude Code’s interface: Not stunning, but usable. Reasonable layout, normal font.

Codex’s interface: The blogger complained on the spot—

"Honestly, this interface is too ugly. It’s 2026, how can it generate such fonts and spacing?"

V. Round Four: Functionality Testing

The real test came: let both sides analyze ClickUp.

Claude Code:

The first run hit an error, but the fix was quick: Claude Code located the issue (a JWT verification configuration) and resolved it within 4 minutes.

After fixing, it successfully fetched ClickUp and its competitors: Monday, Notion, Asana, Atlassian. The report was also generated.

Codex:

Encountered the same error.

It took 19 minutes to find the problem.

After fixing, it still didn’t work. The blogger waited a long time again and ultimately gave up.

VI. Round Five: Third-Party Review

The blogger had Gemini 3 Pro blind-review both codebases. This segment was quite interesting.

In terms of backend security: Codex won.

Gemini judged its security architecture more mature: complete row-level security (RLS) policies, immutable audit logs, and a better-implemented authorization model. This confirms the community view that, for backend and security work, Codex really does have the edge.

In terms of frontend quality: Claude Code won decisively.

Code integrity, logical clarity, and UI implementation quality were all significantly better.

The blogger's summary was very direct:

"You can convince me that Codex is more secure, but you cannot convince me to use it. Because its user experience is too poor. A tool that cannot even implement basic functions, what’s the point of being secure?"

VII. Summary of Core Differences

After watching this test, my thoughts have changed somewhat.

Previously, I thought both tools had their pros and cons, and the choice depended on the scenario. Now I believe that if you are an independent developer or need to quickly validate ideas and build MVPs, Claude Code’s efficiency and reliability are superior. Time is money; while Codex is asking you its tenth question, Claude Code might already be up and running.

However, if you are building an enterprise backend with strict security requirements, Codex is worth considering, provided you have the patience.

References

  • YouTube Video: Claude Code vs Codex Head-to-Head by Mansel Scheffel (link)
  • Test Files: Google Drive - All Code and Configuration Files (link)