Now in beta — get early access

One goal. Autonomous execution.

Submit a development objective. CodeHelm decomposes it into a structured task plan, dispatches specialized AI agents to work in parallel, validates the results against quality gates and acceptance criteria, and iterates autonomously — until the work is done.

No credit card required · 14-day Pro trial · BYOK / BYOS

What is CodeHelm?

The execution engine between your spec and validated code

CodeHelm is not a wrapper around Claude or Codex. It is an orchestration runtime that takes a development objective, breaks it into a structured task plan, and coordinates multiple specialized AI agents to carry out the work — with quality gates and an LLM critic evaluating every result before anything is accepted.

When results fail validation, CodeHelm re-plans and retries automatically. When runs succeed, the changes are merged and a handoff summary is generated. The whole process is auditable, observable, and team-safe from day one.

Execution flow
  • Objective / Spec: one input
  • Parse + Plan: LLM-structured task DAG
  • Agent Dispatch: Cursor · Codex · Claude · GPT-4o
  • Quality Gates + Critic: syntax · tests · build · acceptance criteria
  • Retry if needed: automatic re-plan and re-execute
  • Merge + Handoff: validated output
Capabilities

Not a wrapper. An execution engine.

Every part of CodeHelm is designed around autonomous, validated, multi-agent software execution — not just API key management.

Autonomous Decomposition

CodeHelm parses your objective with an LLM, extracts requirements and acceptance criteria, then generates a structured task DAG — before a single agent runs.
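
To make the idea of a task DAG concrete, here is a minimal sketch of how a dependency graph can be turned into waves of tasks that are safe to run in parallel. The task names and the `execution_waves` helper are hypothetical and are not CodeHelm's actual plan format; the sketch only relies on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical task plan: each task maps to the set of tasks it depends on.
plan = {
    "design-schema": set(),
    "build-api": {"design-schema"},
    "build-ui": {"design-schema"},
    "integration-tests": {"build-api", "build-ui"},
}

def execution_waves(deps):
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(plan)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Tasks inside one wave do not depend on each other, which is what allows several agents to work at the same time.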

Multi-Agent Dispatch

Tasks are assigned to specialized agents: Cursor for frontend work, Codex for backend logic, Claude Code for architecture and refactoring. Each runs in an isolated git worktree.
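
As an illustration of the assignment described above, the sketch below routes tasks to agents by task type. The `Task` class, the `kind` labels, and the routing table are assumptions made for this example; the real assignment logic is not documented on this page.

```python
from dataclasses import dataclass

# Routing table mirroring the description above; the "kind" labels are hypothetical.
AGENT_FOR_KIND = {
    "frontend": "Cursor",
    "backend": "Codex",
    "architecture": "Claude Code",
    "refactoring": "Claude Code",
}

@dataclass(frozen=True)
class Task:
    task_id: str
    kind: str

def assign_agent(task: Task) -> str:
    """Return the agent name for a task, or fail loudly for an unknown kind."""
    try:
        return AGENT_FOR_KIND[task.kind]
    except KeyError:
        raise ValueError(f"no agent configured for task kind {task.kind!r}") from None

tasks = [Task("t1", "frontend"), Task("t2", "backend"), Task("t3", "refactoring")]
assignments = {task.task_id: assign_agent(task) for task in tasks}
print(assignments)
```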

Plugin Skills

Activate reusable plugins like Ottili Frontend Design to bias every run with domain-specific instructions. Pro workspaces can create custom plugins for their own workflows.

Quality Gates

Every completed task passes through configurable validation checkpoints: syntax checks, lint, test suites, build steps, and health checks — before results are accepted.
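
A rough sketch of a gate runner follows: each gate is a command, gates run in order, and the first failure stops the sequence. The `Gate` class, `run_gates` function, and the placeholder commands are assumptions for illustration; in a real project the commands would be your own linter, test runner, and build step.

```python
import subprocess
import sys
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    name: str
    command: list

def run_gates(gates, cwd=None):
    """Run gates in order and stop at the first failure.

    Returns (passed, results) where results lists (gate name, ok) pairs.
    """
    results = []
    for gate in gates:
        proc = subprocess.run(gate.command, cwd=cwd, capture_output=True, text=True)
        ok = proc.returncode == 0
        results.append((gate.name, ok))
        if not ok:
            return False, results
    return True, results

# Placeholder commands so the sketch runs anywhere Python is installed.
gates = [
    Gate("syntax", [sys.executable, "-c", "compile('x = 1', '<task>', 'exec')"]),
    Gate("tests", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    Gate("build", [sys.executable, "-c", "raise SystemExit(1)"]),  # simulated failure
    Gate("health", [sys.executable, "-c", "pass"]),  # never reached in this demo
]
passed, results = run_gates(gates)
print(passed, results)
```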

LLM Critic Evaluation

An LLM evaluator reviews each result against the original acceptance criteria. If it doesn't meet the bar, CodeHelm re-plans the task and retries with an adjusted approach.

Automatic Retry & Refinement

Failed validation triggers a re-plan loop, not a dead end. CodeHelm adjusts the approach and dispatches again — up to configurable iteration limits.
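
The loop can be pictured roughly as follows. This is a minimal sketch with stand-in `fake_execute` and `fake_validate` functions; the function names, the feedback format, and the default limit of three iterations are assumptions, not CodeHelm's actual interface.

```python
def run_with_retries(execute, validate, max_iterations=3):
    """Execute, validate, and retry with feedback until success or the limit.

    execute(attempt, feedback) returns a result.
    validate(result) returns (ok, feedback).
    """
    feedback = None
    history = []
    for attempt in range(1, max_iterations + 1):
        result = execute(attempt, feedback)
        ok, feedback = validate(result)
        history.append((attempt, ok))
        if ok:
            return result, history
    return None, history  # limit reached without a passing result

def fake_execute(attempt, feedback):
    # Stand-in agent: each attempt leaves one fewer failing test.
    return {"failing_tests": max(0, 2 - attempt)}

def fake_validate(result):
    # Stand-in for gates plus critic: pass only when nothing is failing.
    failing = result["failing_tests"]
    return failing == 0, f"{failing} tests failing"

result, history = run_with_retries(fake_execute, fake_validate)
print(result, history)
```

Passing the previous feedback into the next attempt is what distinguishes a re-plan from a blind retry.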

Continuous Improvement

Between major runs, the Micro Evolution Engine scans for improvement signals: TODO/FIXME comments, failed tests, log errors, and health alerts — and resolves them conservatively.
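
For a sense of what one such signal scan could look like, the sketch below finds TODO and FIXME comments in source text. The regular expression and the `find_signals` helper are illustrative assumptions; how the Micro Evolution Engine actually detects signals is not described here.

```python
import re

# Matches a TODO or FIXME marker and captures the note that follows it.
SIGNAL = re.compile(r"\b(TODO|FIXME)\b[:\s]*(.*)")

def find_signals(source, path="<memory>"):
    """Return (path, line number, marker, note) for each marker found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SIGNAL.search(line)
        if match:
            hits.append((path, lineno, match.group(1), match.group(2).strip()))
    return hits

sample = """def total(items):
    # TODO: handle empty input
    return sum(items)  # FIXME rounding drift
"""
signals = find_signals(sample, "billing.py")
for hit in signals:
    print(hit)
```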

Isolated Execution & Merge

Agents work in separate git worktrees so changes never conflict mid-run. On validation success, the Orchestrator merges results and runs a final gate pass on the combined output.
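
The sketch below shows the kind of standard git commands that give each task its own worktree and branch, then merge and clean up. The branch naming scheme, the directory layout, and the `worktree_commands` helper are hypothetical; the function only builds the command lists and does not run them.

```python
from pathlib import PurePosixPath

def worktree_commands(run_id, task_id, base_ref="main", root="/tmp/worktrees"):
    """Build git commands for an isolated per-task worktree (not executed here)."""
    branch = f"run-{run_id}/{task_id}"  # hypothetical naming scheme
    path = str(PurePosixPath(root) / run_id / task_id)
    return {
        # Create a new branch from base_ref and check it out in its own directory.
        "create": ["git", "worktree", "add", "-b", branch, path, base_ref],
        # After validation, merge the task branch from the main checkout.
        "merge": ["git", "merge", "--no-ff", branch],
        # Remove the task's working directory once it is merged.
        "cleanup": ["git", "worktree", "remove", path],
    }

commands = worktree_commands("42", "build-api")
for step, argv in commands.items():
    print(step, " ".join(argv))
```

Because every task has its own directory and branch, two agents never write to the same working tree.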

Full Lifecycle Observability

Every run moves through a structured state machine with 13 states. Every transition is logged, timestamped, and auditable. Prometheus metrics and webhook callbacks are emitted for every event.
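
As a simplified picture of a logged state machine, the sketch below allows only declared transitions and timestamps each one. The state names are hypothetical and cover only a small subset; this page does not name the 13 states.

```python
from datetime import datetime, timezone

# Hypothetical subset of run states and the transitions allowed between them.
TRANSITIONS = {
    "queued": {"planning"},
    "planning": {"executing", "failed"},
    "executing": {"validating", "failed"},
    "validating": {"merging", "planning", "failed"},  # back to planning on re-plan
    "merging": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}

class Run:
    def __init__(self):
        self.state = "queued"
        self.log = []  # (UTC timestamp, from state, to state)

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append((stamp, self.state, new_state))
        self.state = new_state

run = Run()
for state in ("planning", "executing", "validating", "planning",
              "executing", "validating", "merging", "completed"):
    run.transition(state)
print(run.state, len(run.log))
```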

Team-Safe Orchestration

Multi-workspace isolation with RBAC (owner, admin, member, viewer). AI credentials are workspace-scoped, encrypted at rest, and never proxied through CodeHelm's infrastructure.
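
A minimal sketch of a workspace-scoped role check follows. The four roles come from the description above; the action names and the minimum role required for each are assumptions made for this example.

```python
from enum import IntEnum

class Role(IntEnum):
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4

# Hypothetical minimum role per action.
REQUIRED = {
    "view_run": Role.VIEWER,
    "submit_objective": Role.MEMBER,
    "manage_credentials": Role.ADMIN,
    "delete_workspace": Role.OWNER,
}

def is_allowed(memberships, user, workspace, action):
    """Allow an action only if the user holds a sufficient role in that workspace."""
    role = memberships.get((user, workspace))
    return role is not None and role >= REQUIRED[action]

# Roles are stored per (user, workspace) pair, so access never leaks across workspaces.
memberships = {
    ("ana", "payments"): Role.ADMIN,
    ("ben", "payments"): Role.VIEWER,
}
print(is_allowed(memberships, "ana", "payments", "manage_credentials"))
print(is_allowed(memberships, "ben", "payments", "submit_objective"))
print(is_allowed(memberships, "ana", "search", "view_run"))
```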

Execution model

What happens during a CodeHelm run

From objective to validated output — every step is automated, observable, and recoverable.

01

Submit objective

Provide a development goal, spec document, or task description. CodeHelm handles the rest.

02

Plan & decompose

An LLM parses your objective, identifies requirements, and generates a task dependency graph with agent assignments.

03

Agents execute

Specialized agents (Cursor, Codex, Claude Code) work simultaneously on their assigned tasks, each in an isolated git worktree.

04

Validate & merge

Quality gates and an LLM critic evaluate every result. Failures trigger automatic re-planning. Successful output is merged and delivered.

Why CodeHelm

Orchestrated delivery, not one-shot drafts

Single-run AI tools are fast. CodeHelm is thorough. These are different products for different problems.

Single-run AI coding tools
  • One prompt → one agent → one pass
  • No structured planning or decomposition
  • Short execution window (seconds to minutes)
  • No built-in validation or acceptance testing
  • Manual retry when output falls short
  • No persistent run history or audit trail
  • Context limited to a single conversation

CodeHelm
  • One objective → structured plan → coordinated agents
  • LLM-powered decomposition into a task DAG
  • Long-running execution with configurable iteration limits
  • Quality gates (lint, tests, build) + LLM critic evaluation
  • Automatic re-plan and retry when output doesn't meet criteria
  • Append-only audit trail for every state transition and action
  • Repo context built from actual file system and git state

Bring Your Own Keys

You pay for orchestration.
Not for tokens.

CodeHelm does not proxy your AI requests. Requests made with your API keys go directly to your chosen provider. We charge for the execution layer: the planning, agent coordination, validation, retry logic, and observability infrastructure that makes autonomous AI development reliable and team-safe.

Bring Your Own Keys (BYOK)
Add your OpenAI, Anthropic, or Google API keys. Keys are encrypted at rest with AES-256 and used only for your workspace's runs.
Bring Your Own Subscription (BYOS)
Already paying for Cursor, Codex, or Claude Code? Connect those subscriptions directly. CodeHelm orchestrates — your subscription provides the compute.
No token markup, ever
We never see your prompts or responses. Zero markup on AI tokens. Your provider relationship stays yours.
Provider configuration
Anthropic Claude · BYOK · sk-ant-••••••••4f2a
OpenAI GPT-4o · BYOK · sk-••••••••9d1c
Cursor Pro · BYOS · cursor://••••••••
Google Gemini · BYOK · AIza••••••••
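
Keys in the configuration panel are shown masked. A display helper along these lines could produce that format; `mask_key` is a hypothetical function, the sample keys are obviously fake, and the prefixes are taken from the examples shown on this page.

```python
def mask_key(key, visible_suffix=4, mask="•" * 8):
    """Keep the provider prefix and the last few characters; hide the rest."""
    prefix = ""
    # Longest prefix first, so "sk-ant-" is not mistaken for plain "sk-".
    for known in ("sk-ant-", "sk-", "AIza"):
        if key.startswith(known):
            prefix = known
            break
    body = key[len(prefix):]
    # Show a suffix only if doing so would not reveal the whole key body.
    show = visible_suffix > 0 and len(body) > visible_suffix
    suffix = body[-visible_suffix:] if show else ""
    return f"{prefix}{mask}{suffix}"

anthropic_masked = mask_key("sk-ant-not-a-real-key-4f2a")
openai_masked = mask_key("sk-not-a-real-key-9d1c")
google_masked = mask_key("AIza-not-a-real-key", visible_suffix=0)
print(anthropic_masked, openai_masked, google_masked)
```

Masking is for display only; it is separate from how keys are encrypted and stored.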
Pricing

Simple, transparent pricing

Pay for orchestration infrastructure. Your AI tokens stay between you and your provider.

Plus

For individual engineers and small projects

€19/mo

or €190/year — save 16%

  • 1 workspace
  • Up to 3 repositories
  • Up to 5 AI providers
  • Plugin library access
  • Unlimited runs
  • 7-day log retention
  • Community support

Not included in Plus:
  • Custom plugin creation
  • Team members
  • RBAC & roles
  • Webhook callbacks

Most popular

Pro

For teams and growing engineering organizations

€190/mo

or €1,900/year — save 16%

  • 10 workspaces
  • Unlimited repositories
  • Unlimited AI providers
  • Plugin library access
  • Custom plugin creation
  • Unlimited runs
  • 90-day log retention
  • Team members & RBAC
  • Webhook callbacks
  • Priority run queue
  • Email support

Enterprise

Custom scale, SLA, and dedicated support

Custom
  • Unlimited workspaces
  • Unlimited everything
  • Unlimited custom plugins
  • Custom log retention
  • SSO / SAML
  • Custom SLA
  • Dedicated CSM
  • Audit log export
  • On-premises option
  • Custom integrations
Security & Auditability

Autonomous execution requires serious auditability

When AI agents are making code changes on your behalf, you need a complete record of everything that happened and why.

Append-only audit log

Every state transition, agent dispatch, validation result, and retry is logged and immutable. Full lifecycle history per run.
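
One common way to make an append-only log tamper-evident is to chain entries by hash, so that editing any earlier entry invalidates everything after it. The sketch below shows that general technique; it is not a description of how CodeHelm stores its audit log, and the `AuditLog` class and event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry

class AuditLog:
    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash, event):
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event):
        """Add an event whose hash covers the previous entry's hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"event": event, "prev": prev_hash,
                 "hash": self._digest(prev_hash, event)}
        self._entries.append(entry)

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self._entries:
            expected = self._digest(prev_hash, entry["event"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

log = AuditLog()
log.append({"run": "42", "from": "executing", "to": "validating"})
log.append({"run": "42", "from": "validating", "to": "merging"})
intact = log.verify()
print(intact)
```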

Encrypted credentials

All AI provider keys and subscription credentials are encrypted at rest with AES-256 and are never proxied through our infrastructure.

Workspace RBAC

Fine-grained roles — owner, admin, member, viewer — scoped per workspace. Agent credentials and run permissions are isolated.

SOC 2-ready design

Designed with SOC 2 Type II compliance principles from day one: structured logs, controlled access, and no AI token logging.

Get started today

Give CodeHelm the goal.
Track every step to completion.

Connect your repositories, configure your AI providers, and submit your first development objective. CodeHelm handles the decomposition, execution, and validation.

No credit card required · Free 14-day trial on Pro