Now in beta — get early access

One goal. Autonomous execution.

Submit a development objective. CodeHelm decomposes it into a structured task plan, dispatches specialized AI agents to work in parallel, validates the results against quality gates and acceptance criteria, and iterates autonomously — until the work is done.

No credit card required · 14-day Pro trial · BYOK / BYOS

codehelm / job-247 / running

Objective: Refactor auth module to use JWT + refresh token rotation

Decomposed: 4 agents dispatched
parse-spec + build-context · Claude
refactor-auth-core · Codex
update-token-endpoints · Cursor
add-rotation-tests · Codex
critic-eval + quality-gates · Claude

Gates: syntax ✓ · lint ✓ · tests pending
↻ Iteration 2 of 3
✓ 3 / 5 tasks validated

What is CodeHelm?

The execution engine between your spec and validated code

CodeHelm is not a wrapper around Claude or Codex. It is an orchestration runtime that takes a development objective, breaks it into a structured task plan, and coordinates multiple specialized AI agents to carry out the work — with quality gates and an LLM critic evaluating every result before anything is accepted.

When results fail validation, CodeHelm re-plans and retries automatically. When runs succeed, the changes are merged and a handoff summary is generated. The whole process is auditable, observable, and team-safe from day one.

Execution flow

Objective / Spec (one input)
→ Parse + Plan (LLM-structured task DAG)
→ Agent Dispatch (Cursor · Codex · Claude · GPT-4o)
→ Quality Gates + Critic (syntax · tests · build · acceptance criteria)
→ Retry if needed (automatic re-plan and re-execute)
→ Merge + Handoff (validated output)

Capabilities

Not a wrapper. An execution engine.

Every part of CodeHelm is designed around autonomous, validated, multi-agent software execution — not just API key management.

Autonomous Decomposition

CodeHelm parses your objective with an LLM, extracts requirements and acceptance criteria, then generates a structured task DAG — before a single agent runs.
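To make the idea of a task DAG concrete, here is a minimal sketch (not CodeHelm's implementation) that orders a plan into parallel "waves" with Python's standard-library `graphlib`. The task names are borrowed from the job-247 example run; the dependency edges between them, and the `execution_waves` helper, are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical plan for the job-247 objective. Each task maps to the set of
# tasks it depends on; these edges are illustrative assumptions.
TASK_DEPENDENCIES: dict[str, set[str]] = {
    "parse-spec+build-context": set(),
    "refactor-auth-core": {"parse-spec+build-context"},
    "update-token-endpoints": {"refactor-auth-core"},
    "add-rotation-tests": {"refactor-auth-core"},
    "critic-eval+quality-gates": {"update-token-endpoints", "add-rotation-tests"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave have no unmet
    dependencies and could be dispatched to agents in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(TASK_DEPENDENCIES), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With these assumed edges the plan splits into four waves, and `update-token-endpoints` and `add-rotation-tests` share the third wave because both depend only on `refactor-auth-core`.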

Multi-Agent Dispatch

Tasks are assigned to specialized agents: Cursor for frontend work, Codex for backend logic, Claude Code for architecture and refactoring. Each runs in an isolated git worktree.
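A rough sketch of the two ideas in this card, assuming a simple category-to-agent lookup and one branch plus one directory per task. The routing table, the `assign_agent` and `worktree_command` helpers, and the branch and path naming scheme are all hypothetical; only the `git worktree add -b <branch> <path> <commit-ish>` syntax is standard git.

```python
import shlex

# Hypothetical routing table mirroring the card: Cursor for frontend,
# Codex for backend, Claude Code for architecture and refactoring.
AGENT_BY_CATEGORY = {
    "frontend": "cursor",
    "backend": "codex",
    "architecture": "claude-code",
    "refactoring": "claude-code",
}


def assign_agent(category: str) -> str:
    """Pick an agent for a task category, failing loudly for unknown ones."""
    try:
        return AGENT_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"no agent configured for category {category!r}") from None


def worktree_command(job_id: str, task_id: str, base: str = "main") -> list[str]:
    """Build (but do not run) a `git worktree add` command that gives the task
    its own branch and working directory. The naming scheme is hypothetical."""
    branch = f"codehelm/{job_id}/{task_id}"
    path = f"../worktrees/{job_id}-{task_id}"
    return ["git", "worktree", "add", "-b", branch, path, base]


cmd = worktree_command("job-247", "refactor-auth-core")
print(assign_agent("refactoring"), shlex.join(cmd))
```

Because each task gets its own branch and directory, two agents editing the same file never see each other's half-finished changes; conflicts can only surface at merge time.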

Plugin Skills

Activate reusable plugins such as Ottili Frontend Design to steer every run with domain-specific instructions. Pro workspaces can create custom plugins for their own workflows.

Quality Gates

Every completed task passes through configurable validation checkpoints: syntax checks, lint, test suites, build steps, and health checks — before results are accepted.

LLM Critic Evaluation

An LLM evaluator reviews each result against the original acceptance criteria. If it doesn't meet the bar, CodeHelm re-plans the task and retries with an adjusted approach.

Automatic Retry & Refinement

Failed validation triggers a re-plan loop, not a dead end. CodeHelm adjusts the approach and dispatches again — up to configurable iteration limits.
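The re-plan loop can be sketched as execute, validate, and re-plan with feedback until the result passes or the iteration limit is reached. In this minimal sketch the three callbacks are hypothetical stand-ins for the agents, the quality gates plus critic, and the planner; `run_with_retries` is not a documented CodeHelm API.

```python
from typing import Callable


def run_with_retries(
    plan: str,
    execute: Callable[[str], str],
    validate: Callable[[str], tuple[bool, str]],
    replan: Callable[[str, str], str],
    max_iterations: int = 3,
) -> tuple[bool, str, int]:
    """Execute, validate, and re-plan until the result passes or the limit is hit.

    Returns (accepted, last_output, iterations_used).
    """
    output = ""
    for iteration in range(1, max_iterations + 1):
        output = execute(plan)
        passed, feedback = validate(output)
        if passed:
            return True, output, iteration
        if iteration < max_iterations:
            # Feed the validator's feedback into the next plan instead of
            # retrying the same plan blindly.
            plan = replan(plan, feedback)
    return False, output, max_iterations


accepted, output, used = run_with_retries(
    "refactor auth",
    execute=lambda p: p,
    validate=lambda o: ("tests" in o, "missing rotation tests"),
    replan=lambda p, feedback: p + " + add tests",
)
print(accepted, output, used)
```

In this toy run the first attempt fails validation, the plan is adjusted once, and the second iteration is accepted, much like the "Iteration 2 of 3" state in the example job.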

Continuous Improvement

Between major runs, the Micro Evolution Engine scans for improvement signals: TODO/FIXME comments, failed tests, log errors, and health alerts — and resolves them conservatively.
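Of the signals listed, TODO/FIXME comments are the easiest to illustrate. The sketch below scans source text for them; the `Signal` record, the `scan_markers` helper, and the restriction to `#` and `//` comments are assumptions for illustration, not the engine's actual detector.

```python
import re
from dataclasses import dataclass

# Matches TODO or FIXME after a `#` or `//` comment marker (assumed comment styles).
MARKER = re.compile(r"(?:#|//)\s*(TODO|FIXME)\b:?\s*(.*)")


@dataclass
class Signal:
    path: str
    line: int
    kind: str
    text: str


def scan_markers(path: str, source: str) -> list[Signal]:
    """Return one improvement signal per TODO/FIXME comment in `source`."""
    signals: list[Signal] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            signals.append(Signal(path, lineno, match.group(1), match.group(2).strip()))
    return signals


example = "def login():\n    # TODO: rotate refresh tokens\n    pass  # FIXME handle expiry\n"
for signal in scan_markers("auth.py", example):
    print(signal)
```

A conservative resolver would then treat each signal as a small, separately validated task rather than batching them into one large change.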

Isolated Execution & Merge

Agents work in separate git worktrees so changes never conflict mid-run. On validation success, the Orchestrator merges results and runs a final gate pass on the combined output.

Full Lifecycle Observability

Every run moves through a structured state machine with 13 states. Every transition is logged, timestamped, and auditable, and Prometheus metrics and webhook callbacks are available for every event.
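A run lifecycle of this kind is typically enforced with an explicit transition table. The 13 actual states are not listed on this page, so the states below are a hypothetical subset chosen only to show the pattern: illegal transitions are rejected and every accepted one is recorded with a UTC timestamp.

```python
from datetime import datetime, timezone
from enum import Enum


class RunState(Enum):
    # Hypothetical subset of states; not CodeHelm's actual 13-state list.
    QUEUED = "queued"
    PLANNING = "planning"
    EXECUTING = "executing"
    VALIDATING = "validating"
    REPLANNING = "replanning"
    MERGING = "merging"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed next states for each state; terminal states allow none.
ALLOWED = {
    RunState.QUEUED: {RunState.PLANNING, RunState.FAILED},
    RunState.PLANNING: {RunState.EXECUTING, RunState.FAILED},
    RunState.EXECUTING: {RunState.VALIDATING, RunState.FAILED},
    RunState.VALIDATING: {RunState.MERGING, RunState.REPLANNING, RunState.FAILED},
    RunState.REPLANNING: {RunState.EXECUTING, RunState.FAILED},
    RunState.MERGING: {RunState.COMPLETED, RunState.FAILED},
    RunState.COMPLETED: set(),
    RunState.FAILED: set(),
}


class Run:
    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.state = RunState.QUEUED
        self.transitions: list[tuple[str, str, str]] = []  # (timestamp, from, to)

    def transition(self, new_state: RunState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        stamp = datetime.now(timezone.utc).isoformat()
        self.transitions.append((stamp, self.state.value, new_state.value))
        self.state = new_state
```

The recorded `transitions` list is what a metrics exporter or webhook dispatcher would consume; the retry path appears as a validating → replanning → executing cycle.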

Team-Safe Orchestration

Multi-workspace isolation with RBAC (owner, admin, member, viewer). AI credentials are workspace-scoped, encrypted at rest, and never proxied through CodeHelm's infrastructure.
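Workspace-scoped RBAC can be sketched as a per-workspace role lookup plus a minimum role per action. This assumes the four roles form a strict hierarchy (owner ⊇ admin ⊇ member ⊇ viewer); the action names and the `can` helper are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    # Assumed strict hierarchy: a higher role includes all lower-role permissions.
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4


# Hypothetical actions and the minimum role each one requires.
MIN_ROLE = {
    "view_runs": Role.VIEWER,
    "submit_run": Role.MEMBER,
    "manage_credentials": Role.ADMIN,
    "delete_workspace": Role.OWNER,
}


def can(memberships: dict[str, Role], workspace: str, action: str) -> bool:
    """Roles are scoped per workspace: a role in one workspace grants nothing in another."""
    role = memberships.get(workspace)
    return role is not None and role >= MIN_ROLE[action]


alice = {"ws-frontend": Role.ADMIN, "ws-infra": Role.VIEWER}
print(can(alice, "ws-frontend", "manage_credentials"))  # admin in this workspace
print(can(alice, "ws-infra", "submit_run"))             # only a viewer here
print(can(alice, "ws-billing", "view_runs"))            # no membership at all
```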

Execution model

What happens during a CodeHelm run

From objective to validated output — every step is automated, observable, and recoverable.

Submit objective

Provide a development goal, spec document, or task description. CodeHelm handles the rest.

Plan & decompose

An LLM parses your objective, identifies requirements, and generates a task dependency graph with agent assignments.

Agents execute

Specialized agents (Cursor, Codex, Claude Code) work on their assigned tasks in parallel, each in its own isolated git worktree.

Validate & merge

Quality gates and an LLM critic evaluate every result. Failures trigger automatic re-planning. Successful output is merged and delivered.

Why CodeHelm

Orchestrated delivery, not one-shot drafts

Single-run AI tools are fast. CodeHelm is thorough. These are different products for different problems.

Single-run AI coding tools

One prompt → one agent → one pass

No structured planning or decomposition

Short execution window (seconds to minutes)

No built-in validation or acceptance testing

Manual retry when output falls short

No persistent run history or audit trail

Context limited to a single conversation

CodeHelm

One objective → structured plan → coordinated agents

LLM-powered decomposition into a task DAG

Long-running execution with configurable iteration limits

Quality gates (lint, tests, build) + LLM critic evaluation

Automatic re-plan and retry when output doesn't meet criteria

Append-only audit trail for every state transition and action

Repo context built from actual file system and git state

Bring Your Own Keys

You pay for orchestration.
Not for tokens.

CodeHelm does not proxy your AI requests. Your API keys go directly to your chosen provider. We charge for the execution layer — the planning, agent coordination, validation, retry logic, and observability infrastructure that makes autonomous AI development reliable and team-safe.

Bring Your Own Keys (BYOK)

Add your OpenAI, Anthropic, or Google API keys. Encrypted at rest with AES-256, used only for your workspace's runs.

Bring Your Own Subscription (BYOS)

Already paying for Cursor, Codex, or Claude Code? Connect those subscriptions directly. CodeHelm orchestrates — your subscription provides the compute.

No token markup, ever

We never see your prompts or responses. Zero markup on AI tokens. Your provider relationship stays yours.

Provider configuration

Anthropic Claude

BYOK · sk-ant-••••••••4f2a

OpenAI GPT-4o

BYOK · sk-••••••••9d1c

Cursor Pro

BYOS · cursor://••••••••

Google Gemini

BYOK · AIza••••••••

Pricing

Simple, transparent pricing

Pay for orchestration infrastructure. Your AI tokens stay between you and your provider.

Plus

For individual engineers and small projects

€19/mo

or €190/year — save 16%

1 workspace
Up to 3 repositories
Up to 5 AI providers
Plugin library access
Unlimited runs
7-day log retention
Community support
Not included: Custom plugin creation · Team members · RBAC & roles · Webhook callbacks

Pro

For teams and growing engineering organizations

€190/mo

or €1,900/year — save 16%

10 workspaces
Unlimited repositories
Unlimited AI providers
Plugin library access
Custom plugin creation
Unlimited runs
90-day log retention
Team members & RBAC
Webhook callbacks
Priority run queue
Email support

Enterprise

Custom scale, SLA, and dedicated support

Custom

Unlimited workspaces
Unlimited everything
Unlimited custom plugins
Custom log retention
SSO / SAML
Custom SLA
Dedicated CSM
Audit log export
On-premises option
Custom integrations

Security & Auditability

Autonomous execution requires serious auditability

When AI agents are making code changes on your behalf, you need a complete record of everything that happened and why.

Append-only audit log

Every state transition, agent dispatch, validation result, and retry is logged and immutable. Full lifecycle history per run.
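"Append-only and immutable" can be approximated in code by exposing no update or delete operation and chaining entries by hash so that any later edit is detectable. The hash chain is a technique swapped in for illustration; this page does not say how CodeHelm's log is implemented, and the `AuditLog` class and event names are hypothetical.

```python
import copy
import hashlib
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only event log. There is no update or delete method, and each
    entry stores the SHA-256 hash of the previous one, so editing any stored
    entry makes verify() return False."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, run_id: str, event: str, data: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "run_id": run_id,
            "event": event,
            "data": copy.deepcopy(data),  # detach from the caller's object
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return entry["hash"]

    @staticmethod
    def _digest(entry: dict) -> str:
        body = {key: value for key, value in entry.items() if key != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True
```

A chain like this only proves tampering if the latest hash is also kept somewhere the writer cannot rewrite, for example in an exported copy of the log.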

Encrypted credentials

All AI provider keys and subscriptions are encrypted at rest with AES-256. Never proxied through our infrastructure.

Workspace RBAC

Fine-grained roles — owner, admin, member, viewer — scoped per workspace. Agent credentials and run permissions are isolated.

SOC 2-ready design

Designed with SOC 2 Type II compliance principles from day one. Structured logs, controlled access, and no AI token logging.

Get started today

Give CodeHelm the goal.
Track every step to completion.

Connect your repositories, configure your AI providers, and submit your first development objective. CodeHelm handles the decomposition, execution, and validation.

No credit card required · Free 14-day trial on Pro
