How it works

From objective to validated output

CodeHelm does not just trigger an AI tool and hand you the result. It plans the work, coordinates agents, validates every step, and iterates until the output meets your acceptance criteria.

01

Connect your repositories

Install the CodeHelm GitHub App on your account or organization. Select the repositories you want to give CodeHelm access to. The App uses short-lived installation tokens — not personal access tokens — so there is no long-lived secret to manage.

  • Works with personal accounts, organizations, and enterprise GitHub
  • Supports private repos, monorepos, and multi-org setups
  • Revoke access at any time from GitHub's App settings
GitHub App installation
myorg/api-service · Connected
myorg/frontend · Connected
myorg/data-pipeline · Connected
02

Configure AI providers

Add your AI provider credentials as API keys (BYOK, bring your own key) or existing subscriptions (BYOS, bring your own subscription). CodeHelm supports OpenAI, Anthropic, Google, Cursor, Codex, and Claude Code. Credentials are encrypted at rest, and requests are never proxied through CodeHelm's infrastructure — agents call your provider directly.

  • AES-256 encryption at rest. Never logged or exposed in run output.
  • Multiple providers per workspace. Choose per run or let CodeHelm route.
  • CodeHelm charges for orchestration. You pay your provider separately.
Provider configuration
Claude Sonnet · BYOK
OpenAI GPT-4o · BYOK
Cursor Pro · BYOS
Codex CLI · BYOS
03

Submit a development objective

Provide CodeHelm with a development goal — a natural language description, a spec document, a feature request, or a bug report. You do not need to break this down yourself. CodeHelm parses the objective with an LLM, extracts requirements and acceptance criteria, and builds a structured understanding of the work before any agent runs.

  • Accepts natural language, spec documents, or structured task descriptions
  • LLM extracts requirements, acceptance criteria, and unknowns
  • Repo context is built from the actual file system, services, and git state
New run — objective
Refactor the authentication module to support JWT access tokens with refresh token rotation (RTR). Revoke all tokens on password change. Add integration tests for the rotation flow.
Parsing requirements...
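The parse step could plausibly produce a structured record like the following. The `ParsedObjective` shape and the field names are assumptions for illustration; in practice an LLM would fill in the values from the objective text above:

```python
from dataclasses import dataclass, field

@dataclass
class ParsedObjective:
    summary: str
    requirements: list[str]
    acceptance_criteria: list[str]
    unknowns: list[str] = field(default_factory=list)

# Hypothetical parse of the JWT objective above.
parsed = ParsedObjective(
    summary="Refactor auth to JWT with refresh token rotation",
    requirements=[
        "JWT access tokens",
        "refresh token rotation (RTR)",
        "revoke all tokens on password change",
    ],
    acceptance_criteria=["integration tests cover the rotation flow"],
    unknowns=["current session storage mechanism"],
)
```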
04

Plan and decompose

CodeHelm generates a structured task plan: a dependency graph in which each node is a unit of work with an assigned agent, acceptance criteria, and quality gate configuration. Complex objectives are split into parallel and sequential tasks. The plan accounts for the actual state of your repository, not just the prompt.

  • Task dependency graph (DAG) built from requirements and repo context
  • Each task is assigned to the most suitable agent type
  • Parallel tasks are dispatched simultaneously to reduce total execution time
Generated task plan
01 parse-spec + build-context · Claude
02 refactor-auth-core · depends on 01 · Codex
03 update-token-endpoints · depends on 01 · Cursor
04 add-rotation-tests · depends on 02, 03 · Codex
05 critic-eval + quality-gates · depends on 04 · Claude
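A task DAG like the plan above can be walked in parallel batches with Python's standard-library `graphlib`. This is a sketch of the scheduling idea, not CodeHelm's internals; the task names mirror the plan, and each yielded batch is a set of tasks whose dependencies are all satisfied:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph mirroring the plan above: node -> set of dependencies.
plan = {
    "parse-spec":             set(),
    "refactor-auth-core":     {"parse-spec"},
    "update-token-endpoints": {"parse-spec"},
    "add-rotation-tests":     {"refactor-auth-core", "update-token-endpoints"},
    "critic-eval":            {"add-rotation-tests"},
}

def parallel_batches(graph):
    """Yield batches of tasks whose dependencies are satisfied;
    each batch can be dispatched to agents simultaneously."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    while ts.is_active():
        ready = list(ts.get_ready())
        yield ready
        ts.done(*ready)

batches = list(parallel_batches(plan))
```

The second batch contains both refactor tasks, which is exactly the parallelism the plan exploits to cut total execution time.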
05

Agents execute in parallel

Each task is dispatched to its assigned agent — Cursor CLI for frontend and UI work, Codex for backend logic and heavy reasoning, Claude Code for architecture, refactoring, and multi-file changes. Every agent works in an isolated git worktree so parallel changes never conflict mid-run. If a primary agent fails, CodeHelm automatically falls back through the configured fallback chain.

  • Agents work in isolated git worktrees — no mid-run conflicts
  • Fallback chain: if primary fails, CodeHelm retries with an alternative agent
  • Each agent receives repo context, acceptance criteria, and task constraints
Active dispatch — 3 agents running
refactor-auth-core · Codex · running
update-token-endpoints · Cursor · running
add-rotation-tests · Codex · queued
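Worktree isolation rests on a standard git feature: `git worktree add` gives each task its own checkout directory and branch. Below is a minimal sketch of building that command per task; the `.codehelm/worktrees` layout and `codehelm/` branch prefix are assumptions, not documented CodeHelm paths:

```python
import shlex

def worktree_cmd(task_id: str, base: str = "main") -> str:
    """Build the git command that gives a task its own isolated worktree.
    Each agent then works in its own directory on its own branch, so
    parallel edits never touch the same checkout."""
    path = f".codehelm/worktrees/{task_id}"   # hypothetical layout
    branch = f"codehelm/{task_id}"            # hypothetical branch prefix
    return f"git worktree add {shlex.quote(path)} -b {shlex.quote(branch)} {base}"

cmd = worktree_cmd("refactor-auth-core")
```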
06

Quality gates and LLM critic

When a task completes, it passes through a validation pipeline. Quality gates run automatically: syntax checks, lint, test suites, build steps, and health checks. Then an LLM critic evaluates the output against the original acceptance criteria. If either check fails, CodeHelm re-plans the task with an adjusted approach and dispatches again — automatically, without human intervention.

  • Configurable gates: syntax, lint, tests, build, custom shell commands
  • LLM critic evaluates output against the original acceptance criteria
  • Failures trigger automatic re-plan and retry — not a dead end
Task validation — refactor-auth-core
syntax check · pass
flake8 lint · pass
pytest auth tests · pass
LLM critic (acceptance criteria) · pass
All gates passed — task accepted
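The gate pipeline is essentially an ordered list of shell commands that must all exit zero before the critic runs. A minimal sketch, assuming gates are configured as name/command pairs (the specific commands here are illustrative, matching the validation card above):

```python
import subprocess

# Hypothetical gate list: name -> shell command, configurable per repo.
GATES = [
    ("syntax check", "python -m compileall -q src"),
    ("lint",         "flake8 src"),
    ("tests",        "pytest tests/auth"),
]

def run_gates(gates, runner=subprocess.run):
    """Run each gate in order; stop at the first failure and report it.
    `runner` is injectable so the pipeline can be tested without
    actually shelling out."""
    for name, cmd in gates:
        result = runner(cmd, shell=True)
        if result.returncode != 0:
            return (False, name)
    return (True, None)
```

Only when `run_gates` returns success would the LLM critic be asked to judge the diff against the acceptance criteria.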
07

Merge, validate, deliver

When all tasks have passed validation, CodeHelm merges the results from the individual agent worktrees into the target branch and runs a final gate pass on the combined output. Once that final validation succeeds, a handoff summary is generated describing what was done, what changed, and what was tested. The full run history is preserved in the audit log.

  • Final quality gate run on the merged output — not just individual tasks
  • Handoff summary: what changed, what was tested, what agents ran
  • Full run history preserved in append-only audit log
job-247 / completed
Status: completed
Tasks: 5 / 5 passed
Agents: Claude · Codex · Cursor
Iterations: 2
Branch merged: devorch/job-247
Handoff summary available · Final gate: all passed
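The merge step amounts to folding each task branch into the target and re-validating the result. A sketch of the command sequence, assuming the `codehelm/` branch naming used for illustration earlier (not a documented convention):

```python
def merge_cmds(branches, target="main"):
    """Sequence of git commands that folds each task branch into the
    target; the final gate pass then runs on the combined output."""
    cmds = [f"git checkout {target}"]
    cmds += [f"git merge --no-ff {b}" for b in branches]
    return cmds

cmds = merge_cmds(["codehelm/refactor-auth-core", "codehelm/add-rotation-tests"])
```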
Failure recovery

Failures are not dead ends

When a task fails validation — whether a quality gate fails or the LLM critic judges the output as not meeting the acceptance criteria — CodeHelm does not stop and wait for you. It re-plans the failing task, adjusts the approach, and dispatches again.

Each retry attempt is logged with the reason for re-planning. If a task exceeds the configured retry limit, it is moved to a dead-letter queue and flagged for review — so you always have full visibility into what happened and why.

You can also trigger manual retries from the dashboard when you want to adjust parameters or pick up where an interrupted run left off.

run-247 / task-02 / retry log
14:02:11 [info] Attempt 1: dispatching refactor-auth-core → Codex
14:04:38 [warn] Gate: pytest auth tests — 2 failures
14:04:38 [info] Critic: output partially meets criteria — re-plan
14:04:39 [info] Attempt 2: adjusted approach — dispatching → Claude Code
14:07:12 [info] Gate: pytest auth tests — all passed
14:07:13 [info] Critic: acceptance criteria met — task accepted
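The retry loop described above can be sketched as a small state machine: attempt, log the reason on failure, and route to a dead-letter queue once the limit is hit. The function shape and `attempt_fn` callback are illustrative assumptions, not CodeHelm's API:

```python
def run_with_retries(task, attempt_fn, max_retries=3):
    """Dispatch a task, re-planning on each failure; after the retry
    limit the task goes to a dead-letter queue for human review.
    Every attempt is logged with its outcome and reason."""
    log, dead_letter = [], []
    for attempt in range(1, max_retries + 1):
        ok, reason = attempt_fn(task, attempt)
        log.append((attempt, ok, reason))
        if ok:
            return ("accepted", log, dead_letter)
    dead_letter.append((task, log[-1][2]))
    return ("dead-letter", log, dead_letter)

# Hypothetical run matching the log above: attempt 1 fails a gate, attempt 2 passes.
outcomes = iter([(False, "pytest: 2 failures"), (True, "criteria met")])
status, log, dlq = run_with_retries("refactor-auth-core",
                                    lambda t, a: next(outcomes))
```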
Micro Evolution Engine — signals detected
TODO · high priority
src/auth/token.py:142 · TODO: handle expired token edge case
FIXME · high priority
src/api/routes.py:89 · FIXME: rate limit not applied to /refresh
test · medium priority
tests/test_auth.py · test_refresh_rotation — failed 3 runs
Continuous improvement

The Micro Evolution Engine

Between major orchestration runs, CodeHelm's Micro Evolution Engine continuously scans your repositories for improvement signals: TODO and FIXME comments, recently failed tests, health check degradations, and log errors.

For each signal, CodeHelm scores its priority and applies a single conservative fix — one change at a time, in a safe window after each successful build. Results are validated before being committed. High-value signals that are too large for automated resolution are captured as reviewable TODO ideas for human promotion to full orchestration runs.

Protected paths are never touched autonomously. You configure which directories are in scope.
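The scan-score-fix loop can be sketched as follows: collect signals, drop anything under a protected path, and pick the single highest-priority item. The `Signal` shape, the weights, and the protected-path list are all illustrative assumptions:

```python
from dataclasses import dataclass

PROTECTED = ("migrations/", "infra/")  # hypothetical protected paths

@dataclass
class Signal:
    kind: str    # "TODO", "FIXME", "failing-test", "log-error"
    path: str
    weight: int  # e.g. FIXME above TODO; repeated failures raise the weight

def next_fix(signals):
    """Pick the single highest-priority signal outside protected paths:
    the engine applies one conservative fix at a time."""
    eligible = [s for s in signals if not s.path.startswith(PROTECTED)]
    return max(eligible, key=lambda s: s.weight, default=None)

signals = [
    Signal("TODO",  "src/auth/token.py",  2),
    Signal("FIXME", "src/api/routes.py",  3),
    Signal("FIXME", "migrations/0042.py", 5),  # protected path, skipped
]
```

With these weights, `next_fix(signals)` would select the rate-limit FIXME even though the protected migration file carries a higher raw score.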

Ready to give CodeHelm a goal?

Connect your repositories, configure your AI providers, and submit your first development objective. Free 14-day trial on Pro — no credit card required.