How it works

From objective to validated output

CodeHelm does not just trigger an AI tool and hand you the result. It plans the work, coordinates agents, validates every step, and iterates until the output meets your acceptance criteria.

01

Connect your repositories

Install the CodeHelm GitHub App on your account or organization. Select the repositories you want to give CodeHelm access to. The App uses short-lived installation tokens — not personal access tokens — so there is no long-lived secret to manage.

  • Works with personal accounts, organizations, and enterprise GitHub
  • Supports private repos, monorepos, and multi-org setups
  • Revoke access at any time from GitHub's App settings
GitHub App installation
myorg/api-service · Connected
myorg/frontend · Connected
myorg/data-pipeline · Connected
02

Configure AI providers

Add your AI provider credentials as API keys (BYOK, bring your own key) or existing subscriptions (BYOS, bring your own subscription). CodeHelm supports OpenAI, Anthropic, Google, Cursor, Codex, and Claude Code. Credentials are encrypted at rest, and requests are never proxied through CodeHelm's infrastructure — agents call your provider directly.

  • AES-256 encryption at rest. Never logged or exposed in run output.
  • Multiple providers per workspace. Choose per run or let CodeHelm route.
  • CodeHelm charges for orchestration. You pay your provider separately.
Provider configuration
Claude Sonnet · BYOK
OpenAI GPT-4o · BYOK
Cursor Pro · BYOS
Codex CLI · BYOS
03

Submit a development objective

Provide CodeHelm with a development goal — a natural language description, a spec document, a feature request, or a bug report. You do not need to break this down yourself. CodeHelm parses the objective with an LLM, extracts requirements and acceptance criteria, and builds a structured understanding of the work before any agent runs.

  • Accepts natural language, spec documents, or structured task descriptions
  • LLM extracts requirements, acceptance criteria, and unknowns
  • Repo context is built from the actual file system, services, and git state
New run — objective
Refactor the authentication module to support JWT access tokens with refresh token rotation (RTR). Revoke all tokens on password change. Add integration tests for the rotation flow.
Parsing requirements...
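The parse step could plausibly produce a structured record like the following. The `ParsedObjective` shape and the field names are assumptions for illustration; in practice an LLM would fill in the values from the objective text above:

```python
from dataclasses import dataclass, field

@dataclass
class ParsedObjective:
    summary: str
    requirements: list[str]
    acceptance_criteria: list[str]
    unknowns: list[str] = field(default_factory=list)

# Hypothetical parse of the JWT objective above.
parsed = ParsedObjective(
    summary="Refactor auth to JWT with refresh token rotation",
    requirements=[
        "JWT access tokens",
        "refresh token rotation (RTR)",
        "revoke all tokens on password change",
    ],
    acceptance_criteria=["integration tests cover the rotation flow"],
    unknowns=["current session storage mechanism"],
)
```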
04

Plan and decompose

CodeHelm generates a structured task plan: a dependency graph in which each node is a unit of work with an assigned agent, acceptance criteria, and quality gate configuration. Complex objectives are split into parallel and sequential tasks. The plan accounts for the actual state of your repository, not just the prompt.

  • Task dependency graph (DAG) built from requirements and repo context
  • Each task is assigned to the most suitable agent type
  • Parallel tasks are dispatched simultaneously to reduce total execution time
Generated task plan
01 parse-spec + build-context · Claude
02 refactor-auth-core · depends on 01 · Codex
03 update-token-endpoints · depends on 01 · Cursor
04 add-rotation-tests · depends on 02, 03 · Codex
05 critic-eval + quality-gates · depends on 04 · Claude
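A task DAG like the plan above can be walked in parallel batches with Python's standard-library `graphlib`. This is a sketch of the scheduling idea, not CodeHelm's internals; the task names mirror the plan, and each yielded batch is a set of tasks whose dependencies are all satisfied:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph mirroring the plan above: node -> set of dependencies.
plan = {
    "parse-spec":             set(),
    "refactor-auth-core":     {"parse-spec"},
    "update-token-endpoints": {"parse-spec"},
    "add-rotation-tests":     {"refactor-auth-core", "update-token-endpoints"},
    "critic-eval":            {"add-rotation-tests"},
}

def parallel_batches(graph):
    """Yield batches of tasks whose dependencies are satisfied;
    each batch can be dispatched to agents simultaneously."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    while ts.is_active():
        ready = list(ts.get_ready())
        yield ready
        ts.done(*ready)

batches = list(parallel_batches(plan))
```

The second batch contains both refactor tasks, which is exactly the parallelism the plan exploits to cut total execution time.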
05

Agents execute in parallel

Each task is dispatched to its assigned agent — Cursor CLI for frontend and UI work, Codex for backend logic and heavy reasoning, Claude Code for architecture, refactoring, and multi-file changes. Every agent works in an isolated git worktree so parallel changes never conflict mid-run. If a primary agent fails, CodeHelm automatically falls back through the configured fallback chain.

  • Agents work in isolated git worktrees — no mid-run conflicts
  • Fallback chain: if primary fails, CodeHelm retries with an alternative agent
  • Each agent receives repo context, acceptance criteria, and task constraints
Active dispatch — 3 agents running
refactor-auth-core · Codex · running
update-token-endpoints · Cursor · running
add-rotation-tests · Codex · queued
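Worktree isolation rests on a standard git feature: `git worktree add` gives each task its own checkout directory and branch. Below is a minimal sketch of building that command per task; the `.codehelm/worktrees` layout and `codehelm/` branch prefix are assumptions, not documented CodeHelm paths:

```python
import shlex

def worktree_cmd(task_id: str, base: str = "main") -> str:
    """Build the git command that gives a task its own isolated worktree.
    Each agent then works in its own directory on its own branch, so
    parallel edits never touch the same checkout."""
    path = f".codehelm/worktrees/{task_id}"   # hypothetical layout
    branch = f"codehelm/{task_id}"            # hypothetical branch prefix
    return f"git worktree add {shlex.quote(path)} -b {shlex.quote(branch)} {base}"

cmd = worktree_cmd("refactor-auth-core")
```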
06

Quality gates and LLM critic

When a task completes, it passes through a validation pipeline. Quality gates run automatically: syntax checks, lint, test suites, build steps, and health checks. Then an LLM critic evaluates the output against the original acceptance criteria. If either check fails, CodeHelm re-plans the task with an adjusted approach and dispatches again — automatically, without human intervention.

  • Configurable gates: syntax, lint, tests, build, custom shell commands
  • LLM critic evaluates output against the original acceptance criteria
  • Failures trigger automatic re-plan and retry — not a dead end
Task validation — refactor-auth-core
syntax check · pass
flake8 lint · pass
pytest auth tests · pass
LLM critic (acceptance criteria) · pass
All gates passed — task accepted
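The gate pipeline is essentially an ordered list of shell commands that must all exit zero before the critic runs. A minimal sketch, assuming gates are configured as name/command pairs (the specific commands here are illustrative, matching the validation card above):

```python
import subprocess

# Hypothetical gate list: name -> shell command, configurable per repo.
GATES = [
    ("syntax check", "python -m compileall -q src"),
    ("lint",         "flake8 src"),
    ("tests",        "pytest tests/auth"),
]

def run_gates(gates, runner=subprocess.run):
    """Run each gate in order; stop at the first failure and report it.
    `runner` is injectable so the pipeline can be tested without
    actually shelling out."""
    for name, cmd in gates:
        result = runner(cmd, shell=True)
        if result.returncode != 0:
            return (False, name)
    return (True, None)
```

Only when `run_gates` returns success would the LLM critic be asked to judge the diff against the acceptance criteria.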
07

Merge, validate, deliver

When all tasks have passed validation, CodeHelm merges the results from the individual agent worktrees into the target branch and runs a final gate pass on the combined output. Once that final validation succeeds, a handoff summary is generated describing what was done, what changed, and what was tested. The full run history is preserved in the audit log.

  • Final quality gate run on the merged output — not just individual tasks
  • Handoff summary: what changed, what was tested, what agents ran
  • Full run history preserved in append-only audit log
job-247 / completed
Status: completed
Tasks: 5 / 5 passed
Agents: Claude · Codex · Cursor
Iterations: 2
Branch merged: devorch/job-247
Handoff summary available · Final gate: all passed
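The merge step amounts to folding each task branch into the target and re-validating the result. A sketch of the command sequence, assuming the `codehelm/` branch naming used for illustration earlier (not a documented convention):

```python
def merge_cmds(branches, target="main"):
    """Sequence of git commands that folds each task branch into the
    target; the final gate pass then runs on the combined output."""
    cmds = [f"git checkout {target}"]
    cmds += [f"git merge --no-ff {b}" for b in branches]
    return cmds

cmds = merge_cmds(["codehelm/refactor-auth-core", "codehelm/add-rotation-tests"])
```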
Failure recovery

Failures are not dead ends

When a task fails validation — whether a quality gate fails or the LLM critic judges the output as not meeting the acceptance criteria — CodeHelm does not stop and wait for you. It re-plans the failing task, adjusts the approach, and dispatches again.

Each retry attempt is logged with the reason for re-planning. If a task exceeds the configured retry limit, it is moved to a dead-letter queue and flagged for review — so you always have full visibility into what happened and why.

You can also trigger manual retries from the dashboard when you want to adjust parameters or pick up where an interrupted run left off.

run-247 / task-02 / retry log
14:02:11 [info] Attempt 1: dispatching refactor-auth-core → Codex
14:04:38 [warn] Gate: pytest auth tests — 2 failures
14:04:38 [info] Critic: output partially meets criteria — re-plan
14:04:39 [info] Attempt 2: adjusted approach — dispatching → Claude Code
14:07:12 [info] Gate: pytest auth tests — all passed
14:07:13 [info] Critic: acceptance criteria met — task accepted
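The retry loop described above can be sketched as a small state machine: attempt, log the reason on failure, and route to a dead-letter queue once the limit is hit. The function shape and `attempt_fn` callback are illustrative assumptions, not CodeHelm's API:

```python
def run_with_retries(task, attempt_fn, max_retries=3):
    """Dispatch a task, re-planning on each failure; after the retry
    limit the task goes to a dead-letter queue for human review.
    Every attempt is logged with its outcome and reason."""
    log, dead_letter = [], []
    for attempt in range(1, max_retries + 1):
        ok, reason = attempt_fn(task, attempt)
        log.append((attempt, ok, reason))
        if ok:
            return ("accepted", log, dead_letter)
    dead_letter.append((task, log[-1][2]))
    return ("dead-letter", log, dead_letter)

# Hypothetical run matching the log above: attempt 1 fails a gate, attempt 2 passes.
outcomes = iter([(False, "pytest: 2 failures"), (True, "criteria met")])
status, log, dlq = run_with_retries("refactor-auth-core",
                                    lambda t, a: next(outcomes))
```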
Micro Evolution Engine — signals detected
TODO · high priority
src/auth/token.py:142 · TODO: handle expired token edge case
FIXME · high priority
src/api/routes.py:89 · FIXME: rate limit not applied to /refresh
test · medium priority
tests/test_auth.py · test_refresh_rotation — failed 3 runs
Continuous improvement

The Micro Evolution Engine

Between major orchestration runs, CodeHelm's Micro Evolution Engine continuously scans your repositories for improvement signals: TODO and FIXME comments, recently failed tests, health check degradations, and log errors.

For each signal, CodeHelm scores its priority and applies a single conservative fix — one change at a time, in a safe window after each successful build. Results are validated before being committed. High-value signals that are too large for automated resolution are captured as reviewable TODO ideas for human promotion to full orchestration runs.

Protected paths are never touched autonomously. You configure which directories are in scope.
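The scan-score-fix loop can be sketched as follows: collect signals, drop anything under a protected path, and pick the single highest-priority item. The `Signal` shape, the weights, and the protected-path list are all illustrative assumptions:

```python
from dataclasses import dataclass

PROTECTED = ("migrations/", "infra/")  # hypothetical protected paths

@dataclass
class Signal:
    kind: str    # "TODO", "FIXME", "failing-test", "log-error"
    path: str
    weight: int  # e.g. FIXME above TODO; repeated failures raise the weight

def next_fix(signals):
    """Pick the single highest-priority signal outside protected paths:
    the engine applies one conservative fix at a time."""
    eligible = [s for s in signals if not s.path.startswith(PROTECTED)]
    return max(eligible, key=lambda s: s.weight, default=None)

signals = [
    Signal("TODO",  "src/auth/token.py",  2),
    Signal("FIXME", "src/api/routes.py",  3),
    Signal("FIXME", "migrations/0042.py", 5),  # protected path, skipped
]
```

With these weights, `next_fix(signals)` would select the rate-limit FIXME even though the protected migration file carries a higher raw score.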

Ready to give CodeHelm a goal?

Connect your repositories, configure your AI providers, and submit your first development objective. Free 14-day trial on Pro — no credit card required.