From objective to validated output
CodeHelm does not just trigger an AI tool and hand you the result. It plans the work, coordinates agents, validates every step, and iterates until the output meets your acceptance criteria.
Connect your repositories
Install the CodeHelm GitHub App on your account or organization. Select the repositories you want to give CodeHelm access to. The App uses short-lived installation tokens — not personal access tokens — so there is no long-lived secret to manage.
- Works with personal accounts, organizations, and enterprise GitHub
- Supports private repos, monorepos, and multi-org setups
- Revoke access at any time from GitHub's App settings
Configure AI providers
Add your AI provider credentials as API keys (BYOK) or existing subscriptions (BYOS). CodeHelm supports OpenAI, Anthropic, Google, Cursor, Codex, and Claude Code. Credentials are encrypted at rest, and requests are never proxied through CodeHelm's infrastructure — agents call your provider directly.
- AES-256 encryption at rest. Never logged or exposed in run output.
- Multiple providers per workspace. Choose per run or let CodeHelm route.
- CodeHelm charges for orchestration. You pay your provider separately.
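As a rough mental model, a workspace's provider setup can be sketched like this. The dictionary shape, the routing fields, and the environment variable names are illustrative assumptions, not CodeHelm's actual configuration schema; real credentials would come from a secrets store, never source control:

```python
import os

# Hypothetical per-workspace provider configuration (BYOK).
# Field names are illustrative, not CodeHelm's actual schema.
providers = {
    "anthropic": {"api_key": os.environ.get("ANTHROPIC_API_KEY", "<unset>")},
    "openai":    {"api_key": os.environ.get("OPENAI_API_KEY", "<unset>")},
}

# "Choose per run or let CodeHelm route": a default plus a fallback order.
routing = {"default": "anthropic", "fallback": ["openai"]}

print(sorted(providers))  # ['anthropic', 'openai']
```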
Submit a development objective
Provide CodeHelm with a development goal — a natural language description, a spec document, a feature request, or a bug report. You do not need to break this down yourself. CodeHelm parses the objective with an LLM, extracts requirements and acceptance criteria, and builds a structured understanding of the work before any agent runs.
- Accepts natural language, spec documents, or structured task descriptions
- LLM extracts requirements, acceptance criteria, and unknowns
- Repo context is built from the actual file system, services, and git state
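The structured understanding described above can be pictured as a small record extracted from the raw objective. The class and field names below are illustrative assumptions about the shape of that data, not CodeHelm's real schema:

```python
from dataclasses import dataclass, field

# Hypothetical shape of the parsed objective CodeHelm builds before
# any agent runs. Names and fields are illustrative.
@dataclass
class ParsedObjective:
    summary: str
    requirements: list = field(default_factory=list)
    acceptance_criteria: list = field(default_factory=list)
    unknowns: list = field(default_factory=list)  # open questions for the planner

objective = ParsedObjective(
    summary="Add rate limiting to the public API",
    requirements=["limit per-client request rate", "return 429 on breach"],
    acceptance_criteria=["existing tests pass", "new limiter covered by tests"],
    unknowns=["which store backs the counters"],
)
print(len(objective.requirements))  # 2
```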
Plan and decompose
CodeHelm generates a structured task plan — a dependency graph where each node is a unit of work with an agent assignment, acceptance criteria, and a quality gate configuration. Complex objectives are split into parallel and sequential tasks. The plan accounts for the actual state of your repository, not just the prompt.
- Task dependency graph (DAG) built from requirements and repo context
- Each task is assigned to the most suitable agent type
- Parallel tasks are dispatched simultaneously to reduce total execution time
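Dispatching a DAG in parallel waves can be sketched with the standard-library topological sorter: every task in a wave has no unmet dependencies, so all of its members can run simultaneously. The task names are invented for illustration:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key depends on the tasks in its value set.
deps = {
    "schema":   set(),
    "backend":  {"schema"},
    "frontend": {"schema"},
    "e2e":      {"backend", "frontend"},
}

# Group tasks into waves of independently dispatchable work.
ts = TopologicalSorter(deps)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # everything here can run in parallel
    waves.append(ready)
    ts.done(*ready)

print(waves)  # [['schema'], ['backend', 'frontend'], ['e2e']]
```

Note that `backend` and `frontend` land in the same wave: they share a dependency but not on each other, so they go out to agents at the same time.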
Agents execute in parallel
Each task is dispatched to its assigned agent — Cursor CLI for frontend and UI work, Codex for backend logic and heavy reasoning, Claude Code for architecture, refactoring, and multi-file changes. Every agent works in an isolated git worktree so parallel changes never conflict mid-run. If a primary agent fails, CodeHelm automatically falls back through the configured fallback chain.
- Agents work in isolated git worktrees — no mid-run conflicts
- Fallback chain: if primary fails, CodeHelm retries with an alternative agent
- Each agent receives repo context, acceptance criteria, and task constraints
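The fallback chain behaves like a try-in-order loop: if the primary agent fails, the next configured agent gets the same task. This is a minimal sketch under the assumption that each agent exposes a single callable that raises on failure; the agent names come from the document, everything else is illustrative:

```python
class AgentFailure(Exception):
    pass

def run_with_fallback(task, chain):
    """Try each agent in order; return the first successful result."""
    errors = []
    for agent in chain:
        try:
            return agent(task)
        except AgentFailure as exc:
            errors.append((agent.__name__, str(exc)))  # recorded for the run log
    raise RuntimeError(f"all agents failed: {errors}")

def cursor_cli(task):
    raise AgentFailure("provider timeout")  # simulate the primary failing

def claude_code(task):
    return f"patch for {task!r}"

result = run_with_fallback("refactor auth module", [cursor_cli, claude_code])
print(result)  # patch for 'refactor auth module'
```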
Quality gates and LLM critic
When a task completes, it passes through a validation pipeline. Quality gates run automatically: syntax checks, lint, test suites, build steps, and health checks. Then an LLM critic evaluates the output against the original acceptance criteria. If either check fails, CodeHelm re-plans the task with an adjusted approach and dispatches again — automatically, without human intervention.
- Configurable gates: syntax, lint, tests, build, custom shell commands
- LLM critic evaluates output against the original acceptance criteria
- Failures trigger automatic re-plan and retry — not a dead end
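The validation pipeline can be read as: run the deterministic gates first, then the critic, and return a re-plan verdict on the first failure. The gate checks and critic below are toy stand-ins; the real gates run actual lint, test, and build commands:

```python
# Sketch of a gate pipeline. Each gate is a (name, check) pair where
# check(output) returns True on pass; the critic judges acceptance
# criteria last. All checks here are illustrative stand-ins.
def run_gates(output, gates, critic):
    for name, check in gates:
        if not check(output):
            return ("replan", f"gate failed: {name}")
    if not critic(output):
        return ("replan", "critic: acceptance criteria not met")
    return ("pass", None)

gates = [
    ("syntax", lambda out: "def " in out),   # stand-in for a real parser check
    ("lint",   lambda out: "\t" not in out), # stand-in for a real linter
]
critic = lambda out: "rate limit" in out     # stand-in for the LLM critic

verdict = run_gates("def limiter(): ...  # rate limit", gates, critic)
print(verdict)  # ('pass', None)
```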
Merge, validate, deliver
When all tasks have passed validation, CodeHelm merges the results from individual agent worktrees into the target branch and runs a final gate pass on the combined output. Once the final validation succeeds, a handoff summary is generated with a description of what was done, what was changed, and what was tested. The run history is preserved in the audit log.
- Final quality gate run on the merged output — not just individual tasks
- Handoff summary: what changed, what was tested, what agents ran
- Full run history preserved in append-only audit log
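"Append-only" means the run history only ever grows: entries are added, never edited or deleted. A minimal sketch of that property, with invented event names rather than CodeHelm's actual log schema:

```python
import json
import time

# Illustrative append-only audit log: the class exposes no update or
# delete operations, only append and read.
class AuditLog:
    def __init__(self):
        self._entries = []

    def append(self, event, **details):
        self._entries.append({"ts": time.time(), "event": event, **details})

    def dump(self):
        return [json.dumps(e, sort_keys=True) for e in self._entries]

log = AuditLog()
log.append("merge", tasks=["backend", "frontend"], target="main")
log.append("final_gates", result="pass")
print(len(log.dump()))  # 2
```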
Failures are not dead ends
When a task fails validation — whether a quality gate fails or the LLM critic judges the output as not meeting the acceptance criteria — CodeHelm does not stop and wait for you. It re-plans the failing task, adjusts the approach, and dispatches again.
Each retry attempt is logged with the reason for re-planning. If a task exceeds the configured retry limit, it is moved to a dead-letter queue and flagged for review — so you always have full visibility into what happened and why.
You can also trigger manual retries from the dashboard when you want to adjust parameters or pick up where an interrupted run left off.
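The retry loop described above — validate, re-plan on failure, dead-letter after the limit — can be sketched as follows. `validate` and `replan` are stand-ins for the gate pipeline and the planner; the retry limit and the returned shape are assumptions:

```python
# Hedged sketch of the retry loop: a failing task is re-planned up to
# max_retries times, then moved to a dead-letter queue for human review.
def execute_with_retries(task, validate, replan, max_retries=3):
    dead_letter = []
    attempts = []
    for attempt in range(1, max_retries + 1):
        ok, reason = validate(task)
        if ok:
            return {"status": "passed", "attempts": attempts}
        attempts.append({"attempt": attempt, "reason": reason})  # logged per retry
        task = replan(task, reason)                              # adjusted approach
    dead_letter.append(task)  # exceeded limit: flag for review
    return {"status": "dead-letter", "attempts": attempts, "queue": dead_letter}

# Simulated task that passes once it has been re-planned.
validate = lambda t: (t.endswith("v2"), "tests failed")
replan = lambda t, reason: t + "-v2"

outcome = execute_with_retries("fix-login", validate, replan)
print(outcome["status"])  # passed
```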
The Micro Evolution Engine
Between major orchestration runs, CodeHelm's Micro Evolution Engine continuously scans your repositories for improvement signals: TODO and FIXME comments, recently failed tests, health check degradations, and log errors.
For each signal, CodeHelm scores its priority and applies a single conservative fix — one change at a time, in a safe window after each successful build. Results are validated before being committed. High-value signals that are too large for automated resolution are captured as reviewable TODO ideas that a human can promote to full orchestration runs.
Protected paths are never touched autonomously. You configure which directories are in scope.
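A rough sketch of signal prioritization under these rules: score each signal by kind, skip anything under a protected path, and work on the highest-scoring item first. The weights, signal kinds, and protected-path default are invented for illustration:

```python
# Illustrative scoring of improvement signals; weights and the
# protected-path list are assumptions, not CodeHelm's heuristics.
WEIGHTS = {"failed_test": 5, "health_degradation": 4, "log_error": 3, "todo": 1}

def prioritize(signals, protected_paths=("infra/",)):
    """Score signals and drop anything under a protected path."""
    in_scope = [s for s in signals
                if not any(s["path"].startswith(p) for p in protected_paths)]
    return sorted(in_scope, key=lambda s: WEIGHTS[s["kind"]], reverse=True)

signals = [
    {"kind": "todo", "path": "src/api.py"},
    {"kind": "failed_test", "path": "tests/test_auth.py"},
    {"kind": "log_error", "path": "infra/deploy.py"},  # protected: never touched
]
top = prioritize(signals)
print(top[0]["kind"])  # failed_test
```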