Decoupling Completion from Correctness
Evidence-Gated Multi-Agent Code Generation Under Repository Constraints
Abstract
Modern LLM coding assistants are excellent at finishing requests, but real engineering failures happen when a “finished” change quietly violates a repository’s constraints: module boundaries, security posture, test contracts, and operational assumptions. This working draft proposes a governance-first architecture that makes repository fitness the stopping condition. It combines local grounding (HugeContext), an Agent–Auditor loop (HugeCode), and a deterministic Gatekeeper that enforces fitness functions via static analysis, tests, and policy checks.
Key Contributions
1. A reframing of completion as provisional, with repository fitness as the stopping condition
2. A governance-first architecture combining grounding, adversarial auditing, and deterministic gates
3. HugeContext: local repository grounding that retrieves constraint-relevant evidence
4. HugeCode: an Agent–Auditor loop designed to resist completion bias and security shortcuts
5. Gatekeeper: deterministic enforcement via tests, static analysis, and policy checks
6. An evaluation design centered on real repository constraints and observed failure modes
Why This Matters
In production repositories, plausibly correct code is not enough. The expensive failures come from drift: shortcuts that quietly bypass conventions, security posture, or test contracts. This work proposes an evidence-gated workflow that keeps the speed of an assistant while making “done” contingent on repository fitness.
Overview
LLM coding assistants are great at outputting something that looks finished. But a production repository is not a blank page: it has constraints that rarely fit into a single prompt. When the assistant optimizes for “completion”, it tends to drift from the repository’s truth (conventions, dependencies, security posture, and test contracts).
This draft proposes a governance-first workflow where completion is provisional until verified against explicit fitness functions.
Architecture (High Level)
The proposed system has three cooperating parts (a minimal orchestration sketch follows this list):
- HugeContext (Grounding): retrieves constraint-relevant evidence from the repository (module boundaries, patterns, policies) so changes are anchored in local truth.
- HugeCode (Agent–Auditor Loop): generates candidates, then adversarially audits them to resist “consensus-by-completion” and surface hidden risks.
- Gatekeeper (Deterministic): enforces repository fitness functions (tests, static analysis, policy checks) and blocks merges without evidence.
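To make the division of labor concrete, here is a minimal orchestration sketch under assumed interfaces. The names `ground`, `propose`, `audit`, `gate`, `Candidate`, and `GateReport` are placeholders for this draft, not the actual HugeContext, HugeCode, or Gatekeeper APIs.

```python
"""Sketch of the grounding -> agent/auditor -> gatekeeper loop (illustrative only)."""
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Candidate:
    patch: str       # proposed change, e.g. a unified diff
    rationale: str   # the agent's stated reasons for the design

@dataclass
class GateReport:
    passed: bool
    failures: List[str] = field(default_factory=list)

def evidence_gated_change(
    task: str,
    ground: Callable[[str], List[str]],                  # HugeContext: task -> repo evidence
    propose: Callable[[str, List[str]], Candidate],      # HugeCode agent: generate a candidate
    audit: Callable[[Candidate, List[str]], List[str]],  # HugeCode auditor: objections, if any
    gate: Callable[[Candidate], GateReport],             # Gatekeeper: deterministic fitness checks
    max_rounds: int = 3,
) -> Optional[Candidate]:
    """Return a candidate only if it survives auditing and passes the gate."""
    evidence = ground(task)
    for _ in range(max_rounds):
        candidate = propose(task, evidence)
        objections = audit(candidate, evidence)
        if objections:                       # auditor found hidden risks: iterate, don't accept
            evidence = evidence + objections
            continue
        report = gate(candidate)
        if report.passed:                    # completion is provisional until the gate agrees
            return candidate
        evidence = evidence + report.failures
    return None                              # no repository-fit change found within budget
```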
What “Evidence-Gated” Means
Instead of treating a natural-language answer as “done”, the system produces an evidence pack alongside the proposed change:
- What invariants are being relied on
- What repository evidence supports the design
- What fitness functions were run and passed
- What risks remain and how they’re mitigated
This shifts the stopping condition from “a coherent patch exists” to “the patch is repository-fit”.
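As an illustration, the evidence pack can be a small, serializable record whose fields mirror the list above. The schema below is a sketch under assumed field names (`invariants`, `repo_evidence`, `checks_run`, `residual_risks`); the draft does not prescribe a concrete format, and the example values are invented.

```python
# Sketch of an "evidence pack" emitted alongside a proposed change (illustrative schema).
from dataclasses import dataclass
from typing import Dict, List
import json

@dataclass
class EvidencePack:
    invariants: List[str]        # invariants the change relies on
    repo_evidence: List[str]     # files / patterns / policies supporting the design
    checks_run: Dict[str, bool]  # fitness function name -> passed?
    residual_risks: Dict[str, str]  # remaining risk -> mitigation

    def repository_fit(self) -> bool:
        """'Done' means every declared fitness function ran and passed."""
        return bool(self.checks_run) and all(self.checks_run.values())

# Invented example values, for illustration only.
pack = EvidencePack(
    invariants=["billing module never imports from the web layer"],
    repo_evidence=["docs/architecture.md#module-boundaries", "import-linter contract in pyproject.toml"],
    checks_run={"pytest": True, "mypy": True, "import-linter": True, "secrets-scan": True},
    residual_risks={"new retry loop may mask upstream timeouts": "added metric and alert threshold"},
)
print(json.dumps({"repository_fit": pack.repository_fit()}))
```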
Fitness Functions (Examples)
Gatekeeper is intentionally boring and deterministic; a sketch of a gate runner follows this list. Typical checks include:
- Unit/integration tests
- Type checks and linting
- Dependency and licensing policies
- Secrets scanning and security linters
- Repo-specific checks (conventions, build steps, CI workflows)
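Concretely, a gate runner can be a short script that shells out to the repository's existing tools and fails closed on any non-zero exit code. The command list below (pytest, mypy, ruff, pip-audit, gitleaks) is illustrative; each repository substitutes its own checks.

```python
# Minimal deterministic gate runner sketch. Pass/fail is just exit codes, not model judgment.
import subprocess
import sys

CHECKS = [
    ("unit/integration tests", ["pytest", "-q"]),
    ("type check",             ["mypy", "."]),
    ("lint",                   ["ruff", "check", "."]),
    ("dependency audit",       ["pip-audit"]),
    ("secrets scan",           ["gitleaks", "detect"]),
]

def run_gate() -> int:
    failures = []
    for name, cmd in CHECKS:
        try:
            ok = subprocess.run(cmd).returncode == 0
        except FileNotFoundError:          # missing tool counts as a failure: fail closed
            ok = False
        if not ok:
            failures.append(name)
    if failures:
        print("GATE FAILED:", ", ".join(failures))
        return 1                           # block the merge; no evidence, no "done"
    print("GATE PASSED: all fitness functions succeeded")
    return 0

if __name__ == "__main__":
    sys.exit(run_gate())
```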
Status & Next Steps
This is a working draft. The benchmark harness and fully reproducible figures are in progress.
See the Updates section for progress notes.
Software Availability
- HugeContext (public): https://www.hugecontext.com
- HugeContext (VS Code Marketplace): https://marketplace.visualstudio.com/items?itemName=codavidgarcia.hugecontext
Feedback
If you have examples of “looks correct but breaks the repo” failures, or you want to review the draft, contact me.
Updates
Paper draft progressing
Made significant progress on the RAG evaluation framework paper. Case studies from enterprise deployments are coming together.