Docs-first development with AI agents

We’ve been building Superkey’s v2 platform with AI agents doing most of the implementation. Not co-pilot style — full task delegation. “Here’s the spec. Build it.” It works, but only because we inverted the normal development workflow.

Most teams using AI agents generate code first and documentation after (if ever). We write the documentation first and generate code from it. The difference sounds pedantic. In practice, it determines whether the agent implements our decisions or invents its own.

The workflow

Every feature at Superkey follows this pipeline:

Feature request → Spec → Design manifest → Code → Tests → User docs → PR

The AI agent enters at step 4. Steps 1-3 are human work — understanding what to build, why, and what it should look like. The agent has never seen the feature request or sat in the meeting where we argued about it. It doesn’t need to. It needs the spec.

What a spec looks like

We maintain 22 product specs in docs/specs/. Each one covers a complete feature area: the data model, the state machine, the API contract, the permission model, and the business rules. A typical spec is 500-1500 lines of markdown.

Here’s a simplified example of what an agent receives:

## Submission intake

### State machine
received → under_review → quoted → bound → issued
received → declined (terminal)
under_review → returned_to_customer → under_review

### Data model
- submission: id, association_id, status, carrier_id, ...
- submission_documents: id, submission_id, storage_key, ...

### API contract
POST /api/submissions — create submission (role: underwriter, admin)
PATCH /api/submissions/:id/status — transition status
GET /api/submissions/:id — full submission with documents

### Business rules
- Cannot transition to "quoted" without at least one document attached
- "returned_to_customer" requires a return_reason field
- Only admins can transition to "declined"

The agent reads this and builds the route handlers, the Drizzle schema, the status transition logic, and the permission guards. It doesn’t invent behavior because the spec doesn’t leave room for invention. Every branch in the logic is defined.
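To make "every branch in the logic is defined" concrete, here is a minimal sketch of the transition check that the simplified spec above implies. It is not Superkey's actual code: the names (`TRANSITIONS`, `checkTransition`, `TransitionRequest`) and the request shape are assumptions for illustration, and a real handler would load the document count and role from the database and session.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not Superkey's code.
type Status =
  | "received"
  | "under_review"
  | "returned_to_customer"
  | "quoted"
  | "bound"
  | "issued"
  | "declined";

type Role = "underwriter" | "admin";

// Allowed transitions, copied edge for edge from the spec's state machine.
const TRANSITIONS: Record<Status, readonly Status[]> = {
  received: ["under_review", "declined"],
  under_review: ["quoted", "returned_to_customer"],
  returned_to_customer: ["under_review"],
  quoted: ["bound"],
  bound: ["issued"],
  issued: [],
  declined: [],
};

interface TransitionRequest {
  from: Status;
  to: Status;
  role: Role;
  documentCount: number; // assumed to be counted from submission_documents
  returnReason?: string; // the spec's return_reason field
}

type TransitionResult = { ok: true } | { ok: false; error: string };

// Applies the state machine first, then each business rule from the spec.
function checkTransition(req: TransitionRequest): TransitionResult {
  if (TRANSITIONS[req.from].indexOf(req.to) === -1) {
    return { ok: false, error: `illegal transition: ${req.from} -> ${req.to}` };
  }
  if (req.to === "quoted" && req.documentCount < 1) {
    return { ok: false, error: "quoted requires at least one document" };
  }
  if (req.to === "returned_to_customer" && !req.returnReason?.trim()) {
    return { ok: false, error: "returned_to_customer requires return_reason" };
  }
  if (req.to === "declined" && req.role !== "admin") {
    return { ok: false, error: "only admins can decline" };
  }
  return { ok: true };
}
```

Every branch here maps to one line of the spec, which is what makes the review question "does this match the spec" answerable. A handler for PATCH /api/submissions/:id/status could call a function like this before writing the new status.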

Design manifests

For UI work, we extracted “design manifests” from founder prototypes — pixel-level descriptions of what each page looks like, what components it uses, and how interactions work. An agent building a page reads the manifest, not the prototype JSX.

This matters because prototypes are messy. They have dead code, experimental branches, inconsistent patterns. The manifest distills the prototype into what the page actually is: layout, data requirements, interaction states, edge cases.
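The four things a manifest captures can be pictured as a typed structure. The real manifests are most likely prose documents and their format is not shown here, so treat the following as a hypothetical sketch only: the `DesignManifest` interface, its field names, the example values, and the `emptySections` helper are all assumptions made for illustration.

```typescript
// Hypothetical shape: the real manifest format is not shown in this post.
interface DesignManifest {
  id: string;
  page: string;
  layout: string[]; // ordered page regions, top to bottom
  components: string[];
  dataRequirements: { endpoint: string; fields: string[] }[];
  interactionStates: Record<string, string>; // state name -> what the page shows
  edgeCases: string[];
}

// Illustrative values only, loosely based on the submission spec example.
const submissionDetail: DesignManifest = {
  id: "DM-07",
  page: "Submission detail",
  layout: ["header with status badge", "document list", "status actions"],
  components: ["StatusBadge", "DocumentTable", "TransitionButton"],
  dataRequirements: [
    { endpoint: "GET /api/submissions/:id", fields: ["status", "documents"] },
  ],
  interactionStates: {
    loading: "skeleton rows in the document list",
    empty: "prompt to attach the first document",
    error: "inline retry message",
  },
  edgeCases: ["no documents attached, so the quote action is disabled"],
};

// Lists the sections a manifest leaves empty, so gaps surface before an agent reads it.
function emptySections(m: DesignManifest): string[] {
  const missing: string[] = [];
  if (m.layout.length === 0) missing.push("layout");
  if (m.dataRequirements.length === 0) missing.push("dataRequirements");
  if (Object.keys(m.interactionStates).length === 0) missing.push("interactionStates");
  if (m.edgeCases.length === 0) missing.push("edgeCases");
  return missing;
}
```

The point of the structure is that nothing from the prototype survives unless someone wrote it down on purpose: dead code and experimental branches have no field to live in.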

The CLAUDE.md contract

The repo’s CLAUDE.md is the agent’s constitution. It contains:

The documentation hierarchy (which doc to read for what)
Architecture principles (API-first, one write path, mutation pipeline)
Code patterns with examples (how to write a route, how to add a schema table)
Rules that are never negotiable (never use the any type, never skip the mutation pipeline, never add DB access to the frontend)

Every agent session starts by reading CLAUDE.md. It’s the equivalent of onboarding a new developer — except the “developer” has perfect recall and follows instructions literally.

Why docs-first works with agents

Agents are good at implementation, bad at judgment

An AI agent can write a Hono route handler, a Drizzle schema migration, or a React component with proper state management. It can follow patterns, apply conventions, and produce working code.

What it can’t do: decide whether a feature should exist, resolve conflicting requirements from different stakeholders, make tradeoff decisions about scope, or understand why the underwriting team needs a different workflow than the operations team.

Docs-first puts the judgment work (specs, design decisions, prioritization) on humans and the implementation work (code, tests, boilerplate) on agents. This maps to each party’s actual strengths.

Specs eliminate ambiguity, which eliminates bad code

The most common failure mode with AI agents is vague instructions producing plausible-but-wrong code. “Build a submission page” gives the agent enough rope to invent a data model, guess at the permission structure, and hallucinate business rules. The code will compile. It will be wrong.

“Build the submission page per spec 07, design manifest DM-07, using the mutation pipeline pattern from CLAUDE.md §5.3” gives the agent three authoritative sources to build from. If the spec says the status machine has 5 states, the agent builds 5 states. Not 4, not 7.

We still review every PR. But the review asks “does this match the spec,” not “is this the right approach.” The approach was decided at the spec level.

Documentation stays current because it’s upstream

The perennial problem with documentation: it drifts from the code because nobody updates it. In a code-first workflow, docs are a trailing artifact — always slightly stale, always slightly wrong.

In our workflow, docs are the upstream source. The code derives from the docs. If the code contradicts a doc, the code is wrong — not the doc. This inverts the staleness problem. Developers (and agents) check the spec before building because the spec is the source of truth, not a historical record.

When requirements change, we update the spec first. Then we update the code. This sequence is enforced by CLAUDE.md: “If what you’re about to build contradicts a spec, STOP.” Agents follow this literally, which is exactly what you want.
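Because the spec is upstream, "the code contradicts the doc" can also be checked mechanically rather than only by reading. Below is a minimal sketch of such a drift check, assuming the state machine is written in the arrow notation from the spec example earlier. The function names (`parseSpecEdges`, `diffEdges`) and the "from->to" edge format are hypothetical, and this is not a description of tooling Superkey actually runs.

```typescript
// Hypothetical sketch of a spec-vs-code drift check for a state machine.
type Edge = string; // encoded as "from->to"

// Turns lines like "a → b → c" into ["a->b", "b->c"].
// Parenthesised notes such as "(terminal)" are dropped.
function parseSpecEdges(specBlock: string): Edge[] {
  const edges: Edge[] = [];
  const lines = specBlock.split("\n");
  for (const line of lines) {
    const states = line
      .split("→")
      .map((part) => part.replace(/\(.*?\)/g, "").trim())
      .filter((part) => part.length > 0);
    for (let i = 0; i + 1 < states.length; i++) {
      const edge = `${states[i]}->${states[i + 1]}`;
      if (edges.indexOf(edge) === -1) edges.push(edge);
    }
  }
  return edges;
}

// Reports edges the spec defines but the code lacks, and edges the code invented.
function diffEdges(spec: Edge[], code: Edge[]) {
  return {
    missingInCode: spec.filter((e) => code.indexOf(e) === -1),
    notInSpec: code.filter((e) => spec.indexOf(e) === -1),
  };
}

const specBlock = [
  "received → under_review → quoted → bound → issued",
  "received → declined (terminal)",
  "under_review → returned_to_customer → under_review",
].join("\n");

// Edges as implemented; the last one is deliberately absent from the spec.
const implemented: Edge[] = [
  "received->under_review",
  "under_review->quoted",
  "quoted->bound",
  "bound->issued",
  "received->declined",
  "under_review->returned_to_customer",
  "returned_to_customer->under_review",
  "under_review->declined",
];

const drift = diffEdges(parseSpecEdges(specBlock), implemented);
console.log(drift);
```

In this example the check flags `under_review->declined` as present in code but absent from the spec. Under the docs-first rule, the fix is either to delete that transition or to change the spec first and then keep the code.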

The numbers

Superkey v2 currently has:

22 product specs (~15,000 lines of markdown)
14 design manifests
21 architecture decision records
~55 database tables
~100 API endpoints
~30 frontend pages across 3 surfaces (internal, portal, board)

The team is small. Most implementation is done by AI agents working from specs. A human reviews every PR, but the time from “feature specced” to “feature merged” is typically hours, not weeks.

I don’t think this would work without the docs. The specs are load-bearing infrastructure: remove them and the agents produce inconsistent, contradictory code that takes longer to review than it would to write yourself.

What I’d tell a team starting this

Write the spec before you open the editor. Even if you’re building it yourself, not delegating to an agent. The spec forces you to think through the state machine, the data model, the edge cases. Discovering that your status machine has a missing transition is cheap at the doc level and expensive at the code level.

Make one file the constitution. Call it CLAUDE.md, AGENTS.md, CONTRIBUTING.md — doesn’t matter. Put your patterns, your rules, your non-negotiables in one place that every agent session starts by reading. Update it when you learn something new. This file is the highest-leverage thing you can write.

Don’t let agents make decisions. Use them for implementation, not judgment. The moment you ask an agent “should we use approach A or B?” you’ve abdicated a decision to something that can’t understand the tradeoffs. Decide, write the decision in a spec, then hand the spec to the agent.

Enforce the direction. Docs → code. Never code → docs. The moment you start updating docs to match code, you’ve lost the advantage. The docs are the source of truth. The code is the implementation of the truth. Keep that arrow pointing one way.

Frank Thomas is CTO at Superkey Insurance and the founder of Koji. The workflow described here runs on Claude Code with Anthropic’s Claude models.