A lifecycle for building software the agentic way.

AIDLC is the method I run on every engagement. Eight phases that take an idea from a framed problem to an operated, eval-guarded system. Agents do the heavy lifting at each step. Senior engineering keeps them inside the boundary.

What holds the lifecycle together?

Four principles run underneath all eight phases. They are what keep agentic speed from turning into agentic chaos.

01 / Spec first

Specs are the source of truth

Agents are only as good as what you point them at. Every phase produces an artifact a coding agent can act on without guessing. Ambiguity is resolved before generation, not during review.

02 / Evals as the gate

Evals guard every change

Velocity without a safety net is just faster regressions. Golden datasets and LLM-as-judge suites run on every diff, so the speed agents bring never trades against correctness.

03 / Human in the loop

Senior review on every diff

Agents generate; a senior engineer is accountable. The boundary is defined by people, enforced by the harness, and never left to the model to police on its own.

04 / Production from day one

Observability is not optional

Traces, costs, and the success metric sit on a dashboard from the first slice. If you cannot watch it, it does not ship. Handover later is a config change, not a rewrite.

What does the AIDLC pipeline look like?

The first six phases run once to take a framed problem into a hardened build. The last two run as a loop, so every change in production passes back through evals before it ships.

AIDLC pipeline. Frame, Spec, Scaffold, Generate, Eval, and Harden run in sequence on a first build. Ship and Operate run as a continuous loop in production, feeding changes back through evals before they ship.
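
The shape of the pipeline can be sketched in a few lines. This is a minimal illustration, assuming a change is just a string label and that `passes_evals` stands in for whatever eval suite gates the loop; both names are hypothetical and not part of AIDLC itself.

```python
from typing import Callable, Iterable

# Phase names come from the pipeline description; the split into two
# tuples mirrors "run once" versus "run as a loop".
BUILD_PHASES = ("Frame", "Spec", "Scaffold", "Generate", "Eval", "Harden")
LOOP_PHASES = ("Ship", "Operate")


def first_build() -> list[str]:
    """The six phases that run once, in order."""
    return list(BUILD_PHASES)


def production_loop(
    changes: Iterable[str],
    passes_evals: Callable[[str], bool],  # hypothetical stand-in for the eval suite
) -> list[tuple[str, str]]:
    """Route each production change through evals before Ship and Operate."""
    log: list[tuple[str, str]] = []
    for change in changes:
        if not passes_evals(change):
            log.append((change, "blocked by evals"))
            continue
        for phase in LOOP_PHASES:
            log.append((change, phase))
    return log
```

A change that fails the gate never reaches Ship, which is the property the loop is meant to guarantee.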

How does AIDLC move from idea to operated system?

Each phase has one job and one set of outputs. Phases run in sequence on a first build and as a loop in production. No phase ships without the one before it.

Phase 01 of 8

Frame

Pin down the problem before a line of code exists. We agree on the user, the constraint, and the single metric that proves the system is earning its keep. The output is a one-page scope, not a proposal deck.

  • One-page problem scope
  • Success metric agreed in writing
  • Constraints and non-goals

Phase 02 of 8

Spec

Turn the frame into an executable specification. Agent roles, tool boundaries, data contracts, and acceptance criteria written so both a human and a coding agent can act on them without guessing.

  • Executable spec
  • Agent role and tool inventory
  • Acceptance criteria
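
One way to picture an executable spec is as typed data that can be checked before any generation starts. The sketch below is an assumption about what such a spec might contain; `AgentRole`, `Spec`, `problems`, and `tool_allowed` are illustrative names, not an AIDLC API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    name: str
    allowed_tools: frozenset[str]  # the tool boundary for this role


@dataclass(frozen=True)
class Spec:
    roles: tuple[AgentRole, ...]
    acceptance_criteria: tuple[str, ...]

    def problems(self) -> list[str]:
        """List the ambiguities that should be resolved before generation."""
        issues: list[str] = []
        if not self.acceptance_criteria:
            issues.append("no acceptance criteria")
        for role in self.roles:
            if not role.allowed_tools:
                issues.append(f"role {role.name!r} has no tool boundary")
        return issues


def tool_allowed(spec: Spec, role_name: str, tool: str) -> bool:
    """True only if the named role exists and the tool is inside its boundary."""
    return any(
        role.name == role_name and tool in role.allowed_tools
        for role in spec.roles
    )
```

A spec with an empty `problems()` list is not proof of a good spec, only evidence that the obvious gaps have been closed.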

Phase 03 of 8

Scaffold

Stand up the skeleton. Repo conventions, type contracts, the eval harness, and the smallest runnable surface. Agents work best inside a structure that already enforces the rules, so we build that structure first.

  • Typed scaffold and conventions
  • Eval harness wired in
  • First runnable slice

Phase 04 of 8

Generate

Agents write the bulk of the code against the spec, supervised by senior review on every diff. Two-week vertical slices, demoable each Friday. Velocity comes from the agents; correctness comes from the harness and the reviewer.

  • Working vertical slices
  • Reviewed diffs
  • Friday demos

Phase 05 of 8

Eval

Behaviour is guarded by golden datasets and LLM-as-judge suites that run on every change. We test the agent's decisions, not just its output, so regressions surface before they reach users.

  • Golden datasets
  • Regression-gated CI
  • Trace-driven backlog
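
A regression gate over a golden dataset could look roughly like this. It is a sketch under stated assumptions: the agent is modelled as a function from a prompt to the name of the tool it chose, so the check targets the decision rather than the final text. `GoldenCase`, `pass_rate`, and `gate` are hypothetical names, and an LLM-as-judge scorer would replace the equality check in practice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected_tool: str  # the decision we expect, not the output text


def pass_rate(agent: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases where the agent picks the expected tool."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = sum(1 for case in cases if agent(case.prompt) == case.expected_tool)
    return passed / len(cases)


def gate(candidate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Allow the change only if it does not regress beyond the tolerance."""
    return candidate >= baseline - tolerance


GOLDEN = [
    GoldenCase("I want a refund", "refund"),
    GoldenCase("Where is my order?", "search"),
]


def baseline_agent(prompt: str) -> str:
    # Toy routing rule standing in for a real agent.
    return "refund" if "refund" in prompt.lower() else "search"


def regressed_agent(prompt: str) -> str:
    # Simulates a diff that broke refund routing.
    return "search"
```

In CI, the gate would compare the candidate diff's pass rate against the stored baseline and fail the build when it returns `False`.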

Phase 06 of 8

Harden

Close the gaps that only appear under real load: prompt-injection defenses, PII redaction, rate limits, fallbacks, and audit logs. Private LLM deployment is added when data residency, PDPL, or DIFC compliance demands it.

  • Guardrails and redaction
  • Audit logging
  • Residency-compliant deployment
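
Three of these controls (redaction, fallback, and audit logging) compose naturally in one wrapper. The sketch below is illustrative only: the email pattern is deliberately simple and would miss many real PII forms, and `primary`, `fallback`, and `audit_log` are hypothetical stand-ins for model clients and a log sink.

```python
import re
from typing import Callable

# Simplistic email pattern, for illustration; real redaction needs
# broader coverage (names, phone numbers, IDs) and proper testing.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Replace email addresses before the text leaves the boundary."""
    return EMAIL.sub("[EMAIL]", text)


def call_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    audit_log: list[dict],
) -> str:
    """Redact, try the primary model, fall back on failure, and record the route."""
    safe_prompt = redact(prompt)
    try:
        answer = primary(safe_prompt)
        audit_log.append({"prompt": safe_prompt, "route": "primary"})
    except Exception as exc:
        answer = fallback(safe_prompt)
        audit_log.append(
            {"prompt": safe_prompt, "route": "fallback", "error": type(exc).__name__}
        )
    return answer
```

Only the redacted prompt is logged, so the audit trail itself does not become a second place where PII lives.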

Phase 07 of 8

Ship

Release behind a flag, instrument the loop, and roll out by cohort. Costs land on a dashboard from day one. Nothing goes to production that the eval suite and the observability surface cannot watch.

  • Flagged rollout
  • Cost and latency dashboards
  • Production observability
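
Rolling out by cohort behind a flag is often done with deterministic hashing, so a given user stays in or out of the cohort as the percentage grows. This is one common approach, not necessarily the one used here; `in_rollout` and its parameters are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Deterministically place a user in a 0-99 bucket for the given flag.

    The same user always lands in the same bucket, so raising `percent`
    only ever adds users to the cohort and never removes them.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Including the flag name in the hash keeps cohorts independent across flags, so the same early users are not exposed to every new release.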

Phase 08 of 8

Operate

The system improves in production. Evals run on every change, traces feed the backlog, and the metric stays on a dashboard. Handover to your team with runbooks, or a monthly retainer if you want us to keep operating it.

  • Runbooks and handover
  • Continuous eval loop
  • Operating cadence
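
"Traces feed the backlog" can be as simple as counting failure reasons across production traces and working the most frequent first. The trace shape below (a dict with `ok` and `reason` keys) is an assumption made for illustration, not a real tracing schema.

```python
from collections import Counter


def backlog_from_traces(traces: list[dict]) -> list[tuple[str, int]]:
    """Rank failure reasons by how often they appear in failed traces."""
    reasons = Counter(
        trace["reason"] for trace in traces if not trace["ok"]
    )
    return reasons.most_common()
```

Each ranked reason is a candidate for a new golden case, which is how a production failure becomes a permanent regression check.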

What teams ask about the method

Answers written for operators and engineers evaluating how AIDLC actually runs.

What is AIDLC?
AIDLC is the AI Development Life Cycle, an eight-phase method for building software where agents do the heavy lifting and senior engineering keeps them inside a defined boundary. The phases are Frame, Spec, Scaffold, Generate, Eval, Harden, Ship, and Operate. It runs in sequence on a first build and as a loop in production.
How is AIDLC different from the traditional SDLC?
The traditional SDLC assumes humans write most of the code. AIDLC assumes agents do, so it front-loads the work that makes agents effective: a precise spec, a scaffold that enforces conventions, and an eval harness that catches regressions on every diff. The phases that look familiar, such as Ship and Operate, carry extra weight because an autonomous system needs explicit observability that humans would otherwise provide implicitly.
Do agents really write most of the code?
On a well-specified slice, yes, the bulk of generation is agent-driven. But every diff goes through senior review, and nothing merges without passing the eval suite. The speed comes from the agents while the accountability stays with a person. That split is the whole point of the lifecycle.
Where do evals fit, and why so early?
The eval harness is wired in during Scaffold, before any feature code exists, so it can gate every change from the first slice onward. Golden datasets and LLM-as-judge suites test the agent's decisions, not just its output. Catching a behavioural regression in CI is cheap; catching it from a user is not.
Can AIDLC run on a private or on-premise LLM?
Yes. The Harden phase covers private LLM deployment for cases where data residency, PDPL, or DIFC compliance rules out hosted models. The lifecycle is model-agnostic and runs on Claude, GPT, Gemini, or open-weights models such as Llama and Mistral, so the deployment target is a choice made on constraints, not a rewrite.
How long does a first pass through AIDLC take?
Frame and Spec take roughly a week. The first demoable slice ships inside two weeks of Generate. After that, build runs in two-week vertical slices with a Friday demo. You decide whether to extend after each slice. There is no annual lock-in.
What happens after the build is done?
The Operate phase keeps the system improving in production. Evals run on every change, traces feed the backlog, and the success metric stays on a dashboard. You can take handover with runbooks and eval suites, or keep us on a monthly retainer to operate it. Both use the same observability surface, so switching later is a config change.

Bring the method to your next build.

Send a one-paragraph brief on what you want built. You will get a written reply within twenty-four hours, with an honest read on how AIDLC would run on it.
