Spec-Driven Development in Regulated Enterprises: Where It Breaks

SDD is the hot framing in agentic engineering. In unregulated software it works. In banks, insurers, and FDA-regulated platforms, it collides with the regulator's point-in-time audit trail — a model mismatch most SDD advocates don't address.

Anystack Engineering

The short version

Spec-driven development (SDD) is the hottest new framing in agentic-engineering circles: write a precise specification, let the agent execute it, iterate on the spec rather than the code. In unregulated software this works. In regulated industries — banking, insurance, medical devices, pharmacovigilance — it collides with a structural requirement most SDD advocates don't address: the regulator's point-in-time, immutable audit trail.

This is not a tooling problem. It is a model mismatch. SDD's living-spec abstraction is incompatible with PCAOB AS 2201, Solvency II Article 258, and FDA 21 CFR Part 11 by design — not because the tools are immature, but because the operating model treats the spec as the source of truth and the code as a regenerable derivative. The regulator does not accept that framing.

What works instead: a 3-person engineering pod running a dual model, with SDD-style iteration on non-regulated surfaces and graduated traceability on regulated ones.

What SDD promises and why teams adopt it

Spec-driven development treats the specification as the canonical artefact and the code as a generated, regenerable derivative. Change the spec, regenerate the code. The appeal is straightforward: agentic tooling works better against precise specifications than against natural-language tickets, the iteration loop is faster, and the spec itself becomes the trace of intent. For high-velocity product teams in unregulated domains, this is a genuine productivity unlock.

The teams adopting SDD most aggressively in 2026 are consumer-product teams, internal-tooling shops, and AI-native startups where the regulator's audit boundary is not on the critical path. Their gains are real. The pattern is not the problem.

The problem is the uniformity. SDD's advocates often present it as a universal replacement for the prior model — and that is where it collides with the regulatory layer of any enterprise that runs in a regulated jurisdiction.

The audit-trail collision

Three regulatory regimes explicitly require point-in-time, immutable provenance for every deployed code artefact:

SOX 404 / PCAOB AS 2201 (US public-company financial systems). Every change to a financially-material system must produce a point-in-time, immutable record of *what* was changed, *when*, *by whom*, *what tested it*, and *who reviewed it*. The auditor's question, four quarters later, is: "what code was deployed on date X that affected this reported transaction?" The expected answer is a Git commit hash, a code review trail, and a test artefact tied to that commit.

Solvency II Article 258 (EU/UK insurance). Carriers must demonstrate continuous control over actuarial and financial systems. The PRA's SS3/19 expectation in the UK is that the firm can produce, on demand, the exact state of the calculation engine as it ran for any historical reporting period. Re-generation is acceptable as a *reproducibility* claim; it is not acceptable as the *primary record*.

FDA 21 CFR Part 11 (medical devices, pharmacovigilance). Systems that handle electronic records must use controls designed "to ensure the authenticity, integrity, and, when appropriate, the confidentiality of electronic records", including time-stamped audit trails for actions that create, modify, or delete those records. The regulation predates AI-generated code, but the FDA's 2023–2024 guidance is clear: generated code is subject to the same validation rigour as hand-written code, including a documented review by a qualified human.
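To make the auditor's question concrete, here is a minimal sketch of a point-in-time change record that captures the five items named in the SOX paragraph above (what, when, by whom, what tested it, who reviewed it). The `ChangeRecord` class, its field names, and the example values are all hypothetical; no regulator prescribes this shape.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ChangeRecord:
    """Hypothetical point-in-time record for one deployed change."""
    commit_hash: str          # what was changed (commit of the deployed code)
    deployed_at: datetime     # when it was deployed
    author: str               # by whom
    reviewer: str             # who reviewed it
    test_report_sha256: str   # what tested it (digest of the test artefact)

    def fingerprint(self) -> str:
        """Stable SHA-256 digest of the record, so later edits are detectable."""
        payload = asdict(self)
        payload["deployed_at"] = self.deployed_at.isoformat()
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = ChangeRecord(
    commit_hash="0" * 40,  # placeholder value, not a real commit
    deployed_at=datetime(2025, 3, 15, 9, 30, tzinfo=timezone.utc),
    author="dev@example.com",
    reviewer="reviewer@example.com",
    test_report_sha256=hashlib.sha256(b"example test report").hexdigest(),
)
print(record.fingerprint())
```

Storing the fingerprint somewhere the engineering team cannot rewrite is what would turn this from a log entry into evidence; that storage choice is outside the sketch.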

SDD's living-spec model collides with all three. If the spec is the source of truth and the code is regenerated from the spec, the regulator's question receives a fuzzy answer: "the code in production on March 15 was the artefact generated from spec version 3.2.7, regenerated at deployment time." That is not necessarily wrong, but the regulator wants the artefact itself — not a recipe for producing it.

The fix is not impossible, but it is demanding: every regenerated artefact must be *immutably committed* to source control, *re-reviewed* by a human, and *re-tested* against the regulated test suite, as if it had been hand-written. Most SDD tooling does not do this by default. Most SDD adopters in regulated industries discover this gap during their first internal audit, not before.
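The three conditions above (committed, re-reviewed, re-tested) can be expressed as a pre-deployment gate. The sketch below is illustrative only: `ReleaseEvidence` and `missing_controls` are hypothetical names, and a real pipeline would pull these values from its source-control and CI systems rather than construct them by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReleaseEvidence:
    """Hypothetical evidence attached to one regenerated artefact."""
    committed_sha: Optional[str]    # commit holding the generated code, if any
    reviewed_by: Optional[str]      # human who approved that exact commit
    regulated_tests_passed: bool    # regulated suite result for that commit


def missing_controls(evidence: ReleaseEvidence) -> List[str]:
    """Return the controls still missing; an empty list means deployable."""
    missing = []
    if not evidence.committed_sha:
        missing.append("artefact not committed to source control")
    if not evidence.reviewed_by:
        missing.append("no human review of the regenerated code")
    if not evidence.regulated_tests_passed:
        missing.append("regulated test suite has not passed")
    return missing


# A regeneration that was committed and tested but never re-reviewed.
candidate = ReleaseEvidence(committed_sha="0" * 40, reviewed_by=None,
                            regulated_tests_passed=True)
for problem in missing_controls(candidate):
    print("BLOCKED:", problem)
```

Returning the list of gaps, rather than a bare boolean, gives the pipeline something it can log next to the blocked deployment.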

What an audit deficiency actually looks like

Consider a representative case: a UK-listed insurer adopted SDD on its claims-pricing platform in 2024. Their external auditor's Q3 review opened with the standard Solvency II question: produce the exact state of the pricing engine as it ran on three reporting dates in the prior year. The engineering team produced the spec versions, the change log, the deployment timestamps, and the regeneration toolchain, all clean, all reviewed, all signed off.

What they could not produce was the compiled, deployed artefact itself. Their CI/CD pipeline regenerated the binary from the spec at every deployment and discarded intermediate builds. The spec was preserved. The executed code was not.

The auditor's response was not "this is acceptable because the spec is canonical." The response, per PRA SS3/19, was that the firm could not demonstrate continuous control over a financial system — recorded as a material control finding in the audit report. The remediation took two quarters: a parallel artefact-archival service, build-time provenance signing, a quarterly attestation tied to each deployed binary, and a re-baseline of the prior twelve months of deployments where it was still possible to do so. The bill ran to roughly £400k of engineering work, a delayed product roadmap, and a remediation footnote that the auditor will reference at every subsequent review.

The SDD pattern was not the problem. The deployment-time regeneration without a preserved artefact was. Most regulated adopters discover this distinction during the audit, not before it.
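The missing piece in this case was a preserved copy of the deployed bytes, tied to a record that can be checked later. The sketch below shows one way to do that with the Python standard library: store the artefact under its SHA-256 digest and attach a keyed signature to the provenance record. The function names and the record layout are hypothetical. HMAC with a shared key is a simplification chosen to keep the example self-contained; a production setup would more likely use asymmetric signatures and write-once storage.

```python
import hashlib
import hmac
import json
import tempfile
from pathlib import Path


def _sign(record: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical JSON form of the unsigned record."""
    message = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def archive_artefact(binary: bytes, spec_version: str,
                     archive_dir: Path, key: bytes) -> dict:
    """Store the deployed bytes by digest and return a signed provenance record."""
    digest = hashlib.sha256(binary).hexdigest()
    target = archive_dir / digest
    if not target.exists():  # content-addressed: identical builds stored once
        target.write_bytes(binary)
    record = {"artefact_sha256": digest, "spec_version": spec_version}
    record["signature"] = _sign(record, key)
    return record


def verify_artefact(record: dict, archive_dir: Path, key: bytes) -> bool:
    """Check the signature, then check the archived bytes still match the digest."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    if not hmac.compare_digest(_sign(unsigned, key), record["signature"]):
        return False
    target = archive_dir / record["artefact_sha256"]
    if not target.exists():
        return False
    return hashlib.sha256(target.read_bytes()).hexdigest() == record["artefact_sha256"]


with tempfile.TemporaryDirectory() as tmp:
    demo_key = b"demo-key-not-for-production"
    rec = archive_artefact(b"example binary bytes", "3.2.7", Path(tmp), demo_key)
    print(verify_artefact(rec, Path(tmp), demo_key))
```

The spec version stays in the record as design history, while the digest points at the exact bytes that ran.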

Where SDD does work — the non-regulated surface inside regulated enterprises

This is the part that gets lost in the binary framing. A regulated enterprise is not uniformly regulated. A bank's marketing site, internal HR platform, knowledge-base search, customer-support tooling, and developer-experience platform are not the financial-reporting general ledger. SDD's velocity gains apply to all of them.

The practical question is where to draw the boundary. A useful heuristic: if an auditor would ask "what was the deployed state of this system on date X?" — that system is on the regulated side of the boundary, and SDD's living-spec model needs the graduated-traceability adaptation below. If they would not, SDD-style iteration is fine and probably faster.
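The heuristic above reduces to a single yes/no question per system, so the boundary map can start as a very small piece of code. In the sketch below the `Service` and `DeliveryMode` names are hypothetical, and the three example services and their flags simply mirror the examples given in this article.

```python
from dataclasses import dataclass
from enum import Enum


class DeliveryMode(Enum):
    SDD_NATIVE = "sdd-native"
    GRADUATED_TRACEABILITY = "graduated-traceability"


@dataclass(frozen=True)
class Service:
    name: str
    # Would an auditor ask "what was the deployed state on date X?"
    auditor_asks_point_in_time_state: bool


def delivery_mode(service: Service) -> DeliveryMode:
    """Apply the boundary heuristic to one entry of the service catalogue."""
    if service.auditor_asks_point_in_time_state:
        return DeliveryMode.GRADUATED_TRACEABILITY
    return DeliveryMode.SDD_NATIVE


catalogue = [
    Service("marketing-site", False),
    Service("internal-hr-platform", False),
    Service("general-ledger", True),
]
boundary_map = {s.name: delivery_mode(s).value for s in catalogue}
for name, mode in boundary_map.items():
    print(f"{name}: {mode}")
```

The hard part is answering the question honestly for each system; the code only records the answer in a form tooling can enforce.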

What a senior engineering pod does instead — graduated traceability

The operating model that keeps SDD's velocity advantage while meeting the regulator's immutability requirement is what we call *graduated traceability*. It has two characteristics:

The spec is treated as a design artefact, not the source of truth. It informs the implementation, it guides agentic tooling, it accelerates iteration during development. At the moment of deployment to a regulated surface, the *compiled, reviewed, tested code* becomes the immutable record. The spec is preserved as design history alongside it. This is the audit-trail-compatible posture.

The model is dual, not uniform. On non-regulated surfaces, the pod operates in SDD-native mode: rapid spec iteration, agent-generated code, hand-reviewed at the integration layer. On regulated surfaces, every artefact is committed, reviewed, and tested against the qualified test suite before deployment.
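One way to picture the dual model is as a per-surface policy table that deployment tooling consults. The control names and the `may_deploy` helper below are hypothetical labels for the steps described above, not the configuration of any particular tool.

```python
from typing import Iterable

# Hypothetical control names, taken from the two modes described above.
REQUIRED_CONTROLS = {
    "non_regulated": frozenset({"integration_review"}),
    "regulated": frozenset({
        "artefact_committed",
        "human_review",
        "qualified_tests_passed",
    }),
}


def may_deploy(surface: str, completed: Iterable[str]) -> bool:
    """True when every control required for this surface has been completed."""
    return REQUIRED_CONTROLS[surface] <= set(completed)


# The same evidence is enough on one side of the boundary and not the other.
done = {"integration_review"}
print(may_deploy("non_regulated", done))
print(may_deploy("regulated", done))
```

Keeping the policy as data means the boundary can be reviewed and versioned like any other artefact.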

The practical cost of dual-model operation is approximately a 15–25% velocity reduction on the regulated surface relative to the non-regulated one. That is the price of regulator-tolerant delivery. It is materially cheaper than a fine, a delayed audit, or a Section 404 deficiency disclosure.
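As a worked illustration of that range, the snippet below applies a 15% and a 25% reduction to a baseline throughput. The baseline of 40 points per sprint is an invented number for the arithmetic only.

```python
def regulated_velocity(baseline: float, reduction_pct: int) -> float:
    """Throughput on the regulated surface after a percentage reduction."""
    return baseline * (100 - reduction_pct) / 100


baseline_points = 40  # hypothetical non-regulated throughput per sprint
low_cost = regulated_velocity(baseline_points, 15)   # lower end of the range
high_cost = regulated_velocity(baseline_points, 25)  # upper end of the range
print(f"regulated surface: {high_cost:.0f} to {low_cost:.0f} points per sprint")
```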

The 3-person AI-augmented engineering pod we run with regulated clients operates this dual model continuously — the boundary is mapped during the audit phase of the engagement, the tooling configuration enforces it from week one, and the handoff includes the boundary documentation as a deliverable. This is the operating model that lets a small senior team deliver at speed without forcing the client to choose between velocity and compliance.

Where to start

If SDD is on your engineering roadmap and you operate in any regulated jurisdiction, the first move is to map the regulatory boundary across your service catalogue. Not all systems need the graduated-traceability adaptation; some explicitly do. A 2-hour conversation with someone who has run this model is the cheapest insurance against discovering the gap during an audit.

If you want to scope an engagement that handles this boundary, book a 30-minute call. If you would rather see how it has played out in practice first, our case studies walk through outcomes at regulated clients including EY and John Lewis.

Frequently asked questions

What is spec-driven development (SDD)?

Spec-driven development treats the specification as the canonical artefact and the code as a generated, regenerable derivative. Change the spec, regenerate the code. The pattern works well with agentic tooling and gives high-velocity product teams a fast iteration loop. The catch is that it treats deployed code as a derivative artefact rather than as the immutable record — which collides with regulatory frameworks that expect the deployed code itself to be the auditable record.

Is SDD safe to use in regulated environments at all?

Yes, on the non-regulated surfaces inside a regulated enterprise. A bank's marketing site, internal HR platform, knowledge-base search, and developer-experience tooling are not subject to SOX 404 or PCAOB audit. SDD's velocity gains apply there freely. The issue is applying SDD unchanged on the regulated side of the boundary: the financial-reporting general ledger, actuarial engine, medical-device firmware, or pharmacovigilance systems.

How does graduated traceability satisfy SOX 404 and Solvency II audit requirements?

Graduated traceability keeps the spec as a design artefact (informing implementation, accelerating iteration) but treats the compiled, reviewed, tested code as the immutable record at deployment. The Git commit hash, code review trail, and test artefact tied to the commit satisfy the auditor's point-in-time question. The spec is preserved as design history alongside the code, not as a regenerable substitute for it.

What is the velocity cost of dual-model operation?

Approximately 15–25% velocity reduction on the regulated surface relative to the non-regulated one. That figure assumes the regulatory boundary is mapped at the start of the engagement and tooling enforces it from day one. Retrofitting the dual model after the fact (the most common mistake) typically doubles the velocity penalty during the transition.

Can a small team implement this without massive process overhead?

Yes. The dual model is fundamentally a tooling configuration plus a service-catalogue boundary map, not a heavyweight governance layer. A 3-person AI-augmented engineering pod can install both inside a 6–8 week pilot. The boundary map is the audit-phase deliverable; the tooling enforcement is the pilot-phase deliverable. After that, the dual model runs itself.