14 May 2026 · 5 min read

Spec-Driven Development · Regulatory Compliance · AI-Augmented Delivery · Engineering Pod

Spec-Driven Development in Regulated Enterprises: Where It Breaks

SDD is the hot framing in agentic engineering. In unregulated software it works. In banks, insurers, and FDA-regulated platforms, it collides with the regulator's point-in-time audit trail — a model mismatch most SDD advocates don't address.

Anystack Engineering

What SDD promises and why teams adopt it

Spec-driven development treats the specification as the canonical artefact and the code as a generated, regenerable derivative. Change the spec, regenerate the code. The appeal is straightforward: agentic tooling works better against precise specifications than against natural-language tickets, the iteration loop is faster, and the spec itself becomes the trace of intent. For high-velocity product teams in unregulated domains, this is a genuine productivity unlock.

The teams adopting SDD most aggressively in 2026 are consumer-product teams, internal-tooling shops, and AI-native startups where the regulator's audit boundary is not on the critical path. Their gains are real. The pattern is not the problem.

The problem is the uniformity. SDD's advocates often present it as a universal replacement for the prior model — and that is where it collides with the regulatory layer of any enterprise that runs in a regulated jurisdiction.

The audit-trail collision

Three regulatory regimes explicitly require point-in-time, immutable provenance for every deployed code artefact:

  • SOX 404 / PCAOB AS 2201 (US public-company financial systems). Every change to a financially-material system must produce a point-in-time, immutable record of *what* was changed, *when*, *by whom*, *what tested it*, and *who reviewed it*. The auditor's question, four quarters later, is: "what code was deployed on date X that affected this reported transaction?" The expected answer is a Git commit hash, a code review trail, and a test artefact tied to that commit.
  • Solvency II Article 258 (EU/UK insurance). Carriers must demonstrate continuous control over actuarial and financial systems. The PRA's supervisory expectation in the UK is that the firm can produce, on demand, the exact state of the calculation engine as it ran for any historical reporting period. Re-generation is acceptable as a *reproducibility* claim; it is not acceptable as the *primary record*.
  • FDA 21 CFR Part 11 (medical devices, pharmacovigilance). Controls over electronic records must "ensure the authenticity, integrity, and, when appropriate, the confidentiality" of those records, which covers any change to them. The regulation predates AI-generated code, but the FDA's 2023–2024 guidance is clear: generated code is subject to the same validation rigour as hand-written code, including a documented review by a qualified human.

SDD's living-spec model collides with all three. If the spec is the source of truth and the code is regenerated from the spec, the regulator's question receives a fuzzy answer: "the code in production on March 15 was the artefact generated from spec version 3.2.7, regenerated at deployment time." That is not necessarily wrong, but the regulator wants the artefact itself — not a recipe for producing it.
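The auditor's question is, in effect, a point-in-time lookup. A minimal sketch of that lookup follows, assuming a hypothetical deployment log; the `Deployment` type, the `deployed_on` function, the dates, and the commit hashes are all illustrative rather than taken from any real tool. Note what the query returns: a commit that identifies a stored artefact, with the spec version attached as design history.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Deployment:
    deployed_at: datetime  # when this artefact went live (UTC)
    commit: str            # Git commit hash of the deployed code (hypothetical values)
    spec_version: str      # spec version, kept as design history


def deployed_on(log: List[Deployment], when: datetime) -> Optional[Deployment]:
    """Return the deployment that was live at `when`, or None if nothing was."""
    ordered = sorted(log, key=lambda d: d.deployed_at)
    times = [d.deployed_at for d in ordered]
    idx = bisect_right(times, when)  # number of deployments at or before `when`
    return ordered[idx - 1] if idx else None


log = [
    Deployment(datetime(2026, 1, 10, tzinfo=timezone.utc), "a1b2c3d", "3.2.6"),
    Deployment(datetime(2026, 3, 1, tzinfo=timezone.utc), "e4f5a6b", "3.2.7"),
    Deployment(datetime(2026, 4, 2, tzinfo=timezone.utc), "c7d8e9f", "3.3.0"),
]

live = deployed_on(log, datetime(2026, 3, 15, tzinfo=timezone.utc))
print(live.commit, live.spec_version)  # e4f5a6b 3.2.7
```

If the log held only spec versions, the same query would return a recipe; holding the commit is what lets it return the artefact.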

The gap can be closed, but only if every regenerated artefact is *immutably committed* to source control, *re-reviewed* by a human, and *re-tested* against the regulated test suite, as if it had been hand-written. Most SDD tooling does not do this by default, and most SDD adopters in regulated industries discover the gap during their first internal audit, not before.
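One way to picture the commit, re-review, and re-test requirement is as a provenance record that must be complete before a regenerated artefact can ship. The sketch below is illustrative only: `ProvenanceRecord`, `missing_controls`, and the sample reviewer and test-run identifiers are hypothetical names, not part of any SDD tool.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    artefact_sha256: str               # digest of the exact generated code
    commit: Optional[str] = None       # Git commit the artefact was committed under
    reviewer: Optional[str] = None     # human who re-reviewed the regenerated code
    test_report: Optional[str] = None  # ID of the regulated test-suite run


def digest(artefact: bytes) -> str:
    """SHA-256 digest that pins the record to one specific artefact."""
    return hashlib.sha256(artefact).hexdigest()


def missing_controls(record: ProvenanceRecord) -> List[str]:
    """List the controls still missing before the artefact may be deployed."""
    required = {
        "commit": record.commit,
        "reviewer": record.reviewer,
        "test_report": record.test_report,
    }
    return [name for name, value in required.items() if not value]


# Hypothetical generated code, straight out of the agent.
generated = b"def premium(age):\n    return 100 + 2 * age\n"

draft = ProvenanceRecord(artefact_sha256=digest(generated))
print(missing_controls(draft))     # ['commit', 'reviewer', 'test_report']

complete = ProvenanceRecord(
    digest(generated), commit="e4f5a6b", reviewer="j.doe", test_report="run-118"
)
print(missing_controls(complete))  # []
```

The digest ties the review and test evidence to one exact artefact, so a later regeneration that differs by a single byte would not inherit them.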

Where SDD does work — the non-regulated surface inside regulated enterprises

This is the part that gets lost in the binary framing. A regulated enterprise is not uniformly regulated. A bank's marketing site, internal HR platform, knowledge-base search, customer-support tooling, and developer-experience platform are not the financial-reporting general ledger. SDD's velocity gains apply to all of them.

The practical question is where to draw the boundary. A useful heuristic: if an auditor would ask "what was the deployed state of this system on date X?" — that system is on the regulated side of the boundary, and SDD's living-spec model needs the graduated-traceability adaptation below. If they would not, SDD-style iteration is fine and probably faster.
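The heuristic can be written down as a one-question classification over the service catalogue. This is a rough sketch: the `Service` type, the field name, and the catalogue entries are illustrative assumptions, and a real boundary map would come out of a conversation with the audit function, not a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    # Would an auditor ask "what was the deployed state of this system on date X?"
    auditor_asks_point_in_time: bool


def side_of_boundary(service: Service) -> str:
    """Apply the heuristic: a point-in-time audit question puts the service on the regulated side."""
    return "regulated" if service.auditor_asks_point_in_time else "sdd-native"


# Hypothetical catalogue for a bank and an insurer.
catalogue = [
    Service("marketing-site", False),
    Service("internal-hr-platform", False),
    Service("general-ledger", True),
    Service("actuarial-engine", True),
]

boundary_map = {s.name: side_of_boundary(s) for s in catalogue}
for name, side in boundary_map.items():
    print(f"{name}: {side}")
```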

What a senior engineering pod does instead — graduated traceability

The operating model that keeps SDD's velocity advantage while meeting the regulator's immutability requirement is what we call *graduated traceability*. It has two characteristics:

  • The spec is treated as a design artefact, not the source of truth. It informs the implementation, it guides agentic tooling, it accelerates iteration during development. At the moment of deployment to a regulated surface, the *compiled, reviewed, tested code* becomes the immutable record. The spec is preserved as design history alongside it. This is the audit-trail-compatible posture.
  • The model is dual, not uniform. On non-regulated surfaces, the pod operates in SDD-native mode: rapid spec iteration, agent-generated code, hand-reviewed at the integration layer. On regulated surfaces, every artefact is committed, reviewed, and tested against the qualified test suite before deployment.
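The dual model amounts to a different set of required controls per surface, checked at deployment time. A minimal sketch, assuming hypothetical control names and a hypothetical `may_deploy` gate; real enforcement would live in the CI/CD pipeline configuration, which this does not attempt to model.

```python
from enum import Enum
from typing import Iterable


class Mode(Enum):
    SDD_NATIVE = "sdd-native"
    REGULATED = "regulated"


# Illustrative control names; each mode lists what must be done before deployment.
REQUIRED_CONTROLS = {
    Mode.SDD_NATIVE: frozenset({"integration_review"}),
    Mode.REGULATED: frozenset({"committed", "human_review", "qualified_tests"}),
}


def may_deploy(mode: Mode, completed: Iterable[str]) -> bool:
    """Allow deployment only when every control required for the mode is complete."""
    return REQUIRED_CONTROLS[mode] <= set(completed)


print(may_deploy(Mode.SDD_NATIVE, {"integration_review"}))                        # True
print(may_deploy(Mode.REGULATED, {"committed", "human_review"}))                  # False
print(may_deploy(Mode.REGULATED, {"committed", "human_review", "qualified_tests"}))  # True
```

Keeping the two control sets in one table makes the boundary explicit: moving a service across it is a one-line change that is itself reviewable.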

The practical cost of dual-model operation is approximately a 15–25% velocity reduction on the regulated surface relative to the non-regulated one. That is the price of regulator-tolerant delivery. It is materially cheaper than a fine, a delayed audit, or a Section 404 deficiency disclosure.

The 3-person AI-augmented engineering pod we run with regulated clients operates this dual model continuously — the boundary is mapped during the audit phase of the engagement, the tooling configuration enforces it from week one, and the handoff includes the boundary documentation as a deliverable. This is the operating model that lets a small senior team deliver at speed without forcing the client to choose between velocity and compliance.

Where to start

If SDD is on your engineering roadmap and you operate in any regulated jurisdiction, the first move is to map the regulatory boundary across your service catalogue. Not all systems need the graduated-traceability adaptation; some explicitly do. A 2-hour conversation with someone who has run this model is the cheapest insurance against discovering the gap during an audit.

If you want to scope an engagement that handles this boundary, book a 30-minute call. If you would rather see how it has played out in practice first, our case studies walk through outcomes at regulated clients including EY and John Lewis.

Frequently asked questions

What is spec-driven development (SDD)?

Spec-driven development treats the specification as the canonical artefact and the code as a generated, regenerable derivative. Change the spec, regenerate the code. The pattern works well with agentic tooling and gives high-velocity product teams a fast iteration loop. The catch is that it treats deployed code as a derivative artefact rather than as the immutable record — which collides with regulatory frameworks that expect the deployed code itself to be the auditable record.

Is SDD safe to use in regulated environments at all?

Yes, on the non-regulated surfaces inside a regulated enterprise. A bank's marketing site, internal HR platform, knowledge-base search, and developer-experience tooling are not subject to SOX 404 or PCAOB audit. SDD's velocity gains apply there freely. The issue is applying SDD unchanged to systems on the regulated side of the boundary: the financial-reporting general ledger, actuarial engine, medical-device firmware, or pharmacovigilance systems.

How does graduated traceability satisfy SOX 404 and Solvency II audit requirements?

Graduated traceability keeps the spec as a design artefact (informing implementation, accelerating iteration) but treats the compiled, reviewed, tested code as the immutable record at deployment. The Git commit hash, code review trail, and test artefact tied to the commit satisfy the auditor's point-in-time question. The spec is preserved as design history alongside the code, not as a regenerable substitute for it.

What is the velocity cost of dual-model operation?

Approximately 15–25% velocity reduction on the regulated surface relative to the non-regulated one. That figure assumes the regulatory boundary is mapped at the start of the engagement and tooling enforces it from day one. Retrofitting the dual model after the fact (the most common mistake) typically doubles the velocity penalty during the transition.

Can a small team implement this without massive process overhead?

Yes. The dual model is fundamentally a tooling configuration plus a service-catalogue boundary map, not a heavyweight governance layer. A 3-person AI-augmented engineering pod can install both inside a 6–8 week pilot. The boundary map is the audit-phase deliverable; the tooling enforcement is the pilot-phase deliverable. After that, the dual model runs itself.
