JUN 02 2026 · 16 min read

InfraKit: Constraint-Driven Development for Infrastructure-as-Code

Tags: tech, ai, ai workflow, terraform, iac

Spec-driven development transformed how we write application code — but pointing it at infrastructure exposed gaps it was never built to cover. This is why I built InfraKit: a constraint-first, four-persona pipeline on top of spec-kit's foundation that refuses to ship IaC that violates your standards.

InfraKit is open-source on GitHub: github.com/neelneelpurk/infrakit.

Six years building infrastructure, and I still struggle with it.

Not the syntax. The judgment. No single person can hold every best practice and every config knob for every cloud resource in their head. There’s too much of it, and it shifts every quarter. Every time I picked up a service I hadn’t touched in a while, the honest answer to “what’s the right way to set this up?” was “give me a week.” And it really was a week of provider docs, AWS guidance, half-true blog posts, and a few GitHub issues, just to learn which knobs matter and which defaults quietly hurt you in prod.

AI demolished that week. Five days of research became five minutes: ask, and a plausible, mostly-correct module comes back. That part is genuinely incredible — I’m not nostalgic for the old way.

But speed isn’t a method. AI handed me answers with no workflow around them — no structure, no enforcement, no second opinion, no record of why anything was decided. And the moment my team leaned on it, that missing method showed up as a mess.

Ask three engineers for “an S3 bucket for exports” and you got three buckets. Different variable names. Different file layout. Different tags — or none. One had every public-access block set; one had none. One pinned the provider with ~>; one let it float. All three terraform plan-ed clean. All three “worked.”

None of them matched. And “works” is a garbage bar for infrastructure.

Why “works” isn’t enough

App code that’s wrong throws an error in your editor. Infra code that’s wrong throws an error in your AWS account — halfway through apply, after it’s already created half a VPC.

And infra carries weight app code doesn’t. It has to satisfy compliance frameworks. It has to wear the tags finance and security trace everything by. It has to know that a missing versioning block is fine in dev and an incident in prod.

Worst of all is the one that quietly kills trust: the hallucinated argument. An agent writes an aws_db_instance block with an argument the provider schema doesn’t define, or a value the API won’t accept. It looks right. It passes a tired reviewer. It dies at plan or, worse, at apply. Do that twice and your team goes back to hand-writing HCL: slower, but at least it’s theirs.

So I had two problems, stacked. No enforced standard, so every generation drifted. No field-level trust, so nothing was safe to merge. I wanted one tool that fixed both. That’s InfraKit.

Spec-kit got me halfway

Spec-kit had the right instinct: write the spec first, then plan, then implement, and commit every step to git. It even ships a constitution for project-wide principles. I was sold — and then I tried to point it at infrastructure.

Here’s what spec-kit quietly assumes: writing the spec is the quick part, and implementation is the real work. For IaC that’s backwards. Figuring out the spec — what a given resource actually needs — is easily 70% of the job, because every cloud resource, on every provider, is unique: its own fields, its own defaults, its own ways to fail. There’s no template to copy; each one is its own little research problem. Spec-kit hands you a tidy place to write the spec down and nothing to help you discover what it should say — and discovery is where almost all the real work lives.

And even after that research, two more things didn’t fit.

First, the constraints need enforcing, not just recording. Compliance scope, tagging, naming, per-environment security baselines — spec-kit’s constitution is the right shape, but a document the implementer is asked to remember is not enforcement. The rules my team kept breaking needed something whose only job was to stop them.

Second, one agent can’t wear every hat. Spec-kit runs a single implementer through plan → tasks → implement. But good infra has been looked at by three different people: an architect who weighed cost and failure modes, a security engineer who checked it against controls, and an engineer who wrote field-accurate code. Ask one pass to do all three and it half-does each — the security check rubber-stamped by the same context that just wrote the thing.

None of that is a knock on spec-kit. Infra is just a different problem. It needed the spec-driven shape plus a constraint layer plus real reviewers.

Write the rules first

Here’s the whole idea: capture the constraints before you write a line of resource code, and make them gates — not vibes.

infrakit init scaffolds three files from templates:

.infrakit/
├── context.md          # provider, naming, environments, security defaults, compliance scope
├── coding-style.md     # versions, backend, file layout, validation, security defaults
└── tagging-standard.md # the tags every resource must carry

Two commands fill them in. /infrakit:setup interviews you — about a dozen questions, one at a time: provider, naming, environments, security defaults, compliance frameworks, network topology, DR/HA — and writes context.md and tagging-standard.md. /infrakit:setup-coding-style handles the engineering rules — version policy, backend, tagging strategy, security defaults — and fills coding-style.md.

That’s the part that killed the drift. Every command downstream reads these files. The standard stops living in three people’s heads, half-contradicting each other, and becomes one source of truth the architect, the security engineer, and the IaC engineer all have to obey. “Make me an S3 bucket” now returns our bucket — same naming, same tags, same posture — no matter who runs it.

Two things keep the constraints honest.

They flex by environment. A missing multi_az is nothing in dev and a HIGH finding in prod. The pipeline knows the difference.
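
One way to picture that environment-aware rule is a small lookup table. The sketch below is hypothetical Python, not InfraKit's actual implementation: the names SEVERITY_BY_ENV and finding_severity are invented for illustration, and the staging level is an assumed placeholder. Only the dev and prod outcomes for multi_az come from the text above.

```python
from typing import Optional

# Hypothetical rule table: severity of a *missing* setting, per environment.
# Only the dev and prod entries for multi_az come from the article; the
# staging entry is an assumed placeholder.
SEVERITY_BY_ENV = {
    "multi_az": {"dev": None, "staging": "MEDIUM", "prod": "HIGH"},
}


def finding_severity(setting: str, environment: str, configured: bool) -> Optional[str]:
    """Return the severity of a finding, or None when there is nothing to report."""
    if configured:
        return None  # the setting is present, so no finding in any environment
    # Unknown settings or environments fall through to None (no rule, no finding).
    return SEVERITY_BY_ENV.get(setting, {}).get(environment)
```

Keeping the rule as data rather than as prose means the same check can give a different answer per environment without a second code path.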

And validation is a hard gate, not a suggestion. implement will not mark a track done until tofu validate passes (or cfn-lint / crossplane render for the other tools). Validator can’t run? The track is blocked, not done. There is no “looks good to me” path.
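
The gate logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CLI's real code: the function name gate_track and the status strings "done", "failed", and "blocked" are hypothetical. The only behaviour taken from the text is that a track is done only when the validator passes, and blocked when the validator cannot run.

```python
import shutil
import subprocess
from typing import Sequence


def gate_track(validator_cmd: Sequence[str]) -> str:
    """Decide a track's status from a validator run.

    Status names are hypothetical: 'done', 'failed', 'blocked'.
    """
    if shutil.which(validator_cmd[0]) is None:
        # The validator can't run at all: blocked, never silently done.
        return "blocked"
    result = subprocess.run(list(validator_cmd), capture_output=True, text=True)
    # Only a zero exit code counts as passing validation.
    return "done" if result.returncode == 0 else "failed"
```

A call such as gate_track(["tofu", "validate"]) would then return "blocked" on a machine without the tofu binary, which is the point: a missing validator is never treated as a pass.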

How InfraKit actually works

It’s a Python CLI:

uv tool install infrakit-cli
infrakit init my-infra --ai claude --iac crossplane

init drops slash commands and persona files into your project, laid out for whatever agent you use — Claude Code, Codex, Gemini, Copilot, or generic. Everything ships in the wheel and renders locally, so init makes zero network calls. Nothing for a corporate proxy to break.

Then your intent runs a relay of four specialists, each with one job:

Cloud Solutions Engineer — turns “I need Cassandra” into a structured spec.md, one clarifying question at a time.
Cloud Architect — reviews that spec for reliability, cost, completeness, and environment-fit, and hands back severity-tagged findings with a verdict. It doesn’t design; it judges.
Cloud Security Engineer — audits the spec against the frameworks you scoped (SOC 2, HIPAA, ISO 27001, PCI-DSS, NIST 800-53, CIS, FedRAMP) before any code exists.
IaC Engineer — writes the actual Crossplane YAML (or Terraform HCL, or a CloudFormation template).

[Image: the four-persona relay, Solutions Engineer → Cloud Architect → Cloud Security Engineer → IaC Engineer, with the Architect and Security reviews shown as isolated subagents on Claude]

Why four personas instead of one prompt with four bullet points? That’s the bet of the whole project: a model wearing one hat at a time beats a model juggling four. On Claude Code it’s not a vibe — the architect and security reviews run as isolated subagents, each in its own context window, so the architect’s reasoning literally can’t leak into the security audit. (The spec phase stays inline, because it’s a conversation, and a subagent can’t stop to ask you a question.)

Every change gets a track — a directory that accumulates spec.md, plan.md, tasks.md, and each review as it happens. It’s a git-native audit trail. Six months later, “why does this table have point-in-time recovery on?” has an answer committed right next to the code: the spec that asked, the review that approved.

And the trust loop — the reason any of this is mergeable: the IaC Engineer is forbidden from writing a field it hasn’t verified against authoritative docs. Crossplane → doc.crds.dev. Terraform → registry.terraform.io. CloudFormation → the AWS resource-type reference. Before it types a field, it has looked up that the field exists and what it takes. Offline and can’t reach the docs? It stops and tells you — it does not write from memory. That single rule is what kills the hallucinated-argument problem.
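
The shape of that rule, stripped to its logic, might look like the Python sketch below. Everything here is hypothetical: verify_fields and DocsUnreachable are invented names, and the schema is passed in as a plain set rather than fetched from doc.crds.dev or registry.terraform.io. In InfraKit the rule is an instruction the agent follows, so treat this as a model of the behaviour, not its implementation.

```python
from typing import Dict, Optional, Set


class DocsUnreachable(RuntimeError):
    """Raised when the authoritative schema could not be fetched."""


def verify_fields(proposed: Dict[str, object], schema_fields: Optional[Set[str]]) -> Dict[str, object]:
    """Return `proposed` unchanged only if every field is in the fetched schema."""
    if schema_fields is None:
        # Docs unreachable: stop instead of writing from memory.
        raise DocsUnreachable("schema not available; refusing to guess fields")
    unknown = sorted(set(proposed) - schema_fields)
    if unknown:
        # A field that is not in the schema is treated as hallucinated.
        raise ValueError(f"unverified fields: {', '.join(unknown)}")
    return proposed
```

With a schema set of {"encryptionSpecification", "capacitySpecification", "pointInTimeRecovery"}, a proposal using encryptionConfig raises ValueError, and a schema of None raises DocsUnreachable rather than letting anything through.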

Not every change deserves the full ceremony. For “bump the keyspace’s capacity mode,” there’s /infrakit:quick_fix: the engineer plans it, generates tasks, shows you the plan, implements — still verifying fields, still tagging, still gated on validation. It skips the spec/architect/security relay, not the safety rails.

Watch it build a Cassandra keyspace

Concrete beats abstract — so let’s build something with real surface area: Apache Cassandra on AWS (Amazon Keyspaces) as a Crossplane composition. This is exactly the kind of resource where “it works” and “it’s right” are miles apart. The full, committed example is on GitHub at examples/crossplane/compositions/xcassandra-keyspace — every snippet below is lifted from it.

First, the context the platform team already captured in .infrakit/context.md. This isn’t generic boilerplate — it’s the constitution every composition inherits:

Base API group platform.acme.com. XRD kinds are PascalCase and X-prefixed (XCassandraKeyspace); claims drop the X (CassandraKeyspace).
Naming is {env}-{team}-{resource}, kebab-case. Compositions are mode: Pipeline only — legacy Resources mode is banned.
Security baseline: encryption at rest on every storage resource, no public network access in prod, SOC 2 + PCI-DSS in scope.
Six mandatory tags on every managed resource: crossplane.io/claim-name, crossplane.io/claim-namespace, managed-by, environment, cost-center, team — each sourced from a label or parameter, never hardcoded.
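
To make the naming and tagging rules concrete, here is a minimal Python sketch of a checker. It is an illustration, not part of InfraKit: check_resource, MANDATORY_TAGS, and NAME_RE are hypothetical names, the environment values (dev, staging, prod) are borrowed from the XRD enum in this article, and treating the team segment as a single dash-free token is an assumption of the sketch.

```python
import re
from typing import Dict, List

# The six mandatory tags listed in context.md.
MANDATORY_TAGS = (
    "crossplane.io/claim-name",
    "crossplane.io/claim-namespace",
    "managed-by",
    "environment",
    "cost-center",
    "team",
)

# {env}-{team}-{resource}, kebab-case. Assumes team is one dash-free token
# and that the resource part may itself contain dashes.
NAME_RE = re.compile(r"^(dev|staging|prod)-[a-z0-9]+-[a-z0-9]+(-[a-z0-9]+)*$")


def check_resource(name: str, tags: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append(f"name {name!r} is not {{env}}-{{team}}-{{resource}} kebab-case")
    for tag in MANDATORY_TAGS:
        if not tags.get(tag):  # missing or empty both count as absent
            problems.append(f"missing tag {tag!r}")
    return problems
```

A name like prod-data-telemetry with all six tags set returns an empty list; drop a tag or use underscores in the name and the problem is reported by name.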

Now the build:

/infrakit:new_composition cassandra-keyspace

The Solutions Engineer asks what it should provision; I describe it: a Keyspaces keyspace and a time-series table, customer-managed KMS encryption, point-in-time recovery on in prod, on-demand capacity, no public access. The Architect reviews the spec against the prod bar; the Security Engineer checks it against SOC 2 + PCI-DSS. Then:

/infrakit:plan cassandra-keyspace-<timestamp>       # verifies every apiVersion + field against doc.crds.dev, writes tasks.md
/infrakit:implement cassandra-keyspace-<timestamp>  # writes the XRD + Composition + claims; YAML must parse and `crossplane render` must pass

The XRD: a typed, validated API

InfraKit’s Crossplane output is always two files — definition.yaml (the XRD, the machine-readable contract) and composition.yaml (how it’s built). The XRD is where the platform team’s API lives: typed parameters, enums, patterns, defaults, a description on every property.

# definition.yaml (excerpt)
spec:
  group: database.platform.acme.com
  names: { kind: XCassandraKeyspace, plural: xcassandrakeyspaces }
  claimNames: { kind: CassandraKeyspace, plural: cassandrakeyspaces }
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  required: [environment, teamName, costCenter, keyspaceName]
                  properties:
                    environment:
                      type: string
                      enum: [dev, staging, prod]          # only three values are legal
                      description: Target environment.
                    costCenter:
                      type: string
                      pattern: "^CC-[0-9]{4}$"             # billing-code shape enforced at the API
                      description: Billing allocation code (e.g., CC-1042).
                    keyspaceName:
                      type: string
                      pattern: "^[a-z][a-z0-9_]{0,24}$"    # CQL identifier rules — underscores, NOT dashes
                      description: Logical Cassandra keyspace name.
                    pointInTimeRecovery:
                      type: boolean
                      description: Override PITR. Defaults from environment (prod = on).

Look at that keyspaceName pattern. The org convention is kebab-case with dashes — but Cassandra identifiers can’t contain a dash, and a name that sails past a syntax check fails at reconcile against the real Keyspaces API. That constraint came straight out of the provider docs, and it’s baked into the type: a developer literally cannot submit an invalid name.
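
The two patterns from the XRD excerpt are easy to try outside a cluster. The Python sketch below reuses the regular expressions verbatim; the constant names are hypothetical, and in a real cluster this validation is performed by the Kubernetes API server against the XRD schema, not by Python.

```python
import re

# Patterns copied from the definition.yaml excerpt.
KEYSPACE_NAME = re.compile(r"^[a-z][a-z0-9_]{0,24}$")  # CQL-safe: underscores, no dashes
COST_CENTER = re.compile(r"^CC-[0-9]{4}$")             # billing code, e.g. CC-1042

for candidate in ("device_telemetry", "device-telemetry", "1st_keyspace"):
    verdict = "accepted" if KEYSPACE_NAME.fullmatch(candidate) else "rejected"
    print(f"{candidate}: {verdict}")
```

Only device_telemetry is accepted: the dashed name and the name starting with a digit are both rejected, and the {0,24} bound caps the name at 25 characters in total.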

The Composition: Pipeline mode, every field verified

composition.yaml runs mode: Pipeline with function-go-templating — the project’s standard pipeline function, never legacy Resources mode. Three managed resources get rendered: a customer-managed KMS key, the keyspace, and the table. Here’s the pipeline and the table it renders — where go-templating earns its keep:

# composition.yaml (excerpt) — Pipeline + function-go-templating
spec:
  mode: Pipeline
  pipeline:
    - step: render-resources
      functionRef:
        name: function-go-templating
      input:
        apiVersion: gotemplating.fn.crossplane.io/v1beta1
        kind: GoTemplate
        source: Inline
        inline:
          template: |
            {{- $p := .observed.composite.resource.spec.parameters }}
            {{- $region := $p.region | default "us-east-1" }}
            {{- $keyspace := printf "%s_%s_%s" $p.environment $p.teamName $p.keyspaceName }}
            {{- $pitr := eq $p.environment "prod" }}
            {{- if hasKey $p "pointInTimeRecovery" }}{{ $pitr = $p.pointInTimeRecovery }}{{ end }}
            {{- $kms := index .observed.resources "kms-key" }}
            {{- $kmsArn := "" }}{{ if $kms }}{{ $kmsArn = $kms.resource.status.atProvider.arn }}{{ end }}
            ---
            # ...the KMS Key and Keyspace render first; the table is the interesting one...
            apiVersion: keyspaces.aws.upbound.io/v1beta1   # verified against doc.crds.dev
            kind: Table
            metadata:
              annotations:
                gotemplating.fn.crossplane.io/composition-resource-name: table
            spec:
              # ORPHAN: stateful production data — never auto-deleted with the composition
              deletionPolicy: Orphan
              forProvider:
                region: {{ $region }}
                # CQL identifiers use underscores, never dashes
                keyspaceName: {{ $keyspace | quote }}
                tableName: {{ $p.tableName | default "events" | quote }}
                # these blocks are arrays in the upjet-generated CRD — not plain objects
                schemaDefinition:
                  - column:
                      - { name: device_id, type: text }
                      - { name: event_time, type: timestamp }
                      - { name: payload, type: text }
                    partitionKey:
                      - { name: device_id }
                    clusteringKey:
                      - { name: event_time, orderBy: DESC }
                capacitySpecification:
                  - throughputMode: PAY_PER_REQUEST          # on-demand; nothing to over-provision
                encryptionSpecification:
                  - type: CUSTOMER_MANAGED_KMS_KEY           # not the default AWS-owned key
                  {{- if $kmsArn }}
                    kmsKeyIdentifier: {{ $kmsArn | quote }}  # set only once the key exists
                  {{- end }}
                pointInTimeRecovery:
                  - status: {{ if $pitr }}ENABLED{{ else }}DISABLED{{ end }}
                tags:
                  crossplane.io/claim-name: {{ index .observed.composite.resource.metadata.labels "crossplane.io/claim-name" | quote }}
                  managed-by: crossplane
                  environment: {{ $p.environment | quote }}
                  # ...+ claim-namespace, team, cost-center...
              providerConfigRef:
                name: default

Every line of that is load-bearing, and none of it was in my prompt:

deletionPolicy: Orphan — the table holds data, and the coding style says a stateful prod resource is never auto-destroyed when its composition is torn down.
encryptionSpecification.type: CUSTOMER_MANAGED_KMS_KEY, wired to a key the composition renders — the security baseline forbids the default AWS-owned key. The {{ if $kmsArn }} guard sets the ARN only once the key actually exists, so the table never reconciles against a half-built key.
pointInTimeRecovery defaults on in prod, off elsewhere — and in go-templating that’s a plain if, with a one-line override hook (hasKey $p "pointInTimeRecovery") instead of a CombineFromComposite+map. Same environment-aware rule, far less ceremony.
The six mandatory tags, written straight from the claim’s labels and parameters.

And keyspaces.aws.upbound.io/v1beta1, encryptionSpecification, capacitySpecification, pointInTimeRecovery.status: ENABLED — every one was checked against doc.crds.dev first. Get the enum wrong (ENABLED vs enabled) or the path wrong (encryptionSpecification vs encryptionConfig) and it reconciles to a permanent error against AWS. That’s the failure InfraKit exists to make impossible.
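
The derived values in that template (the region default, the underscore-joined keyspace name, and the PITR switch) can be mirrored in plain Python to check the logic by hand. This is a hypothetical re-statement for illustration; render_settings is an invented name, and the real rendering is done by function-go-templating, not by this code.

```python
from typing import Any, Dict


def render_settings(params: Dict[str, Any]) -> Dict[str, str]:
    """Mirror the template's derived values for a claim's parameters."""
    # printf "%s_%s_%s": underscores, because CQL identifiers cannot contain dashes.
    keyspace = f"{params['environment']}_{params['teamName']}_{params['keyspaceName']}"
    # PITR defaults on in prod only...
    pitr = params["environment"] == "prod"
    # ...unless the claim sets pointInTimeRecovery explicitly (the hasKey check).
    if "pointInTimeRecovery" in params:
        pitr = bool(params["pointInTimeRecovery"])
    return {
        # Like the template's `default`, an empty or missing region falls back.
        "region": params.get("region") or "us-east-1",
        "keyspaceName": keyspace,
        "pointInTimeRecovery": "ENABLED" if pitr else "DISABLED",
    }
```

Fed the parameters environment: prod, teamName: data, keyspaceName: device_telemetry, it yields the keyspace name prod_data_device_telemetry with point-in-time recovery ENABLED; adding pointInTimeRecovery: False flips it to DISABLED even in prod.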

What the developer actually touches

The platform team wrote all of that once. A product team consumes it with a claim — and notice there’s no password anywhere, because Keyspaces authenticates with SigV4 over the app’s IRSA role, so there’s no credential to leak:

# a product team's claim — the entire API surface they see
apiVersion: database.platform.acme.com/v1alpha1
kind: CassandraKeyspace
metadata:
  name: telemetry
  namespace: data
spec:
  parameters:
    environment: prod
    teamName: data
    costCenter: CC-1042
    keyspaceName: device_telemetry
    tableName: device_events

That’s it. They get a prod-grade, encrypted, backed-up, correctly-tagged Cassandra keyspace and table — plus status fields exposing the contact point and keyspace name — without knowing that pointInTimeRecovery.status is even a field. Then /infrakit:review compositions/cassandra-keyspace grades the YAML against the standards and offers to fix anything it flags.

[Image: terminal capture of /infrakit:plan verifying keyspaces.aws.upbound.io fields against doc.crds.dev, then a passing crossplane render]

The point isn’t that any one line of the template is clever. It’s that the developer who’s never heard of a clustering key still ships the same correct, compliant keyspace the platform team would have written by hand, with every field checked against the provider’s own schema before it existed.

Who it’s for

Two kinds of people — and the same machinery serves both.

Platform and DevOps teams who are tired of being the bottleneck. You wrote the golden-path composition once and watched everyone fork it anyway. InfraKit turns your standards into something the agent enforces for you — naming, tagging, encryption defaults, compliance posture — with a git-native audit trail you can hand an auditor. You stop re-reviewing the same five mistakes and start reviewing actual decisions.

Developers who have never touched the cloud. This is the one I care about most. You shouldn’t have to know that an S3 bucket has four separate public-access flags, that an Amazon Keyspaces table name can’t contain a dash, or what your company’s tagging policy even is. You say what you want — “a Cassandra keyspace for my service” — and the Solutions Engineer asks the right questions, the Architect and Security personas hold the line on everything you didn’t know to ask about, and the IaC Engineer writes only fields it actually verified. You get infrastructure that passes review on the first try, without a year of cloud tribal knowledge first.

The constraint layer is what makes both true at once: the platform team writes the rules down once, and everyone else — including the person who’s never opened the AWS console — inherits them automatically.

Where it doesn’t save you

I’d rather you hear this from me.

The security review is a heuristic, not an audit. It’s an LLM flagging structural patterns against named frameworks. Genuinely useful for catching the public endpoint, the unencrypted table, the hardcoded secret. It is not a substitute for real auditors doing evidence collection. Don’t put “InfraKit-approved” in your SOC 2 report.

The four-persona bet is still a bet. “Four focused personas beat one model juggling four jobs” matches everything I’ve seen running real migrations — but it’s an intuition I’m still gathering evidence for, not a proven result. I’m holding it loosely.

The trust loop needs the docs reachable. The “never guess a field” guarantee depends on the engineer reaching doc.crds.dev or registry.terraform.io at plan time. The CLI is fully offline; the agent’s verification step isn’t. Behind a doc-blocking proxy it stops rather than guesses — safe, but it does mean the strongest promise wants network access.

The supported surface is deliberately small. Three IaC tools (Crossplane, Terraform, CloudFormation; OpenTofu and Pulumi are roadmap), five agents. I cut others when the per-agent maintenance got silly. For something that touches prod, narrow-and-trustworthy beats broad-and-flaky.

The bet

Spec-driven development said write the spec first. Constraint-driven development says write the rules first — then refuse to generate anything that breaks them.

InfraKit exists because my team couldn’t get two AI-written compositions to agree, and because none of us trusted a field we hadn’t personally looked up. Spec-kit gave me the shape. The four personas, the constraint files, the doc-verification loop, and the git-native trail are what I bolted on to make AI-generated infrastructure something I’d actually merge.

It’s MIT-licensed and on PyPI:

uv tool install infrakit-cli
infrakit init my-infra --ai claude --iac crossplane

If your team has the same drift — everyone’s agent confidently producing a different “correct” answer — try it and tell me where it breaks. The bet only gets sharper with evidence thrown at it.