Spec Engineering • Visual explainer

Tri-agent workflow for formal, iterative system design.

Let one LLM search the design space, one formalize truth constraints, and one translate those constraints back into better architecture.

Planner Searches for workflows that achieve the goal with useful phases, actors, and artifacts.
Verifier Pushes the workflow through TLA+, Lean4, and Z3 to expose what is actually coherent.
Critic Turns formal pain into architecture revisions, guard tightening, and approval decisions.

Why this exists

A single model should not both invent a workflow and self-grade its correctness. That mixes optimization pressure with truth-maintenance and tends to yield elegant slop.

Formal stack split

TLA+ for temporal behavior and race conditions. Z3 for local satisfiability and policy logic. Lean4 for semantic precision and proof-carrying invariants.

Key operating rule

All three agents must communicate through a versioned workflow IR. If the planner changes the workflow, old verifier and critic artifacts become stale.

The three agents and their authority boundaries

The planner explores. The verifier formalizes. The critic decides how formal truth should reshape design. None should silently absorb the others’ role.

Goal-oriented

Planner LLM

Produces workflow candidates that achieve the stated objective. It can add states, actors, and artifacts, simplify flows, or propose new structures.

workflow search goal fit tradeoffs
Logic-oriented

Verifier LLM

Formalizes workflow semantics, generates TLA+/Lean4/Z3 artifacts, and reports contradictions, dead states, proof obligations, missing assumptions, and liveness risks.

tla+ z3 lean4
Refinement-oriented

Critic LLM

Interprets verifier findings against product intent and policy. It decides whether to revise, accept, reject, split states, add guards, or surface questions to a human.

revision plan accept / reject policy mediation

Workflow diagrams

These Mermaid diagrams show the tri-agent control loop, artifact/version alignment, and the formal tool split behind the verifier.

1. Core tri-agent loop

The planner proposes, the verifier formalizes, the critic translates findings into revision or approval decisions, and the loop continues until a version is acceptable.

flowchart LR
    H[Human goal + constraints] --> P[Planner LLM\nworkflow IR vN]
    P --> V[Verifier LLM\nformalization + findings]
    V --> C[Critic LLM\nrevision / accept / reject]
    C -->|revise| P
    C -->|approve| A[Approved workflow version]
    C -->|reject| R[Rejected or sent to human]
    A --> I[Implementation]
    I --> X[Implementation verification / refinement]
          

2. Version alignment and staleness

Approval is only valid if the planner output, verifier artifacts, and critic decision all refer to the same workflow version.

flowchart TD
    W[Workflow IR v7] --> F[Verifier artifacts for v7]
    W --> K[Critic decision for v7]
    F --> A[Approval for v7]
    K --> A
    W2[Planner emits v8] --> S[Mark v7 artifacts stale]
    S --> B[Old approval cannot authorize v8]
          

3. Verifier backend split

The verifier is best treated as a multi-backend formalization agent rather than a single “logic model.”

flowchart TB
    IR[Workflow IR] --> T[TLA+\ntemporal model]
    IR --> Z[Z3\nconstraint logic]
    IR --> L[Lean4\nsemantic model + proofs]
    T --> OUT[Unified findings report]
    Z --> OUT
    L --> OUT
    OUT --> C[Critic]
          

4. Meta-workflow state machine

A compact state machine for the workflow that governs the spec-engineering loop itself.

stateDiagram-v2
    [*] --> DraftGoal
    DraftGoal --> ProposedWorkflow: Planner proposes
    ProposedWorkflow --> FormalizationInProgress: Verifier starts
    FormalizationInProgress --> Formalized: Artifacts complete
    Formalized --> NeedsRevision: Critic says revise
    Formalized --> ApprovedForImplementation: Critic says approve
    Formalized --> Rejected: Critic says reject
    NeedsRevision --> ProposedWorkflow: Planner emits v+1
    ApprovedForImplementation --> Implementing: Executor begins
    Implementing --> Verified: Verification / tests pass
          

What each formal tool is responsible for

The verifier should not dump every concern into one proof system. Each tool has a sharp role.

TLA+

  • workflow phases over time
  • async interactions between planner/verifier/critic
  • timeouts, retries, stale artifacts, race conditions
  • progress and revision deadlock questions

Z3

  • approval policy satisfiability
  • severity logic and contradictory rules
  • guard consistency
  • minimal counterexamples for local rule bugs

Lean4

  • state / action data types
  • reachability definitions
  • invariant preservation
  • refinement relationships and trustworthy semantics

Critic synthesis

  • split or merge states when the formal model demands it
  • add human checkpoints when policy or ambiguity requires it
  • separate harmless abstraction debt from blocking correctness risks
  • produce revision actions, not vibes

Shared IR and structured critique

The system becomes much more robust when Planner, Verifier, and Critic exchange versioned structured artifacts instead of prose-only conversation.

workflow_id: spec_impl_v1
goal: "Turn product goals into approved implementation workflows"
actors:
  - Human
  - PlannerLLM
  - VerifierLLM
  - CriticLLM
  - Executor
states:
  - DraftGoal
  - ProposedWorkflow
  - Formalized
  - NeedsRevision
  - ApprovedForImplementation
  - Implementing
  - Verified
transitions:
  - id: propose
    from: DraftGoal
    to: ProposedWorkflow
    actor: PlannerLLM
  - id: formalize
    from: ProposedWorkflow
    to: Formalized
    actor: VerifierLLM
  - id: revise
    from: Formalized
    to: NeedsRevision
    actor: CriticLLM
{
  "decision": "revise",
  "targetVersion": 7,
  "issues": [
    {
      "id": "ISSUE-12",
      "severity": "blocking",
      "type": "missing_state_refinement",
      "message": "ApprovedForImplementation conflates proof-complete and proof-waived cases."
    }
  ],
  "recommendedChanges": [
    {
      "action": "split_state",
      "state": "ApprovedForImplementation",
      "into": ["ApprovedWithProof", "ApprovedWithWaiver"],
      "reason": "explicit policy boundary"
    }
  ]
}

Core starter materials

The docs now point to actual starter artifacts so the workflow can move from architecture language into executable formal scaffolding.

TLA+

Temporal starter model

workflow.tla and workflow.cfg model the tri-agent meta-workflow, version alignment, approval states, and implementation guards.

Lean4

Semantic skeleton

Workflow.lean defines phases, state, step relation, invariants, and explicit proof obligations. It now compiles inside a tiny local Lean scaffold.

Z3

Policy sanity checks

constraints.py checks approval policy satisfiability, stale approval contradictions, and waiver-gating edge cases. A local .venv now supports running it end-to-end.

Concrete example roundtrip

A useful way to read the architecture is through a single iteration: Planner emits a compact workflow, Verifier exposes what is underspecified, Critic translates that into structural changes, and Planner revises into a stronger version.

Roundtrip v1 → v2

What changed

  • Added an explicit NeedsRevision state.
  • Split approval into ApprovedWithProof and ApprovedWithWaiver.
  • Added version-aligned implementation guards.
  • Added a reject path and stronger approval semantics.
state refinement policy visibility stale approval prevention
Reference artifact

Read the full example

The companion markdown file walks through one end-to-end cycle with workflow IR v1, verifier findings, critic output, and planner revision into v2.

Validation snapshot

This pass included lightweight local validation. Syntax and structural checks succeeded, while full runtime checks remain environment-dependent.

Validated

What passed

  • constraints.py passed Python syntax check.
  • constraints.py executed successfully in the local .venv.
  • workflow.tla model-checked successfully with TLC.
  • Workflow.lean compiled via lake build.
  • validate.sh runs the current local validation path.
Environment notes

What to know

  • Z3 is installed in the local .venv rather than the global Python environment.
  • TLC runs via a local jar path discovered in the workspace, not a global binary on PATH.
  • Lean proof obligations remain open even though the scaffold compiles.
  • Generated Lean build outputs are intentionally excluded from the tracked public artifact set.
Next execution step

How to harden it further

  • close the Lean proof obligations with real proofs
  • expand the TLA+ model with more realistic agent races and revision loops
  • enrich the Z3 model with warning thresholds, waiver policy, and human-review constraints

Runtime orchestration protocol

The tri-agent architecture becomes operational when Planner, Verifier, and Critic exchange typed runtime messages with version alignment, correlation ids, artifact references, and explicit staleness semantics.

Protocol schema

Message envelope

A runtime message should carry workflow id, workflow version, correlation id, sender/receiver roles, artifact references, typed payloads, and optional stale-notice metadata.

Why it matters

Control surface

Without a protocol, the tri-agent setup is just a concept. With a protocol, you can actually orchestrate revisions, invalidate stale findings, route questions, and drive execution from approved artifacts.

correlation ids stale notice artifact refs

Protocol message flow

A compact runtime view of how proposal, formalization, critique, revision, approval, and stale invalidation messages move through the system.

5. Runtime message lifecycle

Messages should preserve lineage across one revision cycle and explicitly invalidate stale work when a new workflow version appears.

flowchart LR
    P1[Planner proposal v3] --> V1[Verifier result v3]
    V1 --> C1[Critic revise v3]
    C1 --> P2[Planner revision v4]
    P2 --> S[Stale notice for v3 artifacts]
    P2 --> V2[Verifier result v4]
    V2 --> C2[Critic approval v4]
    C2 --> E[Executor]
          

Second formal model: async coding / review

The tri-agent runtime becomes more realistic when specialized into an engineering loop with asynchronous execution, verification, critique, revision, approval, and merge semantics.

What it captures

Async hazards

It focuses on stale review, execution/review overlap, revision races, and the requirement that approval and merge refer to aligned branch artifacts.

stale review revision race merge safety

Async coding / review state flow

A compact view of how planned work moves through execution, verification, critique, revision, approval, and merge.

6. Async coding / review lifecycle

This is the natural next instantiation of the tri-agent architecture for real engineering work.

stateDiagram-v2
    [*] --> DraftTask
    DraftTask --> Planned: Planner
    Planned --> Executing: Executor starts
    Executing --> ExecutionComplete: Executor finishes
    ExecutionComplete --> Verifying: Verifier starts
    Verifying --> ExecutionComplete: Verifier returns
    ExecutionComplete --> NeedsRevision: Critic says revise
    NeedsRevision --> Planned: Planner updates
    ExecutionComplete --> Approved: Critic says approve
    ExecutionComplete --> Rejected: Critic says reject
    Approved --> Merged: merge with aligned artifacts
          

Core invariants and failure modes

The architecture lives or dies on a few crisp invariants. Most bugs are stale-artifact or authority-boundary bugs rather than “math is hard” bugs.

Required invariants

What must always hold

  • No implementation without approval.
  • No approval with blocking issues unless policy explicitly allows it.
  • Approval must reference the same workflow version that was formalized and critiqued.
  • Planner revision invalidates stale approval for the new version.
  • Verified work must trace back to an approved workflow version.
Common failure modes

What tends to go wrong

  • Old verifier findings are used after planner revision.
  • Critic downgrades genuine correctness failures into hand-wavy “acceptable debt.”
  • Planner and verifier drift in state naming or semantics.
  • Approval policy becomes internally contradictory.
  • The workflow loops forever in revision without escalation to a human.
The planner searches. The verifier constrains. The critic refines.

That separation is the heart of the system. The planner should feel pressure to be useful. The verifier should feel pressure to be correct. The critic should feel pressure to turn correctness pressure into better design.