Planner LLM
Produces workflow candidates that achieve the stated objective. It can add states, actors, and artifacts, simplify flows, or propose new structures.
Let one LLM search the design space, one formalize truth constraints, and one translate those constraints back into better architecture.
A single model should not both invent a workflow and self-grade its correctness. That mixes optimization pressure with truth-maintenance and tends to yield elegant slop.
TLA+ for temporal behavior and race conditions. Z3 for local satisfiability and policy logic. Lean4 for semantic precision and proof-carrying invariants.
All three agents must communicate through a versioned workflow IR. If the planner changes the workflow, old verifier and critic artifacts become stale.
The planner explores. The verifier formalizes. The critic decides how formal truth should reshape design. None should silently absorb the others’ role.
Produces workflow candidates that achieve the stated objective. It can add states, actors, and artifacts, simplify flows, or propose new structures.
Formalizes workflow semantics, generates TLA+/Lean4/Z3 artifacts, and reports contradictions, dead states, proof obligations, missing assumptions, and liveness risks.
Interprets verifier findings against product intent and policy. It decides whether to revise, accept, reject, split states, add guards, or surface questions to a human.
These Mermaid diagrams show the tri-agent control loop, artifact/version alignment, and the formal tool split behind the verifier.
The planner proposes, the verifier formalizes, the critic translates findings into revision or approval decisions, and the loop continues until a version is acceptable.
flowchart LR
H[Human goal + constraints] --> P[Planner LLM\nworkflow IR vN]
P --> V[Verifier LLM\nformalization + findings]
V --> C[Critic LLM\nrevision / accept / reject]
C -->|revise| P
C -->|approve| A[Approved workflow version]
C -->|reject| R[Rejected or sent to human]
A --> I[Implementation]
I --> X[Implementation verification / refinement]
Approval is only valid if the planner output, verifier artifacts, and critic decision all refer to the same workflow version.
flowchart TD
W[Workflow IR v7] --> F[Verifier artifacts for v7]
W --> K[Critic decision for v7]
F --> A[Approval for v7]
K --> A
W2[Planner emits v8] --> S[Mark v7 artifacts stale]
S --> B[Old approval cannot authorize v8]
The verifier is best treated as a multi-backend formalization agent rather than a single “logic model.”
flowchart TB
IR[Workflow IR] --> T[TLA+\ntemporal model]
IR --> Z[Z3\nconstraint logic]
IR --> L[Lean4\nsemantic model + proofs]
T --> OUT[Unified findings report]
Z --> OUT
L --> OUT
OUT --> C[Critic]
A compact state machine for the workflow that governs the spec-engineering loop itself.
stateDiagram-v2
[*] --> DraftGoal
DraftGoal --> ProposedWorkflow: Planner proposes
ProposedWorkflow --> FormalizationInProgress: Verifier starts
FormalizationInProgress --> Formalized: Artifacts complete
Formalized --> NeedsRevision: Critic says revise
Formalized --> ApprovedForImplementation: Critic says approve
Formalized --> Rejected: Critic says reject
NeedsRevision --> ProposedWorkflow: Planner emits v+1
ApprovedForImplementation --> Implementing: Executor begins
Implementing --> Verified: Verification / tests pass
The verifier should not dump every concern into one proof system. Each tool has a sharp role.
The system becomes much more robust when Planner, Verifier, and Critic exchange versioned structured artifacts instead of prose-only conversation.
workflow_id: spec_impl_v1
goal: "Turn product goals into approved implementation workflows"
actors:
- Human
- PlannerLLM
- VerifierLLM
- CriticLLM
- Executor
states:
- DraftGoal
- ProposedWorkflow
- Formalized
- NeedsRevision
- ApprovedForImplementation
- Implementing
- Verified
transitions:
- id: propose
from: DraftGoal
to: ProposedWorkflow
actor: PlannerLLM
- id: formalize
from: ProposedWorkflow
to: Formalized
actor: VerifierLLM
- id: revise
from: Formalized
to: NeedsRevision
actor: CriticLLM
{
"decision": "revise",
"targetVersion": 7,
"issues": [
{
"id": "ISSUE-12",
"severity": "blocking",
"type": "missing_state_refinement",
"message": "ApprovedForImplementation conflates proof-complete and proof-waived cases."
}
],
"recommendedChanges": [
{
"action": "split_state",
"state": "ApprovedForImplementation",
"into": ["ApprovedWithProof", "ApprovedWithWaiver"],
"reason": "explicit policy boundary"
}
]
}
The docs now point to actual starter artifacts so the workflow can move from architecture language into executable formal scaffolding.
workflow.tla and workflow.cfg model the tri-agent meta-workflow, version alignment, approval states, and implementation guards.
Workflow.lean defines phases, state, step relation, invariants, and explicit proof obligations. It now compiles inside a tiny local Lean scaffold.
constraints.py checks approval policy satisfiability, stale approval contradictions, and waiver-gating edge cases. A local .venv now supports running it end-to-end.
A useful way to read the architecture is through a single iteration: Planner emits a compact workflow, Verifier exposes what is underspecified, Critic translates that into structural changes, and Planner revises into a stronger version.
NeedsRevision state.ApprovedWithProof and ApprovedWithWaiver.The companion markdown file walks through one end-to-end cycle with workflow IR v1, verifier findings, critic output, and planner revision into v2.
This pass included lightweight local validation. Syntax and structural checks succeeded, while full runtime checks remain environment-dependent.
constraints.py passed Python syntax check.constraints.py executed successfully in the local .venv.workflow.tla model-checked successfully with TLC.Workflow.lean compiled via lake build.validate.sh runs the current local validation path..venv rather than the global Python environment.The tri-agent architecture becomes operational when Planner, Verifier, and Critic exchange typed runtime messages with version alignment, correlation ids, artifact references, and explicit staleness semantics.
A runtime message should carry workflow id, workflow version, correlation id, sender/receiver roles, artifact references, typed payloads, and optional stale-notice metadata.
Without a protocol, the tri-agent setup is just a concept. With a protocol, you can actually orchestrate revisions, invalidate stale findings, route questions, and drive execution from approved artifacts.
A compact runtime view of how proposal, formalization, critique, revision, approval, and stale invalidation messages move through the system.
Messages should preserve lineage across one revision cycle and explicitly invalidate stale work when a new workflow version appears.
flowchart LR
P1[Planner proposal v3] --> V1[Verifier result v3]
V1 --> C1[Critic revise v3]
C1 --> P2[Planner revision v4]
P2 --> S[Stale notice for v3 artifacts]
P2 --> V2[Verifier result v4]
V2 --> C2[Critic approval v4]
C2 --> E[Executor]
The tri-agent runtime becomes more realistic when specialized into an engineering loop with asynchronous execution, verification, critique, revision, approval, and merge semantics.
This model tracks plan version, execution version, verified version, critiqued version, and approved version so stale reviews cannot accidentally authorize merge.
It focuses on stale review, execution/review overlap, revision races, and the requirement that approval and merge refer to aligned branch artifacts.
A compact view of how planned work moves through execution, verification, critique, revision, approval, and merge.
This is the natural next instantiation of the tri-agent architecture for real engineering work.
stateDiagram-v2
[*] --> DraftTask
DraftTask --> Planned: Planner
Planned --> Executing: Executor starts
Executing --> ExecutionComplete: Executor finishes
ExecutionComplete --> Verifying: Verifier starts
Verifying --> ExecutionComplete: Verifier returns
ExecutionComplete --> NeedsRevision: Critic says revise
NeedsRevision --> Planned: Planner updates
ExecutionComplete --> Approved: Critic says approve
ExecutionComplete --> Rejected: Critic says reject
Approved --> Merged: merge with aligned artifacts
The architecture lives or dies on a few crisp invariants. Most bugs are stale-artifact or authority-boundary bugs rather than “math is hard” bugs.
The planner searches. The verifier constrains. The critic refines.
That separation is the heart of the system. The planner should feel pressure to be useful. The verifier should feel pressure to be correct. The critic should feel pressure to turn correctness pressure into better design.