Add Concrete Syntax for Unstructured Programs in Strata Core #1196

Open
PROgram52bc wants to merge 73 commits into main2 from htd/unstructured-procedure

@PROgram52bc
Contributor

Add Support for Unstructured Programs in Strata Core

Description of changes

  • Adds syntax and parser support for unstructured (CFG-based) procedure bodies in Strata Core. For example,
  cfg entry {
    // Deterministic block (default):
    entry: {
      x := 0;
      if (x < n) goto entry else done;
    }
    done: {
      return;
    }
  }

  cfg entry {
    // Nondeterministic block:
    entry: {
      y := y + 1;
      goto entry, done;   // nondeterministic choice
    }
    done: {
      return;
    }
  }

Prior to this PR, the only way to obtain an unstructured program was to apply the StructuredToUnstructured transformation to a structured one.

  • Adds .core.st file examples illustrating the syntax of unstructured programs.
  • Introduces Procedure.Body as a sum type:
  inductive Procedure.Body where
    | structured : List Statement → Procedure.Body
    | cfg : DetCFG → Procedure.Body
  • Adapts uses of Procedure.body to handle the cfg case based on the context
  • Adds metadata support to unstructured programs (new since initial PR draft)
  • Propagates loop invariants/decreases measure to unstructured CFG during transformation (new since initial PR draft)
  • Adds CFG-based Core-to-GOTO pipeline alongside direct path (new since initial PR draft)
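
As a rough illustration of what adapting a use of Procedure.body looks like, here is a minimal Lean sketch. The types `Statement`, `DetCFG`, and `Body` below are simplified stand-ins invented for this example (the real Strata definitions carry much more structure), and `Body.size` is a hypothetical consumer, not a function from this PR.

```lean
-- Hypothetical stand-ins; the real Strata definitions are richer.
structure Statement where
  text : String

structure DetCFG where
  entry  : String
  labels : List String

inductive Body where
  | structured : List Statement → Body
  | cfg : DetCFG → Body

/-- A consumer with separate logic for each constructor:
    count statements for structured bodies, blocks for CFG bodies. -/
def Body.size : Body → Nat
  | .structured ss => ss.length
  | .cfg g => g.labels.length

#eval (Body.structured [⟨"x := 0"⟩, ⟨"return"⟩]).size                     -- 2
#eval (Body.cfg { entry := "entry", labels := ["entry", "done"] }).size  -- 2
```

Code that previously assumed a statement list has to pick one of the strategies in the next section for the `.cfg` arm: real logic, a justified no-op, or an explicit error.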

Adaptation Methods for Procedure.body Uses

  1. CFG (Implemented): Functions that handle both structured and CFG bodies with meaningful separate logic for each case.
  2. CFG (N/A): Operations where CFGs legitimately contribute nothing (e.g., no local funcDecls, no structured loops), so returning [] or skipping is semantically correct.
  3. CFG (deferred): Operations that currently error on CFGs or produce placeholder results, where proper CFG support is desirable but requires non-trivial additional work (e.g., dominator analysis, two-stage pipelines, graph-level inlining).
  4. Proof (Implemented): Formal correctness proofs adapted to the Procedure.Body sum type.
  5. Proof (deferred): Proof obligations where CFG correctness is not yet captured; the current formulation is trivially true for CFGs and is left for future PRs.

(Bolded items are updates since the initial PR draft)

| Category | Specific Strategy | File | Function/Location |
| --- | --- | --- | --- |
| Proof (Implemented) | postconditionsValid adapted with CoreBodyExec | Transform/CoreSpecification.lean | ProcedureCorrect.postconditionsValid |
| | Proves structured if transform succeeds | Transform/ProcBodyVerifyCorrect.lean | procToVerifyStmt_is_structured (new helper theorem) |
| | Unifies two ss witnesses via subst | Transform/ProcBodyVerifyCorrect.lean | procBodyVerify_procedureCorrect |
| | Adjusted case splits | Languages/Core/ObligationExtraction.lean | extractGo_ok proof |
| CFG (Implemented) | Symbolic evaluation of CFG bodies using fuel measure, without path merging. | Languages/Core/ProcedureEval.lean | eval |
| | Type-check CFG bodies (labels, vars, targets) implemented. | Languages/Core/ProcedureType.lean | typeCheck, checkModificationRights |
| | Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG | Backends/CBMC/GOTO/CoreToCProverGOTO.lean | transformToGoto |
| | Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG | Backends/CBMC/GOTO/CoreToGOTOPipeline.lean | procedureToGotoCtx |
| | Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG | StrataTest/.../E2E_CoreToGOTO.lean | coreToGotoJsonWithSummary |
| | Alternative two-stage pipeline implemented by procedureToGotoCtxViaCFG | Backends/CBMC/CoreToCBMC.lean | createImplementationSymbolFromAST |
| | extractCallsFromStatements vs extractCallsFromDetCFG | Languages/Core/CallGraph.lean | extractCallsFromProcedure |
| | blockToCST vs detCFGToCST | Languages/Core/DDMTransform/FormatCore.lean | procToCST |
| | runStmt vs runCFG | Languages/Core/StatementEval.lean | Command.runCall |
| | Use CoreBodyExec with two constructors, and CoreStepStar vs CoreCFGStepStar | Languages/Core/StatementSemantics.lean | EvalCommand.call_sem |
| | Lang.core vs Lang.coreCFG | Transform/CoreSpecification.lean | AssertValidInProcedure |
| | transformStmts vs transformDetCFG | Transform/PrecondElim.lean | precondElim |
| | eraseTypes, stripMetaData, getVars implemented and used for CFG | Languages/Core/Procedure.lean | eraseTypes, stripMetaData, getVars |
| | extractFromStatements vs extractFromDetCFG | Languages/Core/ObligationExtraction.lean | extractObligations |
| CFG (N/A) | return .none on CFG, skip inlining, because the inlined procedure could be structured or unstructured. Semantics is largely undefined. | Transform/ProcedureInlining.lean | inlineCallCmd |
| | stmtsToCFG vs identity, latter is a no-op. | StrataTest/.../Loops.lean | singleCFG |
| | Remove loops in structured, pass CFG through because it has no loops | Transform/LoopElim.lean | loopElim |
| | [] for CFG (no local funcDecls in CFG bodies) | Languages/Core/Core.lean | buildEnv |
| CFG (deferred) | Dominator-based path-condition propagation to be implemented | Languages/Core/ObligationExtraction.lean | extractFromDetCFG |
| | throw on CFG, due to incompatible return type of List Statement instead of List Command | Transform/CoreTransform.lean | runProgram |
| | throw on CFG, VC for CFG bodies to be implemented, proof to be adapted as well. | Transform/ProcBodyVerify.lean | procToVerifyStmt |
| | Encode structured procedures, handling for CFG procedures to be implemented. | Transform/ANFEncoder.lean | anfEncodeProgram |
| | CFG body renaming for inlining to be implemented | Transform/ProcedureInlining.lean | renameAllLocalNames |
| | Call extraction for CFG bodies to be implemented | StrataTest/Boole/global_readonly_call.lean | callHelper |
| Proof (deferred) | WF Properties wfstmts, wfloclnd, bodyExitsCovered conditioned on structured procedures, props for CFG to be implemented | Languages/Core/WF.lean | WFProcedureProp |
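
The "fuel measure" entry in the table is easiest to see on a toy interpreter. The sketch below is not the code in ProcedureEval.lean: it evaluates concretely rather than symbolically, and `Transfer`, `Block`, `run`, and `loop` are hypothetical names made up for illustration. It only shows why a fuel argument lets an evaluator over a cyclic CFG be written as a structurally recursive (hence terminating) Lean function.

```lean
-- Hypothetical miniature CFG: each block updates a counter, then transfers.
inductive Transfer where
  | goto : String → Transfer
  | condGoto : (Nat → Bool) → String → String → Transfer
  | ret : Transfer

structure Block where
  label : String
  step : Nat → Nat
  transfer : Transfer

/-- Run from a label with a bounded number of block visits (`fuel`).
    Recursion is structural on the fuel, so cycles in the CFG are harmless. -/
def run (blocks : List Block) : Nat → String → Nat → Option Nat
  | 0, _, _ => none                       -- out of fuel
  | fuel + 1, label, x =>
    match blocks.find? (fun b => b.label == label) with
    | none => none                        -- unknown target label
    | some b =>
      let x' := b.step x
      match b.transfer with
      | .ret => some x'
      | .goto l => run blocks fuel l x'
      | .condGoto c t e => run blocks fuel (if c x' then t else e) x'

-- A loop that increments until the counter reaches 3.
def loop : List Block := [
  { label := "entry", step := fun x => x + 1,
    transfer := .condGoto (fun x => decide (x < 3)) "entry" "done" },
  { label := "done", step := id, transfer := .ret }
]

#eval run loop 10 "entry" 0   -- some 3
#eval run loop 2 "entry" 0    -- none (fuel exhausted before reaching `done`)
```

Without path merging, a symbolic version of this scheme would follow each branch separately, so the fuel also bounds how far each path is explored.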

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

PROgram52bc added 30 commits May 5, 2026 11:24
Address PR review comments: note that the GOTO backend could be
refactored into a two-stage pipeline (structured→cfg, then cfg→GOTO)
to eliminate the pattern matching on Procedure.Body.
- Add Body.getCfg accessor (mirrors getStructured)
- CoreToCBMC: throw error on CFG body instead of returning []
- StatementEval: interpret CFG bodies by linearizing blocks
- Grammar: add comment explaining why 'branch' is used instead of 'if'
  (DDM registers tokens globally, causing conflict with if-statement)
- WF.lean: replace .stmts with case-split premises (bodyIsStructured)
- Procedure.lean: replace isEmpty with isAbstract/isStructured/isCfg,
  remove Body.stmts definition
- ProcedureInlining test: remove unused getStmts helper
- Boole test: use explicit pattern matching with comment
- Update all .body.stmts usages to explicit match expressions
- Fix ProcBodyVerifyCorrect proof to use new WF structure
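
For readers skimming the commit list, here is one plausible shape for the accessors and predicates mentioned above. The signatures are assumptions (the PR text only says that getCfg mirrors getStructured), and `Statement` and `DetCFG` are placeholder types for this sketch.

```lean
-- Hypothetical stand-ins for the real Strata types.
abbrev Statement := String
abbrev DetCFG := List String   -- just block labels in this sketch

inductive Body where
  | structured : List Statement → Body
  | cfg : DetCFG → Body

/-- Assumed shape: the statement list, only for structured bodies. -/
def Body.getStructured : Body → Option (List Statement)
  | .structured ss => some ss
  | .cfg _ => none

/-- Assumed shape: the CFG, only for CFG bodies. -/
def Body.getCfg : Body → Option DetCFG
  | .cfg g => some g
  | .structured _ => none

def Body.isStructured (b : Body) : Bool := b.getStructured.isSome
def Body.isCfg (b : Body) : Bool := b.getCfg.isSome

#eval (Body.cfg ["entry", "done"]).isCfg     -- true
#eval (Body.structured ["x := 0"]).getCfg    -- none
```

Returning an Option forces each call site to say what happens for the other constructor, which fits the move away from `.body.stmts` to explicit match expressions.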
@PROgram52bc changed the title from "Add Support for Unstructured Programs in Strata Core" to "Add Concrete Syntax for Unstructured Programs in Strata Core" on May 20, 2026
@PROgram52bc marked this pull request as ready for review on May 20, 2026 21:13
@PROgram52bc requested a review from a team as a code owner on May 20, 2026 21:13
Resolve conflicts in:
- Strata/Languages/Boole/Verify.lean: drop HEAD's registerCommandSymbols/initFVarIsOp
  in favor of main2's globalVarTypes-based getFVarIsOp.
- Strata/Languages/Core/ObligationExtraction.lean: combine HEAD's extractFromCmd
  refactor with main2's convertMetaDataPropertyType helper.
- Strata/Languages/Core/SMTEncoder.lean: drop `private` on lMonoTyToTermType to
  expose for Verifier.lean cross-module use.
- Strata/Languages/Core/Verifier.lean: take main2's encodeDeclarationsAbstract
  signature (it gained per-phase pctx tracking) and pass pctx through encodeCore.
- Strata/Transform/ProcBodyVerifyCorrect.lean: keep HEAD's `ss` from destructuring
  proc.body = .structured ss (main2 still treated proc.body as a list directly).
- Strata/Transform/StructuredToUnstructured.lean: keep HEAD's loop contract metadata
  on .condGoto / .goto transfers (main2's transfer ctors lacked the md argument).
- StrataTest/DL/Imperative/StepStmtTest.lean: keep HEAD's explicit Bool.or_false simp.
- StrataTest/DL/SMT/TranslateTests.lean: take main2's added arity-error tests.
- StrataTest/Languages/Core/Examples/Seq.lean: take main2's added bounds-check
  obligation expectations.
@github-actions (bot) removed the dependencies, github_actions, SMT, and Git conflicts labels on May 26, 2026
@PROgram52bc enabled auto-merge on May 26, 2026 23:49
Contributor

@atomb atomb left a comment

I think this mostly looks great, but it's big enough that my list of comments and small requests got kind of long. :)

Comment thread Examples/CFGSimple.core.st Outdated
Comment thread Strata/DL/Imperative/BasicBlock.lean Outdated
Comment thread Strata/Languages/Boole/Verify.lean Outdated
Comment thread Strata/Languages/Core/StatementEval.lean Outdated
Comment thread Strata/Languages/Core/StatementEval.lean
Comment thread Strata/Backends/CBMC/GOTO/CoreCFGToGOTOPipeline.lean
Comment thread Strata/Languages/Core/ProcedureType.lean
Comment thread Strata/Languages/Core/StatementSemantics.lean Outdated
Comment thread Strata/Languages/Core/DDMTransform/FormatCore.lean Outdated
Comment thread StrataTest/Languages/Core/Tests/NestedInductiveRestriction.lean Outdated
Changes from review:
- BasicBlock.lean: drop `:= .empty` default on `DetTransferCmd.goto`; cascade
  explicit `synthesizedMd` provenance through `StructuredToUnstructured.lean`
  (`detCmdBlock`, `flushCmds`, the loop measure-decrease block) and update
  `ProcedureEvalCFGTests.lean` callers to pass `.empty` explicitly. Loops/Exit
  golden outputs gain the synthesized provenance prefix on transfers that
  previously appeared without metadata.
- Grammar.lean: change `transfer_cond_goto` syntax to `branch (cond) goto X
  else goto Y` (symmetric); update `CFGSimple.core.st` and `CFGParseTests.lean`
  to match.
- FormatCore.lean: split `transferToCST` into deterministic-only and a new
  `nondetTransferToCST` for `NondetTransferCmd`; drop the `$__nondet_` fvar
  sniff in the Det path. Det path always emits `transfer_goto`/`transfer_cond_goto`.
- StatementEval.lean: replace stale `runCFG` docstring (it was describing
  `runCall`); restore `runCall` docstring; reword the runCFG nondet branch
  error message to point at symbolic / nondeterministic-goto conditions.
- StatementSemantics.lean: fix `CoreCFGStepStar` docstring (`_enableNesting`
  → `cmd`) and drop the dead reference to the deleted
  `NestedInductiveRestriction.lean`.
- CoreToGOTOPipeline.lean: add brief docstrings to the now-public
  `renameIdent` / `renameExpr` / `renameCmd`.
- Boole/Verify.lean: typo (`Boole dialect` → `the Boole dialect`).
- E2E_CoreToGOTO.lean: rewrite stale TODO; this helper uses the direct path
  because `injectPropertySummary` pattern-matches on `Core.Statement`.
- NestedInductiveRestriction.lean: deleted (per reviewer: tests should pin
  desired behavior, not unwanted restrictions).
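
To make the first item concrete: once the default is dropped, every construction site has to state where its metadata comes from. The sketch below uses invented stand-ins (`MetaData`, `DetTransfer`, and this particular `synthesizedMd` value are assumptions, not the definitions in BasicBlock.lean or StructuredToUnstructured.lean).

```lean
-- Hypothetical stand-ins for the real metadata and transfer types.
structure MetaData where
  entries : List String
  deriving Repr

def MetaData.empty : MetaData := ⟨[]⟩

inductive DetTransfer where
  | goto (label : String) (md : MetaData)   -- no `:= .empty` default
  deriving Repr

-- Assumed marker for transfers generated by a transformation.
def synthesizedMd : MetaData := ⟨["synthesized"]⟩

-- Tests that really want no metadata now say so explicitly.
#check DetTransfer.goto "done" .empty
-- Generated transfers carry explicit provenance instead.
#check DetTransfer.goto "done" synthesizedMd
```

A call such as `DetTransfer.goto "done"` alone is now only a partial application, so a forgotten metadata argument shows up as a type error rather than silently becoming empty metadata.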

Deferred to follow-up PRs (per discussion): O(n^2) list-append rewrite in
`typeCheckCFG`; dataflow-aware type checking across CFG blocks;
`coreCFGToGotoTransform` ↔ `detCFGToGotoTransform` deduplication (revisit
once the two-stage pipeline is stable on `htd/smack`).
…back

Project policy disallows new panic! calls. The unrepresentable >2-target
case now logs an error via the existing ToCSTM error pattern and falls
back to transfer_return, matching how lconstToExpr/lopToExpr handle
unknown-construct cases.
Contributor Author

@PROgram52bc PROgram52bc left a comment

Thanks for reading through such a tedious PR :) I addressed the comments you gave. Please let me know if there's anything you'd like to see done differently.

Comment thread Examples/CFGSimple.core.st Outdated
Comment thread Strata/DL/Imperative/BasicBlock.lean Outdated
Comment thread Strata/Languages/Boole/Verify.lean Outdated
Comment thread Strata/Languages/Core/StatementEval.lean Outdated
Comment thread Strata/Languages/Core/StatementEval.lean
Comment thread Strata/Backends/CBMC/GOTO/CoreCFGToGOTOPipeline.lean
Comment thread Strata/Languages/Core/ProcedureType.lean
Comment thread Strata/Languages/Core/StatementSemantics.lean Outdated
Comment thread Strata/Languages/Core/DDMTransform/FormatCore.lean Outdated
Comment thread StrataTest/Languages/Core/Tests/NestedInductiveRestriction.lean Outdated
@PROgram52bc requested review from a team and atomb on May 29, 2026 20:12
Two tests merged from main2 used the pre-#1196 `proc.body : List Statement`
shape. Adapt them to the new `Procedure.Body` sum (`structured` / `cfg`):

- BvIntCastVerifyTests.lean (added by #1217): wrap the literal body list
  with `.structured`.
- Boole/FeatureRequests/seq_empty_literal.lean (added by #1214): pattern-
  match `proc.body` and iterate the `.structured` arm; skip `.cfg` since
  the Boole-to-Core lowering does not produce CFG bodies.
atomb previously approved these changes May 29, 2026
Contributor

@atomb atomb left a comment

I would like to continue propagating the change to remove default metadata, but that could be in a separate PR. Otherwise I think this looks great.

@PROgram52bc
Contributor Author

> I would like to continue propagating the change to remove default metadata, but that could be in a separate PR. Otherwise I think this looks great.

Thanks!

…r keywords intact

`GenSyntax.lean` extracts each string literal in an op's syntax def as a
single keyword, so the previous `" else goto "` was being collapsed to
`elsegoto` and added to the editor syntax-highlighting files (vscode +
emacs). CI's `git diff --exit-code editors/` step then failed because the
generated files drifted from the committed ones.

Splitting the literal into `" else " "goto "` keeps `else` and `goto` as
the two existing keywords. Parser behavior is unchanged: DDM tokenizes
both forms identically.
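
The keyword collapse described in this commit message can be modeled in a few lines. This is only a guess at the behavior, based on the description above: `keywordOf` is a hypothetical function, not the actual extraction code in `GenSyntax.lean`.

```lean
/-- Hypothetical model: each string literal in a syntax def yields one
    keyword, with all spaces removed. -/
def keywordOf (lit : String) : String :=
  lit.replace " " ""

-- One literal: the two words fuse into a single bogus keyword.
#eval [" else goto "].map keywordOf        -- ["elsegoto"]

-- Two literals: `else` and `goto` stay separate keywords.
#eval [" else ", "goto "].map keywordOf    -- ["else", "goto"]
```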
6 participants