Refactoring Coding Theory by alexanderlhicks · Pull Request #279 · Verified-zkEVM/ArkLib

alexanderlhicks · 2025-12-17T23:23:24Z

Opening up this PR/branch to refactor and clean up some of the contents of our coding theory formalization.

Initial commits:

Moved contents of STIR and WHIR into CodingTheory, fixed what seemed to be some issues in STIR along the way.
Refactored ProximityGaps slightly, renaming files in terms of their contents rather than the reference paper authors, and moving the contents of AHIV22/Ligero into the other two (now ReedSolomon and Interleaved) for greater clarity if new results from more recent papers are formalized.

…n stir_main theorem

…nts of AHIV22 into these

github-actions · 2025-12-17T23:24:16Z

🤖 Gemini PR Summary

This diff introduces a major refactoring and cleanup of the codebase, focusing on better organization and removal of obsolete components.

Here is a summary of the key changes:

Code Reorganization (ProofSystem -> Data):
- A significant amount of code related to coding theory has been moved from specific proof system implementations (like Stir and Whir) into a more general and reusable location under ArkLib/Data/CodingTheory/.
- This includes modules for folding, combining, quotienting, proximity bounds, and distance metrics, making the codebase more modular.
Removal of FRIBinius and RingSwitching:
- The entire FRIBinius and RingSwitching components of the Binius proof system have been deleted. This represents a substantial removal of functionality.
Consolidation of Proximity Gap Lemmas:
- Files related to proximity gaps for codes have been renamed and consolidated. Instead of being organized by the papers they originate from (e.g., AHIV22, BCIKS20), they are now structured by topic (e.g., ReedSolomon, Interleaved).
Dependency and Helper Cleanup:
- Helper code for SelectableType instances has been removed from ArkLib/ToVCVio/Oracle.lean. Correspondingly, some instances in the Binius code have been temporarily stubbed out with sorry.
Minor Fixes and Tooling Updates:
- A logical correction and clarification was made to the STIR folding implementation.
- The development scripts were updated to use a newer AI model for generating summaries and reviews.

Analysis of Changes

Metric	Count
📝 Files Changed	34
✅ Lines Added	159
❌ Lines Removed	3065

sorry Tracking

✅ Removed: 21 sorry(s)
- theorem finalSumcheckOracleReduction_perfectCompleteness {σ : Type} in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
- theorem finalSumcheckOracleVerifier_rbrKnowledgeSoundness [Fintype L] {σ : Type} in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
- theorem finalSumcheckOracleVerifier_rbrKnowledgeSoundness [Fintype L] {σ : Type} in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
- theorem iteratedSumcheckOracleReduction_perfectCompleteness (hInit : init.neverFails) (i : Fin ℓ') : in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
- lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
- theorem coreInteraction_rbrKnowledgeSoundness: in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
- instance sumcheckFoldCtxLens_complete : in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
- lemma proximity_gap in ArkLib/ProofSystem/Stir/ProximityGap.lean
- theorem batchingOracleVerifier_rbrKnowledgeSoundness : in ArkLib/ProofSystem/Binius/RingSwitching/BatchingPhase.lean
- def batchingKStateProp {m : Fin (2 + 1)} in ArkLib/ProofSystem/Binius/RingSwitching/BatchingPhase.lean
- instance sumcheckFoldExtractorLens_rbr_knowledge_soundness : in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
- def finalSumcheckKStateProp {m : Fin (1 + 1)} (tr : Transcript m (pSpecFinalSumcheck L)) in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
- lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
- theorem batchingReduction_perfectCompleteness (hInit : init.neverFails) : in ArkLib/ProofSystem/Binius/RingSwitching/BatchingPhase.lean
- theorem fullOracleVerifier_rbrKnowledgeSoundness {𝓑 : Fin 2 ↪ L} : in ArkLib/ProofSystem/Binius/RingSwitching/General.lean
- theorem sumcheckFoldOracleVerifier_rbrKnowledgeSoundness [Fintype L] : in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
- theorem iteratedSumcheckOracleVerifier_rbrKnowledgeSoundness (i : Fin ℓ') : in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
- theorem finalSumcheckOracleReduction_perfectCompleteness {σ : Type} in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
- def iteratedSumcheckKnowledgeStateFunction (i : Fin ℓ') : in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
- def finalSumcheckKStateProp {m : Fin (1 + 1)} (tr : Transcript m (pSpecFinalSumcheckStep (L in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
- lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
❌ Added: 6 sorry(s)
- lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
- lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
- lemma min_dist_le_d [Field F] {B : Finset (Fin n → F)} (v : Fin n → F) in ArkLib/Data/CodingTheory/JohnsonBound/Expectations.lean
- lemma e_ball_le_radius [Field F] [Fintype F] {B : Finset (Fin n → F)} (v : Fin n → F) (r : ℚ) in ArkLib/Data/CodingTheory/JohnsonBound/Expectations.lean
- instance : ∀ i, SelectableType (↑(sDomain 𝔽q β h_ℓ_add_R_rate i)) in ArkLib/ProofSystem/Binius/BinaryBasefold/Spec.lean
- lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean

Last updated: 2025-12-18 01:53 UTC. See the main CI run for build status.

github-actions · 2025-12-28T00:45:06Z

🤖 Gemini PR Summary

Here is the high-level summary of the changes:

Refactoring

Module Restructuring: Centralized coding theory components (Proximity Gaps, Folding, OutOfDomainSampling) from specific proof system namespaces (ProofSystem.Stir, ProofSystem.Whir) into the general ArkLib.Data.CodingTheory namespace.
File Renaming & consolidation:
- Renamed Proximity Gap files to describe their contents rather than reference paper authors (e.g., BCIKS20 $\to$ ReedSolomon, AHIV22 $\to$ merged into Interleaved/ReedSolomon).
- Renamed BlockRelDistance.lean to BlockRel.lean.
- Renamed MainThm.lean to RBRSoundness.lean in the STIR proof system.
Cleanup: Removed redundant files (ProofSystem/Stir/ProximityGap.lean, Data/CodingTheory/ProximityGap/AHIV22.lean) and updated imports globally to reflect the new directory structure.

Fixes

STIR Logic:
- Corrected the field size constraint in RBRSoundness.lean (changed from an upper bound to a lower bound).
- Swapped variable indices in Folding/Stir.lean so the zeroth variable correctly maps to randomness and the first to the polynomial indeterminate.
Operations: Corrected the exponent calculation within the ri function in Combine.lean.

Features

Proximity Gaps:
- Integrated Lemma 4.3 from [AHIV22] (distance lower bounds for interleaved linear codes) into Interleaved.lean.
- Added combinatorial proximity results from [AHIV22] regarding Reed-Solomon codes and affine lines to ReedSolomon.lean.

Documentation

Quotienting: Corrected comments regarding the degree of the vanishing polynomial in Operations/Quotienting.lean.

Analysis of Changes

Metric	Count
📝 Files Changed	20
✅ Lines Added	152
❌ Lines Removed	197

sorry Tracking

✅ **Removed:** 4 `sorry`(s)

lemma proximity_gap in ArkLib/ProofSystem/Stir/ProximityGap.lean
lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean

❌ **Added:** 3 `sorry`(s)

lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean
lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean

🎨 **Style Guide Adherence**

Based on the provided style guide and code changes, here are the violations found:

File: ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
Line: lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F}
Rule: "Please follow the mathlib Style Guide. This covers naming conventions..." (Theorem and lemma names should be lowerCamelCase, not snake_case).

File: ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
Line: /-- **Lemma 4.4, [AHIV22] (Combinatorial proximity gap for affine lines)**
Rule: "Each file that cites papers should have a ## References section in its docstring header... When adding a new paper ... list it in the References section." (The file cites [AHIV22], but the diff does not show the addition of this citation to the file's References section).

File: ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean
Line: ## Proximity results from [AHIV22] (Ligero)
Rule: "Each file that cites papers should have a ## References section in its docstring header... When adding a new paper ... list it in the References section." (The file cites [AHIV22], but the diff does not show the addition of this citation to the file's References section).

📄 **Per-File Summaries**

ArkLib.lean: The file updates imports to reflect a refactoring that moves coding theory components from the Stir and Whir proof system modules into the general ArkLib.Data.CodingTheory namespace.
ArkLib/Data/CodingTheory/Distance/BlockRel.lean: The file BlockRelDistance.lean is renamed to BlockRel.lean and moved to the ArkLib/Data/CodingTheory/Distance directory, along with minor formatting adjustments.
ArkLib/Data/CodingTheory/DivergenceOfSets.lean: The import for the proximity gap module is updated from BCIKS20 to ReedSolomon.
ArkLib/Data/CodingTheory/Folding/Stir.lean: The file was moved to the ArkLib/Data/CodingTheory module, and the variable indices in the STIR folding logic were swapped so that the zeroth variable maps to the randomness and the first variable maps to the polynomial indeterminate.
ArkLib/Data/CodingTheory/Folding/Whir.lean: The Whir folding implementation was relocated from the ProofSystem module to Data/CodingTheory, and its imports were updated to match the new structure.
ArkLib/Data/CodingTheory/Operations/Combine.lean: This commit moves the file to ArkLib/Data/CodingTheory/Operations and corrects the exponent calculation within the ri function.
ArkLib/Data/CodingTheory/Operations/Quotienting.lean: Relocate the file to the coding theory operations module and correct the documentation regarding the degree of the vanishing polynomial.
ArkLib/Data/CodingTheory/OutOfDomSmpl/Stir.lean: Relocate and rename ArkLib/ProofSystem/Stir/OutOfDomSmpl.lean to ArkLib/Data/CodingTheory/OutOfDomSmpl/Stir.lean.
ArkLib/Data/CodingTheory/OutOfDomSmpl/Whir.lean: The file OutofDomainSmpl.lean is moved from ArkLib/ProofSystem/Whir to ArkLib/Data/CodingTheory/OutOfDomSmpl/Whir.lean without any content changes.
ArkLib/Data/CodingTheory/Proximity/Bound.lean: The file ProximityBound.lean has been moved from the ProofSystem/Stir directory to Data/CodingTheory/Proximity and renamed to Bound.lean.
ArkLib/Data/CodingTheory/Proximity/Generator.lean: The file ArkLib/ProofSystem/Whir/ProximityGen.lean was moved and renamed to ArkLib/Data/CodingTheory/Proximity/Generator.lean.
ArkLib/Data/CodingTheory/Proximity/MutualCorrAgreement.lean: The file was relocated from ProofSystem/Whir to Data/CodingTheory/Proximity and its import of the proximity generator was updated to reflect the new structure.
ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean: Deletes the file ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean which contained definitions and lemmas related to proximity gaps for Reed-Solomon codes.
ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean: The file is renamed to Interleaved.lean and updated to include the statement of Lemma 4.3 from [AHIV22] regarding distance lower bounds for interleaved linear codes.
ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean: The file BCIKS20.lean is renamed to ReedSolomon.lean and updated to include combinatorial proximity results from [AHIV22] concerning Reed-Solomon codes and affine lines.
ArkLib/ProofSystem/Stir.lean: The import paths in ArkLib/ProofSystem/Stir.lean were updated to reflect the refactoring of Stir-related modules into the ArkLib.Data.CodingTheory namespace.
ArkLib/ProofSystem/Stir/ProximityGap.lean: This change removes the ArkLib/ProofSystem/Stir/ProximityGap.lean file, which contained the definition of the proximity_gap lemma for Reed-Solomon codes within the STIR proof system.
ArkLib/ProofSystem/Stir/RBRSoundness.lean: The file is renamed from MainThm.lean to RBRSoundness.lean with updated imports, and the field size constraint in the STIR main theorem is corrected from an upper bound to a lower bound.
ArkLib/ProofSystem/Whir.lean: The changes update import statements to reflect the relocation of various Whir-related modules to the ArkLib.Data.CodingTheory namespace.
ArkLib/ProofSystem/Whir/RBRSoundness.lean: The imports in RBRSoundness.lean are updated to reflect the relocation of block relative distance and proximity modules to the ArkLib.Data.CodingTheory namespace.

Last updated: 2025-12-28 00:53 UTC.

alexanderlhicks · 2026-01-19T14:10:31Z

@chung-thai-nguyen since you had done some refactoring already in a previous: does the organisation done in this refactoring seem fine to you? I've tried to mostly stick with your outline but made some things a bit more generic where we overlapped (e.g. generic filenames instead of explicit references to papers now that several files have content from multiple papers, ...)

chung-thai-nguyen · 2026-01-19T14:45:01Z

@chung-thai-nguyen since you had done some refactoring already in a previous: does the organisation done in this refactoring seem fine to you? I've tried to mostly stick with your outline but made some things a bit more generic where we overlapped (e.g. generic filenames instead of explicit references to papers now that several files have content from multiple papers, ...)

@alexanderlhicks The generic file names (in ProximityGap folder) looks reasonable if we want a centralized place for readers to find the proximity gap theorems available for usage (but likely not the definitions since they are already in ProximityGap/Basic.lean), e.g. in OracleReduction protocols. My main concern was due to the bloat of general files.

For Interleaved.lean, most (if not all) proximity gap definitions/theorems can be written from the interleaved code POV. Similarly for ReedSolomon.lean, a lot of papers state proximity gaps for RS codes. 1 theorem might be put in both Interleaved.lean & ReedSolomon.lean if we don't have clear distinction in terms of purpose of the files. I think we can have these generic files but only for theorem statements across papers, and the helper lemmas should be put somewhere else if we really decide to have monolithic files like that in the long term. If we don't split helper lemmas, the files would look pretty long & distracted (e.g. the DG25 file is 2k lines and it is only about proximity gaps for multilinear combination, but its main theorems are short). But idk how should we do it atm. What do you think?

alexanderlhicks · 2026-01-19T16:23:39Z

For Interleaved.lean, most (if not all) proximity gap definitions/theorems can be written from the interleaved code POV. Similarly for ReedSolomon.lean, a lot of papers state proximity gaps for RS codes. 1 theorem might be put in both Interleaved.lean & ReedSolomon.lean if we don't have clear distinction in terms of purpose of the files

Yep, beyond what's been done on this PR now (which is really just moving stuff across, particularly the STIR/WHIR contents), I think a next step (in this PR or a follow up) will be to see where we can clean up and unify results (STIR and WHIR alone have/had overlap given parts of the WHIR paper are literally copy pasted from the STIR paper), but this first requires setting up the file structure so that we know what boxes we to put things in.

If we don't split helper lemmas, the files would look pretty long & distracted (e.g. the DG25 file is 2k lines and it is only about proximity gaps for multilinear combination, but its main theorems are short). But idk how should we do it atm. What do you think?

Yep, I agree about the bloat. One option is to have small files for key results and split auxilliary/helpful lemmas across other files according to what they cover. See for example the edits I made to the proof of Polishchuk Spielman here: https://github.com/alexanderlhicks/ArkLib/pull/1/files, which keeps the proof of the main result in PolishchukSpielman.lean and splits intermnediary results across three files based on what they cover (if what they cover is not easily categorized, they can just be split into Aux.lean or Aux1.lean, Aux2.lean if they are too long...).
This makes it easy to browse important theorems whilst ignoring intermeidary results, and speeds up build times if working on any individual file.

chung-thai-nguyen · 2026-01-20T03:39:28Z

@alexanderlhicks Yes, we really need to figure out the right file structure. Here I give some suggestions for your consideration:
For proximity gaps:

I think proximity gaps/generators, Mutual/Weighted CA definitions (& their very fundamental results) can be put into same place as CA (in ProximityGap/Basic.lean), as we did for CodingTheory/Basic.lean, because these things are core upstream definitions. These are what most worth generalizing.
Then comes the instantiations/specific results of these defs: we can categorize results (main theorems only) for generic LinearCode vs ReedSolomon code since they are most of the cases for now.
For Interleaved.lean, the only theorem fits into the category of proximity gap for interleaved code (we see the concept of interleaved word of interleaved symbols), is Theorem 3.1 of DG25 paper, the AHIV22 lemmas don't fit (it's just interleaved word of generic (not-necessarily-interleaved) symbol, as in BCIKS20). We can put DG25 theorems into LinearCode file & ReedSolomon file of ProximityGap accordingly so maybe we shouldn't have this file.
Universal tools like Polishchuk-Spielman can have their own folder/files as you did (https://github.com/alexanderlhicks/ArkLib/pull/1/changes) since it's well-known, while some paper-specific techniques/lemmas that are not widely reusable can be put into Aux files like you said (e.g. a big Aux section with the paper name so we don't confuse ourselves later).

For other tools like OOD, folding, ... we can continue as you do and I think we can unify sth out of different formulations of them (FRI/STIR/WHIR/Binius, etc).

alexanderlhicks added 11 commits December 17, 2025 17:57

move stir/whir coding theory material to data/codingtheory

d056972

rename rbr soundness file in stir

8744e46

edits to stir folding

b3db97b

tweak folding degree bounds in stir

fb2eea8

ri exponent in Combine

fd405a1

quotienting docsting typo

68d45c7

stir proximity gap

5da28ad

rm stir proximity gaps file (redundant) + fix field size inequality i…

735fb55

…n stir_main theorem

rework CodingTheory/ProximityGaps: rename DG25 and BCIKS, merge conte…

8d71c76

…nts of AHIV22 into these

refactor: update imports after ProximityGap files rename

cf6209d

carry over authors from AHIV22 to Interleaved proximity gaps

752d3ec

fix import/build error

f61be23

alexanderlhicks marked this pull request as draft December 18, 2025 00:27

alexanderlhicks added 11 commits December 18, 2025 00:48

fix build issue/import

5bb90bf

fix imports, fix AHIV22 lemmas in ProximityGaps/ReedSolomon

9832286

fix build error in proximitygaps/interleaved

16ed6b3

churn

35218d4

dummy pr to trigger & test newly refactored workflows

f99874f

Merge remote-tracking branch 'origin/main' into codingtheory-refactor

608656f

dummy change to trigger summary

a737d5d

dummy change to trigger summary

328356f

dummy change to trigger summary

53545f2

dummy change to trigger summary

fbdc717

dummy change to trigger summary

b573f27

dummy change to trigger summary

0693880

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring Coding Theory#279

Refactoring Coding Theory#279
alexanderlhicks wants to merge 24 commits intomainfrom
codingtheory-refactor

alexanderlhicks commented Dec 17, 2025

Uh oh!

github-actions bot commented Dec 17, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 28, 2025 •

edited

Loading

Uh oh!

alexanderlhicks commented Jan 19, 2026

Uh oh!

chung-thai-nguyen commented Jan 19, 2026

Uh oh!

alexanderlhicks commented Jan 19, 2026 •

edited

Loading

Uh oh!

chung-thai-nguyen commented Jan 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

alexanderlhicks commented Dec 17, 2025

Uh oh!

github-actions bot commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Gemini PR Summary

Uh oh!

github-actions bot commented Dec 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Gemini PR Summary

Refactoring

Fixes

Features

Documentation

Uh oh!

alexanderlhicks commented Jan 19, 2026

Uh oh!

chung-thai-nguyen commented Jan 19, 2026

Uh oh!

alexanderlhicks commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chung-thai-nguyen commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

github-actions bot commented Dec 17, 2025 •

edited

Loading

github-actions bot commented Dec 28, 2025 •

edited

Loading

alexanderlhicks commented Jan 19, 2026 •

edited

Loading

chung-thai-nguyen commented Jan 20, 2026 •

edited

Loading