Skip to content

Refactoring Coding Theory#279

Draft
alexanderlhicks wants to merge 24 commits intomainfrom
codingtheory-refactor
Draft

Refactoring Coding Theory#279
alexanderlhicks wants to merge 24 commits intomainfrom
codingtheory-refactor

Conversation

@alexanderlhicks
Copy link
Collaborator

Opening up this PR/branch to refactor and clean up some of the contents of our coding theory formalization.

Initial commits:

  • Moved contents of STIR and WHIR into CodingTheory, fixed what seemed to be some issues in STIR along the way.
  • Refactored ProximityGaps slightly, renaming files in terms of their contents rather than the reference paper authors, and moving the contents of AHIV22/Ligero into the other two (now ReedSolomon and Interleaved) for greater clarity if new results from more recent papers are formalized.

@github-actions
Copy link

github-actions bot commented Dec 17, 2025

🤖 Gemini PR Summary

This diff introduces a major refactoring and cleanup of the codebase, focusing on better organization and removal of obsolete components.

Here is a summary of the key changes:

  • Code Reorganization (ProofSystem -> Data):

    • A significant amount of code related to coding theory has been moved from specific proof system implementations (like Stir and Whir) into a more general and reusable location under ArkLib/Data/CodingTheory/.
    • This includes modules for folding, combining, quotienting, proximity bounds, and distance metrics, making the codebase more modular.
  • Removal of FRIBinius and RingSwitching:

    • The entire FRIBinius and RingSwitching components of the Binius proof system have been deleted. This represents a substantial removal of functionality.
  • Consolidation of Proximity Gap Lemmas:

    • Files related to proximity gaps for codes have been renamed and consolidated. Instead of being organized by the papers they originate from (e.g., AHIV22, BCIKS20), they are now structured by topic (e.g., ReedSolomon, Interleaved).
  • Dependency and Helper Cleanup:

    • Helper code for SelectableType instances has been removed from ArkLib/ToVCVio/Oracle.lean. Correspondingly, some instances in the Binius code have been temporarily stubbed out with sorry.
  • Minor Fixes and Tooling Updates:

    • A logical correction and clarification was made to the STIR folding implementation.
    • The development scripts were updated to use a newer AI model for generating summaries and reviews.

Analysis of Changes

Metric Count
📝 Files Changed 34
Lines Added 159
Lines Removed 3065

sorry Tracking

  • Removed: 21 sorry(s)
    • theorem finalSumcheckOracleReduction_perfectCompleteness {σ : Type} in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
    • theorem finalSumcheckOracleVerifier_rbrKnowledgeSoundness [Fintype L] {σ : Type} in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
    • theorem finalSumcheckOracleVerifier_rbrKnowledgeSoundness [Fintype L] {σ : Type} in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
    • theorem iteratedSumcheckOracleReduction_perfectCompleteness (hInit : init.neverFails) (i : Fin ℓ') : in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
    • lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
    • theorem coreInteraction_rbrKnowledgeSoundness: in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
    • instance sumcheckFoldCtxLens_complete : in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
    • lemma proximity_gap in ArkLib/ProofSystem/Stir/ProximityGap.lean
    • theorem batchingOracleVerifier_rbrKnowledgeSoundness : in ArkLib/ProofSystem/Binius/RingSwitching/BatchingPhase.lean
    • def batchingKStateProp {m : Fin (2 + 1)} in ArkLib/ProofSystem/Binius/RingSwitching/BatchingPhase.lean
    • instance sumcheckFoldExtractorLens_rbr_knowledge_soundness : in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
    • def finalSumcheckKStateProp {m : Fin (1 + 1)} (tr : Transcript m (pSpecFinalSumcheck L)) in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
    • lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
    • theorem batchingReduction_perfectCompleteness (hInit : init.neverFails) : in ArkLib/ProofSystem/Binius/RingSwitching/BatchingPhase.lean
    • theorem fullOracleVerifier_rbrKnowledgeSoundness {𝓑 : Fin 2 ↪ L} : in ArkLib/ProofSystem/Binius/RingSwitching/General.lean
    • theorem sumcheckFoldOracleVerifier_rbrKnowledgeSoundness [Fintype L] : in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
    • theorem iteratedSumcheckOracleVerifier_rbrKnowledgeSoundness (i : Fin ℓ') : in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
    • theorem finalSumcheckOracleReduction_perfectCompleteness {σ : Type} in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
    • def iteratedSumcheckKnowledgeStateFunction (i : Fin ℓ') : in ArkLib/ProofSystem/Binius/RingSwitching/SumcheckPhase.lean
    • def finalSumcheckKStateProp {m : Fin (1 + 1)} (tr : Transcript m (pSpecFinalSumcheckStep (L in ArkLib/ProofSystem/Binius/FRIBinius/CoreInteractionPhase.lean
    • lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
  • Added: 6 sorry(s)
    • lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
    • lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
    • lemma min_dist_le_d [Field F] {B : Finset (Fin n → F)} (v : Fin n → F) in ArkLib/Data/CodingTheory/JohnsonBound/Expectations.lean
    • lemma e_ball_le_radius [Field F] [Fintype F] {B : Finset (Fin n → F)} (v : Fin n → F) (r : ℚ) in ArkLib/Data/CodingTheory/JohnsonBound/Expectations.lean
    • instance : ∀ i, SelectableType (↑(sDomain 𝔽q β h_ℓ_add_R_rate i)) in ArkLib/ProofSystem/Binius/BinaryBasefold/Spec.lean
    • lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean

Last updated: 2025-12-18 01:53 UTC. See the main CI run for build status.

@alexanderlhicks alexanderlhicks marked this pull request as draft December 18, 2025 00:27
@github-actions
Copy link

github-actions bot commented Dec 28, 2025

🤖 Gemini PR Summary

Here is the high-level summary of the changes:

Refactoring

  • Module Restructuring: Centralized coding theory components (Proximity Gaps, Folding, OutOfDomainSampling) from specific proof system namespaces (ProofSystem.Stir, ProofSystem.Whir) into the general ArkLib.Data.CodingTheory namespace.
  • File Renaming & consolidation:
    • Renamed Proximity Gap files to describe their contents rather than reference paper authors (e.g., BCIKS20 $\to$ ReedSolomon, AHIV22 $\to$ merged into Interleaved/ReedSolomon).
    • Renamed BlockRelDistance.lean to BlockRel.lean.
    • Renamed MainThm.lean to RBRSoundness.lean in the STIR proof system.
  • Cleanup: Removed redundant files (ProofSystem/Stir/ProximityGap.lean, Data/CodingTheory/ProximityGap/AHIV22.lean) and updated imports globally to reflect the new directory structure.

Fixes

  • STIR Logic:
    • Corrected the field size constraint in RBRSoundness.lean (changed from an upper bound to a lower bound).
    • Swapped variable indices in Folding/Stir.lean so the zeroth variable correctly maps to randomness and the first to the polynomial indeterminate.
  • Operations: Corrected the exponent calculation within the ri function in Combine.lean.

Features

  • Proximity Gaps:
    • Integrated Lemma 4.3 from [AHIV22] (distance lower bounds for interleaved linear codes) into Interleaved.lean.
    • Added combinatorial proximity results from [AHIV22] regarding Reed-Solomon codes and affine lines to ReedSolomon.lean.

Documentation

  • Quotienting: Corrected comments regarding the degree of the vanishing polynomial in Operations/Quotienting.lean.

Analysis of Changes

Metric Count
📝 Files Changed 20
Lines Added 152
Lines Removed 197

sorry Tracking

✅ **Removed:** 4 `sorry`(s)
  • lemma proximity_gap in ArkLib/ProofSystem/Stir/ProximityGap.lean
  • lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
  • lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
  • lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
❌ **Added:** 3 `sorry`(s)
  • lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean
  • lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
  • lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F} in ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean

🎨 **Style Guide Adherence**

Based on the provided style guide and code changes, here are the violations found:

File: ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
Line: lemma e_leq_dist_over_3 [DecidableEq F] {deg : ℕ} {α : ι ↪ F} {e : ℕ} {u v : ι → F}
Rule: "Please follow the mathlib Style Guide. This covers naming conventions..." (Theorem and lemma names should be lowerCamelCase, not snake_case).

File: ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean
Line: /-- **Lemma 4.4, [AHIV22] (Combinatorial proximity gap for affine lines)**
Rule: "Each file that cites papers should have a ## References section in its docstring header... When adding a new paper ... list it in the References section." (The file cites [AHIV22], but the diff does not show the addition of this citation to the file's References section).

File: ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean
Line: ## Proximity results from [AHIV22] (Ligero)
Rule: "Each file that cites papers should have a ## References section in its docstring header... When adding a new paper ... list it in the References section." (The file cites [AHIV22], but the diff does not show the addition of this citation to the file's References section).


📄 **Per-File Summaries**
  • ArkLib.lean: The file updates imports to reflect a refactoring that moves coding theory components from the Stir and Whir proof system modules into the general ArkLib.Data.CodingTheory namespace.
  • ArkLib/Data/CodingTheory/Distance/BlockRel.lean: The file BlockRelDistance.lean is renamed to BlockRel.lean and moved to the ArkLib/Data/CodingTheory/Distance directory, along with minor formatting adjustments.
  • ArkLib/Data/CodingTheory/DivergenceOfSets.lean: The import for the proximity gap module is updated from BCIKS20 to ReedSolomon.
  • ArkLib/Data/CodingTheory/Folding/Stir.lean: The file was moved to the ArkLib/Data/CodingTheory module, and the variable indices in the STIR folding logic were swapped so that the zeroth variable maps to the randomness and the first variable maps to the polynomial indeterminate.
  • ArkLib/Data/CodingTheory/Folding/Whir.lean: The Whir folding implementation was relocated from the ProofSystem module to Data/CodingTheory, and its imports were updated to match the new structure.
  • ArkLib/Data/CodingTheory/Operations/Combine.lean: This commit moves the file to ArkLib/Data/CodingTheory/Operations and corrects the exponent calculation within the ri function.
  • ArkLib/Data/CodingTheory/Operations/Quotienting.lean: Relocate the file to the coding theory operations module and correct the documentation regarding the degree of the vanishing polynomial.
  • ArkLib/Data/CodingTheory/OutOfDomSmpl/Stir.lean: Relocate and rename ArkLib/ProofSystem/Stir/OutOfDomSmpl.lean to ArkLib/Data/CodingTheory/OutOfDomSmpl/Stir.lean.
  • ArkLib/Data/CodingTheory/OutOfDomSmpl/Whir.lean: The file OutofDomainSmpl.lean is moved from ArkLib/ProofSystem/Whir to ArkLib/Data/CodingTheory/OutOfDomSmpl/Whir.lean without any content changes.
  • ArkLib/Data/CodingTheory/Proximity/Bound.lean: The file ProximityBound.lean has been moved from the ProofSystem/Stir directory to Data/CodingTheory/Proximity and renamed to Bound.lean.
  • ArkLib/Data/CodingTheory/Proximity/Generator.lean: The file ArkLib/ProofSystem/Whir/ProximityGen.lean was moved and renamed to ArkLib/Data/CodingTheory/Proximity/Generator.lean.
  • ArkLib/Data/CodingTheory/Proximity/MutualCorrAgreement.lean: The file was relocated from ProofSystem/Whir to Data/CodingTheory/Proximity and its import of the proximity generator was updated to reflect the new structure.
  • ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean: Deletes the file ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean which contained definitions and lemmas related to proximity gaps for Reed-Solomon codes.
  • ArkLib/Data/CodingTheory/ProximityGap/Interleaved.lean: The file is renamed to Interleaved.lean and updated to include the statement of Lemma 4.3 from [AHIV22] regarding distance lower bounds for interleaved linear codes.
  • ArkLib/Data/CodingTheory/ProximityGap/ReedSolomon.lean: The file BCIKS20.lean is renamed to ReedSolomon.lean and updated to include combinatorial proximity results from [AHIV22] concerning Reed-Solomon codes and affine lines.
  • ArkLib/ProofSystem/Stir.lean: The import paths in ArkLib/ProofSystem/Stir.lean were updated to reflect the refactoring of Stir-related modules into the ArkLib.Data.CodingTheory namespace.
  • ArkLib/ProofSystem/Stir/ProximityGap.lean: This change removes the ArkLib/ProofSystem/Stir/ProximityGap.lean file, which contained the definition of the proximity_gap lemma for Reed-Solomon codes within the STIR proof system.
  • ArkLib/ProofSystem/Stir/RBRSoundness.lean: The file is renamed from MainThm.lean to RBRSoundness.lean with updated imports, and the field size constraint in the STIR main theorem is corrected from an upper bound to a lower bound.
  • ArkLib/ProofSystem/Whir.lean: The changes update import statements to reflect the relocation of various Whir-related modules to the ArkLib.Data.CodingTheory namespace.
  • ArkLib/ProofSystem/Whir/RBRSoundness.lean: The imports in RBRSoundness.lean are updated to reflect the relocation of block relative distance and proximity modules to the ArkLib.Data.CodingTheory namespace.

Last updated: 2025-12-28 00:53 UTC.

@alexanderlhicks
Copy link
Collaborator Author

@chung-thai-nguyen since you had done some refactoring already in a previous: does the organisation done in this refactoring seem fine to you? I've tried to mostly stick with your outline but made some things a bit more generic where we overlapped (e.g. generic filenames instead of explicit references to papers now that several files have content from multiple papers, ...)

@chung-thai-nguyen
Copy link
Collaborator

@chung-thai-nguyen since you had done some refactoring already in a previous: does the organisation done in this refactoring seem fine to you? I've tried to mostly stick with your outline but made some things a bit more generic where we overlapped (e.g. generic filenames instead of explicit references to papers now that several files have content from multiple papers, ...)

@alexanderlhicks The generic file names (in ProximityGap folder) looks reasonable if we want a centralized place for readers to find the proximity gap theorems available for usage (but likely not the definitions since they are already in ProximityGap/Basic.lean), e.g. in OracleReduction protocols. My main concern was due to the bloat of general files.

For Interleaved.lean, most (if not all) proximity gap definitions/theorems can be written from the interleaved code POV. Similarly for ReedSolomon.lean, a lot of papers state proximity gaps for RS codes. 1 theorem might be put in both Interleaved.lean & ReedSolomon.lean if we don't have clear distinction in terms of purpose of the files. I think we can have these generic files but only for theorem statements across papers, and the helper lemmas should be put somewhere else if we really decide to have monolithic files like that in the long term. If we don't split helper lemmas, the files would look pretty long & distracted (e.g. the DG25 file is 2k lines and it is only about proximity gaps for multilinear combination, but its main theorems are short). But idk how should we do it atm. What do you think?

@alexanderlhicks
Copy link
Collaborator Author

alexanderlhicks commented Jan 19, 2026

For Interleaved.lean, most (if not all) proximity gap definitions/theorems can be written from the interleaved code POV. Similarly for ReedSolomon.lean, a lot of papers state proximity gaps for RS codes. 1 theorem might be put in both Interleaved.lean & ReedSolomon.lean if we don't have clear distinction in terms of purpose of the files

Yep, beyond what's been done on this PR now (which is really just moving stuff across, particularly the STIR/WHIR contents), I think a next step (in this PR or a follow up) will be to see where we can clean up and unify results (STIR and WHIR alone have/had overlap given parts of the WHIR paper are literally copy pasted from the STIR paper), but this first requires setting up the file structure so that we know what boxes we to put things in.

If we don't split helper lemmas, the files would look pretty long & distracted (e.g. the DG25 file is 2k lines and it is only about proximity gaps for multilinear combination, but its main theorems are short). But idk how should we do it atm. What do you think?

Yep, I agree about the bloat. One option is to have small files for key results and split auxilliary/helpful lemmas across other files according to what they cover. See for example the edits I made to the proof of Polishchuk Spielman here: https://github.com/alexanderlhicks/ArkLib/pull/1/files, which keeps the proof of the main result in PolishchukSpielman.lean and splits intermnediary results across three files based on what they cover (if what they cover is not easily categorized, they can just be split into Aux.lean or Aux1.lean, Aux2.lean if they are too long...).
This makes it easy to browse important theorems whilst ignoring intermeidary results, and speeds up build times if working on any individual file.

@chung-thai-nguyen
Copy link
Collaborator

chung-thai-nguyen commented Jan 20, 2026

@alexanderlhicks Yes, we really need to figure out the right file structure. Here I give some suggestions for your consideration:
For proximity gaps:

  • I think proximity gaps/generators, Mutual/Weighted CA definitions (& their very fundamental results) can be put into same place as CA (in ProximityGap/Basic.lean), as we did for CodingTheory/Basic.lean, because these things are core upstream definitions. These are what most worth generalizing.

  • Then comes the instantiations/specific results of these defs: we can categorize results (main theorems only) for generic LinearCode vs ReedSolomon code since they are most of the cases for now.

  • For Interleaved.lean, the only theorem fits into the category of proximity gap for interleaved code (we see the concept of interleaved word of interleaved symbols), is Theorem 3.1 of DG25 paper, the AHIV22 lemmas don't fit (it's just interleaved word of generic (not-necessarily-interleaved) symbol, as in BCIKS20). We can put DG25 theorems into LinearCode file & ReedSolomon file of ProximityGap accordingly so maybe we shouldn't have this file.

  • Universal tools like Polishchuk-Spielman can have their own folder/files as you did (https://github.com/alexanderlhicks/ArkLib/pull/1/changes) since it's well-known, while some paper-specific techniques/lemmas that are not widely reusable can be put into Aux files like you said (e.g. a big Aux section with the paper name so we don't confuse ourselves later).

For other tools like OOD, folding, ... we can continue as you do and I think we can unify sth out of different formulations of them (FRI/STIR/WHIR/Binius, etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments