Feature/grs debug sampler by sermengi · Pull Request #3693 · verifywise-ai/verifywise

sermengi · 2026-04-08T18:23:46Z

Describe your changes

Summary

Introduces grs_debug_sampler.py, a reproducible scenario sub-sampler for smoke-testing the GRS pipeline, and post_sampling_analysis.ipynb, a companion notebook for inspecting the produced dataset.

`grs_debug_sampler.py`

A standalone CLI tool that draws a small, reproducible sample (15–100 scenarios) from the three GRS scenario pools (GPT, Gemini, Claude) and writes a JSONL output file plus a JSON manifest.

Sampling algorithm

Phase 1 — one scenario drawn per obligation (alphabetical order, seed=42), guaranteeing full obligation coverage.
Phase 2 — source-balanced fill on the remaining budget; shortfalls from exhausted sources are redistributed across non-exhausted ones.
Final sample is shuffled with a second RNG (seed+1) for deterministic output ordering.
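The Phase 1 draw and the final shuffle can be sketched roughly as follows. This is a minimal illustration of the description above, not the code in grs_debug_sampler.py: the function names `phase1_draw` and `shuffle_sample`, the toy pool, and the `OB-01`-style obligation IDs are assumptions made for the example.

```python
import random
from collections import defaultdict


def phase1_draw(pool, seed=42):
    """Pick one scenario per obligation, visiting obligations alphabetically."""
    rng = random.Random(seed)
    by_obligation = defaultdict(list)
    for record in pool:
        by_obligation[record["obligation_id"]].append(record)
    # Sorted iteration keeps the sequence of RNG calls identical across runs.
    return [rng.choice(by_obligation[ob]) for ob in sorted(by_obligation)]


def shuffle_sample(sample, seed=42):
    """Shuffle a copy of the sample with a second RNG seeded at seed + 1."""
    shuffled = list(sample)
    random.Random(seed + 1).shuffle(shuffled)
    return shuffled


# Toy pool (hypothetical data): 3 sources x 3 obligations x 2 scenarios each.
pool = [
    {"scenario_id": f"{src}-{ob}-{i}", "source": src, "obligation_id": ob}
    for src in ("claude", "gemini", "gpt")
    for ob in ("OB-01", "OB-02", "OB-03")
    for i in range(2)
]

picked = phase1_draw(pool)      # one record per obligation
final = shuffle_sample(picked)  # same records, deterministic order
```

Using a separate RNG for the shuffle means the output order does not depend on how many draws the selection phases consumed, which is one plausible reason for the seed+1 design.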

Schema normalisation
Adds normalize_record() to map the pipeline's native output schema (seed_trace, mutation_trace, governance_triggers) to the five sampler-required fields (source, obligation_id, scenario_type, mutation_type, primary_dimension), so no pre-processing step is needed.
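A rough sketch of what such a normalisation step might look like is shown below. Only the three native field names and the five required field names come from this PR. The nested keys read from `seed_trace`, `mutation_trace`, and `governance_triggers`, the `source` argument, and the choice of the first trigger as the primary dimension are all placeholders assumed for illustration; the real pipeline layout may differ.

```python
REQUIRED_FIELDS = (
    "source",
    "obligation_id",
    "scenario_type",
    "mutation_type",
    "primary_dimension",
)


def normalize_record(record, source):
    """Return a copy of `record` with the five sampler-required fields set.

    ASSUMPTION: the nested key names below are hypothetical stand-ins.
    """
    seed = record.get("seed_trace") or {}
    mutation = record.get("mutation_trace") or {}
    triggers = record.get("governance_triggers") or []

    derived = {
        "source": source,
        "obligation_id": seed.get("obligation_id"),
        "scenario_type": seed.get("scenario_type"),
        "mutation_type": mutation.get("mutation_type"),
        # Assumed rule: the first trigger names the primary dimension.
        "primary_dimension": triggers[0].get("dimension") if triggers else None,
    }

    normalized = dict(record)  # leave the caller's record untouched
    for field, value in derived.items():
        normalized.setdefault(field, value)  # keep fields already present

    missing = [f for f in REQUIRED_FIELDS if normalized.get(f) is None]
    if missing:
        raise ValueError(f"cannot derive required fields: {missing}")
    return normalized


# Hypothetical pipeline record, for illustration only.
raw = {
    "scenario_id": "s-001",
    "seed_trace": {"obligation_id": "OB-01", "scenario_type": "baseline"},
    "mutation_trace": {"mutation_type": "paraphrase"},
    "governance_triggers": [{"dimension": "transparency"}],
}
rec = normalize_record(raw, source="gpt")
```

Failing loudly on a record that cannot be normalised keeps incomplete rows from silently entering the sample.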

Audit checks (recorded in manifest)

| Check | Type | Rule |
| --- | --- | --- |
| `source_balance` | Hard | Each source 33% ± 10% |
| `obligation_coverage` | Hard | All Phase-1 obligations in final sample |
| `sample_size` | Hard | 15 ≤ n ≤ 100 |
| `id_uniqueness` | Hard | No duplicate `scenario_id`s |
| `dimension_coverage` | Soft | All 5 dimensions represented |
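The checks in the table could be expressed along these lines. This is a sketch under stated assumptions rather than the sampler's actual audit code: the `run_audit` name and its return shape are invented for the example, "33% ± 10%" is read as a share within 10 percentage points of one third, soft checks are assumed not to affect the overall result, and the dimension names are supplied by the caller because the PR does not list them.

```python
from collections import Counter


def run_audit(sample, phase1_obligations, expected_dimensions,
              sources=("claude", "gemini", "gpt")):
    """Evaluate the four hard checks and one soft check on a sample."""
    n = len(sample)
    counts = Counter(r["source"] for r in sample)
    shares = {s: (counts[s] / n if n else 0.0) for s in sources}
    ids = [r["scenario_id"] for r in sample]

    hard = {
        # Assumed reading: |share - 1/3| <= 10 percentage points.
        "source_balance": all(abs(sh - 1 / 3) <= 0.10 for sh in shares.values()),
        "obligation_coverage": set(phase1_obligations)
        <= {r["obligation_id"] for r in sample},
        "sample_size": 15 <= n <= 100,
        "id_uniqueness": len(ids) == len(set(ids)),
    }
    soft = {
        "dimension_coverage": set(expected_dimensions)
        <= {r["primary_dimension"] for r in sample},
    }
    # Assumption: only hard checks decide the overall PASS/FAIL.
    return {"hard": hard, "soft": soft, "overall_pass": all(hard.values())}


# Hypothetical 15-record sample: 5 per source, 5 obligations, 5 dimensions.
sample = [
    {
        "scenario_id": f"{src}-{i}",
        "source": src,
        "obligation_id": f"OB-{i}",
        "primary_dimension": f"D{i}",
    }
    for src in ("claude", "gemini", "gpt")
    for i in range(5)
]
report = run_audit(
    sample,
    phase1_obligations=[f"OB-{i}" for i in range(5)],
    expected_dimensions=[f"D{i}" for i in range(5)],
)
```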

Usage

python3 grs_debug_sampler.py \
  --source-a datasets/grs_scenarios_v0.1/final/scenarios.jsonl \
  --source-b datasets/grs_scenarios_v0.2/final/scenarios.jsonl \
  --source-c datasets/grs_scenarios_v0.3/final/scenarios.jsonl \
  --target-n 50
Output defaults to datasets/debug/grs_debug_sample.jsonl and datasets/debug/grs_debug_manifest.json.

Test plan

Run pytest test_grs_debug_sampler.py: all 15 tests pass
Run the sampler against the three scenario pools (--target-n 50): overall audit PASS, manifest written, no drops
Open post_sampling_analysis.ipynb and run all cells top-to-bottom: no errors, all sections render correctly

Write your issue number after "Fixes "

This PR does not intend to fix any specific issues.

Please ensure all items are checked off before requesting a review:

I deployed the code locally.
I have performed a self-review of my code.
I have included the issue # in the PR.
I have labelled the PR correctly.
The issue I am working on is assigned to me.
I have avoided using hardcoded values to ensure scalability and maintain consistency across the application.
I have ensured that font sizes, color choices, and other UI elements are referenced from the theme.
My pull request is focused and addresses a single, specific feature.
If there are UI changes, I have attached a screenshot or video to this PR.

Spec for an interactive Jupyter notebook to explore dataset pool statistics (source balance, scenario type, mutation families, obligation coverage, governance triggers) before running the GRS v3.0 sampling operation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implement phase2_draw() to balance scenario selection across sources (claude, gemini, gpt). Allocates budget equally with remainders distributed to first sources alphabetically. Handles shortfalls by drawing from overflow pools of non-exhausted sources. Wires Phase 2 into main() with early exit when n_phase2 <= 0 (no budget remaining). Adds T08 (source balance tolerance check) and T15 (Phase 1-only when target == obligations). All 6 tests pass including new Phase 2 tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
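The budget split described in this commit message might look something like the sketch below. It covers only the allocation arithmetic, not the drawing itself. The helper name `allocate_phase2` and the round-robin handling of shortfalls are assumptions for illustration; the commit only says that shortfalls are drawn from the overflow pools of non-exhausted sources.

```python
def allocate_phase2(budget, available):
    """Split `budget` across sources, capped by what each pool still holds.

    `available` maps source name -> number of scenarios left after Phase 1.
    """
    sources = sorted(available)
    base, remainder = divmod(budget, len(sources))
    # Equal shares; the first `remainder` sources alphabetically get one extra.
    quota = {s: base + (1 if i < remainder else 0) for i, s in enumerate(sources)}
    alloc = {s: min(quota[s], available[s]) for s in sources}

    shortfall = budget - sum(alloc.values())
    # Assumed policy: hand the shortfall out one scenario at a time,
    # round-robin, to sources that still have spare scenarios.
    while shortfall > 0:
        open_sources = [s for s in sources if alloc[s] < available[s]]
        if not open_sources:
            break  # every pool is exhausted; return what is possible
        for s in open_sources:
            if shortfall == 0:
                break
            alloc[s] += 1
            shortfall -= 1
    return alloc


even = allocate_phase2(35, {"claude": 100, "gemini": 100, "gpt": 100})
# -> {"claude": 12, "gemini": 12, "gpt": 11}

short = allocate_phase2(30, {"claude": 4, "gemini": 100, "gpt": 100})
# -> {"claude": 4, "gemini": 13, "gpt": 13}
```

In the second call the claude pool runs out after 4 scenarios, so its 6 unfilled slots are shared between gemini and gpt and the total still reaches the budget of 30.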

…-sampling notebook

grs_debug_sampler.py: add normalize_record() to derive the five sampler-required fields (source, obligation_id, scenario_type, mutation_type, primary_dimension) from the pipeline's native schema (seed_trace, mutation_trace, governance_triggers), so the sampler accepts actual pipeline output without any pre-processing step.

post_sampling_analysis.ipynb: new notebook with 13 sections covering audit results, source/phase breakdown, obligation coverage, dimension and mutation family distributions, domain/industry diversity, governance triggers heatmap, risk level, and sample-vs-pool comparison.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sermengi and others added 19 commits April 7, 2026 13:51

docs(GRSModule): add implementation plan for GRS dataset explorer notebook

50dcd14

10-task plan covering all 8 notebook sections with complete cell code, execution verification steps, and commit checkpoints. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(GRSModule): rename notebook to pre_sampling_analysis.ipynb

a110d64

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(grs-sampler): scaffold CLI arg parsing + test fixture infrastructure

1dc19d1

fix(grs-sampler): add missing return type annotations on stubs

e5c1fbd

feat(grs-sampler): implement load/validate/namespace/dedup + basic output write

434cfbf

fix(grs-sampler): validate mutation_type string value in load_source

4f1a566

feat(grs-sampler): implement Phase 1 obligation coverage draw

f42a841

fix(grs-sampler): guard against empty pool and zero obligations in main

3ea1491

fix(grs-sampler): clarify random_seed_2 intent, add returncode assert in T15

e306fb8

feat(grs-sampler): add rng2 shuffle for deterministic output ordering

85a65f8

fix(grs-sampler): add returncode guards in T03/T04, clarify T04 comment

a36e5b5

feat(grs-sampler): implement audit checks and complete manifest — all 15 tests pass

f5d4e6b

fix(grs-sampler): restore T15 to spec (seed=42, target-n=15, phase2==0)

18888f7

feat(grs-sampler): default output to datasets/debug/, create dir at runtime

d7ea38d

fix(grs-sampler): also create manifest directory at runtime

632944b

Merge branch 'develop' into feature/grs-debug-sampler

3eb73f9

sermengi requested review from EfeAcar6431, MuhammadKhalilzadeh and gorkem-bwl April 8, 2026 18:23

sermengi self-assigned this Apr 8, 2026

Feature/grs debug sampler #3693
sermengi wants to merge 19 commits into develop from feature/grs-debug-sampler
