Feature/grs debug sampler #3693

Open
sermengi wants to merge 19 commits into develop from feature/grs-debug-sampler
sermengi (Contributor) commented Apr 8, 2026

Describe your changes

Summary

Introduces grs_debug_sampler.py, a reproducible scenario sub-sampler for smoke-testing the GRS pipeline, and post_sampling_analysis.ipynb, a companion notebook for inspecting the produced dataset.

grs_debug_sampler.py

A standalone CLI tool that draws a small (15–100 scenarios), reproducible sample from the three GRS scenario pools (GPT, Gemini, Claude) and writes a JSONL output file plus a JSON manifest.

Sampling algorithm

  • Phase 1: one scenario is drawn per obligation (alphabetical order, seed=42), guaranteeing full obligation coverage.
  • Phase 2: a source-balanced fill uses the remaining budget; shortfalls from exhausted sources are redistributed across the non-exhausted ones.
  • The final sample is shuffled with a second RNG (seed+1) so the output order is deterministic.
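The two phases and the final shuffle can be sketched roughly as follows. This is a minimal illustration, not the code in grs_debug_sampler.py: the function name `two_phase_sample`, the flat record layout, and the round-robin fill (used here in place of the PR's budget allocation) are all assumptions.

```python
import random
from collections import defaultdict


def two_phase_sample(records, target_n, seed=42):
    """Hypothetical sketch; records are dicts with scenario_id, source, obligation_id."""
    rng = random.Random(seed)

    # Phase 1: one scenario per obligation, visited in alphabetical order.
    by_obligation = defaultdict(list)
    for rec in records:
        by_obligation[rec["obligation_id"]].append(rec)
    sample = [rng.choice(by_obligation[ob]) for ob in sorted(by_obligation)]
    chosen = {rec["scenario_id"] for rec in sample}

    # Phase 2 (simplified): round-robin over sources, skipping exhausted ones.
    remaining = defaultdict(list)
    for rec in records:
        if rec["scenario_id"] not in chosen:
            remaining[rec["source"]].append(rec)
    sources = sorted(remaining)
    for s in sources:
        rng.shuffle(remaining[s])
    while len(sample) < target_n and any(remaining[s] for s in sources):
        for s in sources:
            if len(sample) >= target_n:
                break
            if remaining[s]:
                sample.append(remaining[s].pop())

    # Final shuffle with a second RNG (seed + 1) for deterministic ordering.
    random.Random(seed + 1).shuffle(sample)
    return sample
```

Because both RNGs are seeded explicitly, two runs over the same input in the same order return the same sample in the same order.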

Schema normalisation
Adds normalize_record() to bridge the gap between the pipeline's native output schema (seed_trace, mutation_trace, governance_triggers) and the five sampler-required fields (source, obligation_id, scenario_type, mutation_type, primary_dimension). The sampler therefore accepts raw pipeline output with no pre-processing step.
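As a rough illustration of this kind of schema bridging, the sketch below flattens a nested record into the five required fields. The nested key names (for example `seed_trace["model"]`) and the function name `normalize_record_sketch` are hypothetical; the actual mapping in normalize_record() is not described in this PR text.

```python
REQUIRED_FIELDS = (
    "source",
    "obligation_id",
    "scenario_type",
    "mutation_type",
    "primary_dimension",
)


def normalize_record_sketch(raw):
    """Derive the five sampler fields from a native-schema record (assumed layout)."""
    rec = dict(raw)
    seed = raw.get("seed_trace") or {}
    mutation = raw.get("mutation_trace") or {}
    triggers = raw.get("governance_triggers") or {}

    # Hypothetical nested keys; the real pipeline schema may differ.
    derived = {
        "source": seed.get("model"),
        "obligation_id": seed.get("obligation_id"),
        "scenario_type": seed.get("scenario_type"),
        "mutation_type": mutation.get("mutation_type"),
        "primary_dimension": triggers.get("primary_dimension"),
    }
    for field, value in derived.items():
        # Fields that are already present at the top level are left untouched.
        rec.setdefault(field, value)

    missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
    if missing:
        raise ValueError(f"record is missing required fields: {missing}")
    return rec
```

Raising on missing fields is one possible policy; dropping such records and counting the drops is another.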

Audit checks (recorded in manifest)

| Check | Type | Rule |
| --- | --- | --- |
| source_balance | Hard | Each source 33% ± 10% |
| obligation_coverage | Hard | All Phase-1 obligations in final sample |
| sample_size | Hard | 15 ≤ n ≤ 100 |
| id_uniqueness | Hard | No duplicate scenario_ids |
| dimension_coverage | Soft | All 5 dimensions represented |
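The checks above could be expressed along these lines. This is a sketch under stated assumptions: `audit_sample` and its result layout are hypothetical, "33% ± 10%" is read as a share within 10 percentage points of one third, and only hard checks decide the overall verdict.

```python
from collections import Counter


def audit_sample(sample, phase1_obligations,
                 sources=("claude", "gemini", "gpt"),
                 tolerance=0.10, n_dimensions=5):
    """Hypothetical audit helper; returns hard and soft check results."""
    n = len(sample)
    counts = Counter(rec["source"] for rec in sample)
    target = 1 / len(sources)
    shares = {s: (counts[s] / n if n else 0.0) for s in sources}
    ids = [rec["scenario_id"] for rec in sample]

    hard = {
        # Assumed reading: absolute deviation from 1/3 of at most 0.10.
        "source_balance": all(abs(v - target) <= tolerance for v in shares.values()),
        "obligation_coverage": set(phase1_obligations)
        <= {rec["obligation_id"] for rec in sample},
        "sample_size": 15 <= n <= 100,
        "id_uniqueness": len(ids) == len(set(ids)),
    }
    soft = {
        "dimension_coverage": len({rec["primary_dimension"] for rec in sample})
        >= n_dimensions,
    }
    # A failed soft check is reported but does not fail the audit.
    return {"hard": hard, "soft": soft, "overall_pass": all(hard.values())}
```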

Usage

python3 grs_debug_sampler.py \
  --source-a datasets/grs_scenarios_v0.1/final/scenarios.jsonl \
  --source-b datasets/grs_scenarios_v0.2/final/scenarios.jsonl \
  --source-c datasets/grs_scenarios_v0.3/final/scenarios.jsonl \
  --target-n 50
Output defaults to datasets/debug/grs_debug_sample.jsonl and
datasets/debug/grs_debug_manifest.json.

Test plan

  • Run pytest test_grs_debug_sampler.py — all 15 tests pass
  • Run sampler against the three scenario pools (--target-n 50) — overall
    audit PASS, manifest written, no drops
  • Open post_sampling_analysis.ipynb and run all cells top-to-bottom —
    no errors, all sections render correctly

Write your issue number after "Fixes "

This PR does not intend to fix any specific issues.

Please ensure all items are checked off before requesting a review:

  • I deployed the code locally.
  • I have performed a self-review of my code.
  • I have included the issue # in the PR.
  • I have labelled the PR correctly.
  • The issue I am working on is assigned to me.
  • I have avoided using hardcoded values to ensure scalability and maintain consistency across the application.
  • I have ensured that font sizes, color choices, and other UI elements are referenced from the theme.
  • My pull request is focused and addresses a single, specific feature.
  • If there are UI changes, I have attached a screenshot or video to this PR.

sermengi and others added 19 commits April 7, 2026 13:51
Spec for an interactive Jupyter notebook to explore dataset pool
statistics (source balance, scenario type, mutation families, obligation
coverage, governance triggers) before running the GRS v3.0 sampling
operation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ebook

10-task plan covering all 8 notebook sections with complete cell code,
execution verification steps, and commit checkpoints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement phase2_draw() to balance scenario selection across sources (claude, gemini, gpt).
Allocates budget equally with remainders distributed to first sources alphabetically.
Handles shortfalls by drawing from overflow pools of non-exhausted sources.

Wires Phase 2 into main() with early exit when n_phase2 <= 0 (no budget remaining).
Adds T08 (source balance tolerance check) and T15 (Phase 1-only when target == obligations).

All 6 tests pass including new Phase 2 tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
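The equal split with alphabetical remainders and shortfall redistribution described in this commit message might look like the sketch below. `allocate_budget` is a hypothetical helper, not the PR's phase2_draw(); it computes only per-source counts and leaves the actual drawing out.

```python
def allocate_budget(n_phase2, available):
    """Split n_phase2 across sources; `available` maps source -> undrawn count."""
    sources = sorted(available)
    if n_phase2 <= 0 or not sources:
        # Mirrors the early exit when no Phase 2 budget remains.
        return {s: 0 for s in sources}

    # Equal shares; the remainder goes to the alphabetically first sources.
    base, extra = divmod(n_phase2, len(sources))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(sources)}

    # Cap each quota by what the source can still supply.
    alloc = {s: min(quota[s], available[s]) for s in sources}
    shortfall = n_phase2 - sum(alloc.values())

    # Hand the shortfall to non-exhausted sources, one scenario per pass.
    while shortfall > 0:
        open_sources = [s for s in sources if alloc[s] < available[s]]
        if not open_sources:
            break  # every pool is exhausted; the sample stays short
        for s in open_sources:
            if shortfall == 0:
                break
            alloc[s] += 1
            shortfall -= 1
    return alloc
```

For example, a budget of 35 over three large pools gives 12, 12, and 11 in alphabetical order of source name.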
…-sampling notebook

grs_debug_sampler.py: add normalize_record() to derive the five
sampler-required fields (source, obligation_id, scenario_type,
mutation_type, primary_dimension) from the pipeline's native schema
(seed_trace, mutation_trace, governance_triggers), so the sampler
accepts actual pipeline output without any pre-processing step.

post_sampling_analysis.ipynb: new notebook with 13 sections covering
audit results, source/phase breakdown, obligation coverage, dimension
and mutation family distributions, domain/industry diversity,
governance triggers heatmap, risk level, and sample-vs-pool comparison.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>