fix: remove references to deprecated Bedrock Sonnet 4 model ID by daniel-pittman · Pull Request #146 · msu-denver/bili-core

daniel-pittman · 2026-04-17T19:08:02Z

Summary

Replaces all remaining references to the deprecated Bedrock model ID anthropic.claude-sonnet-4-20250514-v1:0 (and its inference-profile alias us.anthropic.claude-sonnet-4-20250514-v1:0) with the canonical cross-region inference profile ID us.anthropic.claude-sonnet-4-6 across 10 files in bili-core.
Production Sustainability Hub Engine already uses us.anthropic.claude-sonnet-4-6; this PR brings bili-core into alignment.

Why

AWS issued a deprecation notice for anthropic.claude-sonnet-4-20250514-v1:0 with the following timeline:

| Date | Milestone |
| --- | --- |
| April 14, 2026 | Legacy (already in effect) |
| July 14, 2026 | Extended access ends |
| October 14, 2026 | End-of-life: model invocations will fail |

Leaving stale model IDs in config files, evaluator defaults, and documentation creates a silent failure path once end-of-life is reached.
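
To make the timeline concrete, here is a minimal Python sketch that maps a calendar date to the lifecycle stage from the table above. The function name `lifecycle_status`, the stage labels, and the choice to treat each milestone date as inclusive are illustrative assumptions; none of them come from AWS or from bili-core.

```python
from datetime import date

# Milestone dates copied from the deprecation table above.
LEGACY_START = date(2026, 4, 14)
EXTENDED_ACCESS_END = date(2026, 7, 14)
END_OF_LIFE = date(2026, 10, 14)


def lifecycle_status(today: date) -> str:
    """Return a coarse lifecycle label for the deprecated model ID.

    Assumption: each milestone takes effect on its listed date (inclusive).
    """
    if today >= END_OF_LIFE:
        return "end-of-life"
    if today >= EXTENDED_ACCESS_END:
        return "extended-access-ended"
    if today >= LEGACY_START:
        return "legacy"
    return "active"
```

For example, `lifecycle_status(date(2026, 4, 17))` returns `"legacy"`, the stage in effect on the day this PR was opened.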

Changes

Spread across 3 commits:

cb7fea4 — AETHER example configs (4 files)

bili/aether/config/examples/demo_research_analysis.yaml
bili/aether/config/examples/thesis_consensus_network.yaml
bili/aether/config/examples/thesis_hierarchical_voting.yaml
bili/aether/config/examples/thesis_deliberative_consensus.yaml

553be67 — AETHER example configs (2 files)

bili/aether/config/examples/demo_code_review.yaml
bili/aether/config/examples/thesis_supervisor_hub.yaml

bb657aa — IRIS registry, AEGIS evaluator/suite/docs (4 files)

bili/iris/config/llm_config.py — runtime model registry entry
bili/aegis/evaluator/evaluator_config.py — PRIMARY_EVALUATOR_MODEL constant
bili/aegis/suites/cross_model/run_cross_model_suite.py — docstring table + MODEL_MATRIX entry
bili/aegis/docs/testing-injection.md — Tier 3 primary model reference

Note for reviewer: The commit message for bb657aa mentions "formatter-only changes (black/isort/autoflake) on unrelated files." The net diff against develop is clean (10 files, all substantive model ID swaps), so no formatter-only changes are visible in the branch diff. If formatter-only changes were staged and then reverted before committing, no action is needed. If they do appear in a detailed per-commit review, please advise whether to split them into a separate clean-up PR.

Test plan

Confirm anthropic.claude-sonnet-4-20250514-v1:0 no longer appears anywhere in the repo (git grep "claude-sonnet-4-20250514")
Confirm us.anthropic.claude-sonnet-4-6 is used in llm_config.py, evaluator_config.py, run_cross_model_suite.py, all 6 AETHER example YAMLs, and testing-injection.md
AEGIS suite: 1539 tests pass, 0 failures (confirmed pre-PR; pre-existing collection errors are due to missing langchain_core outside Docker and are unrelated to this change)
Invoke us.anthropic.claude-sonnet-4-6 manually in AWS Bedrock console to confirm the inference profile is active
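
The first check in the plan can also be scripted. The sketch below is a rough, stdlib-only stand-in for the `git grep` command, assuming a local checkout; `find_stale_references` is a hypothetical helper and not part of bili-core. Unlike `git grep`, it also scans untracked files, so it may report hits that the real command would not.

```python
from pathlib import Path

# Same fragment as the git grep in the test plan.
DEPRECATED_FRAGMENT = "claude-sonnet-4-20250514"


def find_stale_references(root: Path, fragment: str = DEPRECATED_FRAGMENT) -> list[tuple[str, int]]:
    """Return (relative path, line number) for every line containing the fragment."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: ignore it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if fragment in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

An empty list from `find_stale_references(Path("."))` corresponds to the "no matches" outcome the test plan expects.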

🤖 Generated with Claude Code

AWS deprecated anthropic.claude-sonnet-4-20250514-v1:0 (Legacy state April 14 2026, end-of-life October 14 2026). Replace all occurrences in the four targeted AETHER example YAML files with the current inference profile ID us.anthropic.claude-sonnet-4-6, consistent with production.

Files updated:
- bili/aether/config/examples/demo_research_analysis.yaml (2 refs)
- bili/aether/config/examples/thesis_consensus_network.yaml
- bili/aether/config/examples/thesis_hierarchical_voting.yaml
- bili/aether/config/examples/thesis_deliberative_consensus.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…configs

Update demo_code_review.yaml (lines 58, 75) and thesis_supervisor_hub.yaml (line 46) to use the correct us.anthropic.claude-sonnet-4-6 inference profile identifier, replacing the deprecated claude-sonnet-4-20250514-v1:0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace all remaining references to the deprecated `us.anthropic.claude-sonnet-4-20250514-v1:0` (and the bare `anthropic.claude-sonnet-4-20250514-v1:0`) with the canonical inference-profile ID `us.anthropic.claude-sonnet-4-6` across:
- bili/iris/config/llm_config.py: runtime model registry entry
- bili/aegis/evaluator/evaluator_config.py: PRIMARY_EVALUATOR_MODEL constant
- bili/aegis/suites/cross_model/run_cross_model_suite.py: docstring table + MODEL_MATRIX entry
- bili/aegis/docs/testing-injection.md: Tier 3 primary model reference

Also includes formatter-only changes (black/isort/autoflake) on unrelated files touched during the pre-commit formatting pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-04-17T19:08:17Z

Claude finished @daniel-pittman's task in 1m 47s

Claude Code Review — PR #146

Gather PR context and branch state
Verify no remaining references to deprecated Sonnet 4 model ID
Verify replacements use canonical inference profile ID
Review each changed file for correctness
Post final review

Summary

Recommendation: ✅ Approve — the deprecation cleanup is mechanically clean and complete. One semantic concern worth flagging before merge (display names vs. actual model version); not a blocker.

Verification

git grep "claude-sonnet-4-20250514" → no matches. All 13 occurrences of the deprecated ID are removed.
git grep "claude-sonnet-4" shows exactly 13 replacements with us.anthropic.claude-sonnet-4-6 across the 10 files listed in the PR body, plus the unrelated claude-sonnet-4-5-20250929 action input in .github/workflows/claude-security-review.yml (not in scope).
git diff origin/develop...HEAD --stat → clean 13/13 line delta, no formatter noise leaked into the final branch diff. The note in the PR body about bb657aa's commit-message mentioning unrelated formatter changes is accurate — nothing extraneous shows up here, so no split-PR cleanup is needed.

Observations

1. ⚠️ Display names vs. underlying model version (worth a quick decision before merge)

The old ID us.anthropic.claude-sonnet-4-20250514-v1:0 pointed at Sonnet 4.0 (May 14, 2025 release). The new alias us.anthropic.claude-sonnet-4-6 points at Sonnet 4.6. These are different model versions — not a straight deprecation-rename. However, the human-readable display names weren't updated:

bili/iris/config/llm_config.py:297 — "model_name": "Anthropic Claude Sonnet 4"
bili/aegis/evaluator/evaluator_config.py:30 — PRIMARY_EVALUATOR_MODEL_DISPLAY: str = "Primary — Claude Sonnet 4 (Bedrock)"
bili/aegis/suites/cross_model/run_cross_model_suite.py:157 — "Claude Sonnet 4 (Bedrock)"

This matters most for AEGIS run_cross_model_suite.py: historical benchmark results logged under "Claude Sonnet 4 (Bedrock)" will become non-comparable with post-merge runs (different model, same label). Worth either (a) updating the display names to "Claude Sonnet 4.6" for honesty in logs/CSVs, or (b) explicitly deciding the label is a stable family identifier and the version bump is acceptable. Same call applies to the IRIS LLM picker UI.
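
A mismatch of this kind can be caught mechanically. The following is a minimal sketch, assuming the short alias suffix `4-6` should read as version "4.6" and the dated form `4-20250514-v1:0` as version "4", which is how this review interprets the two IDs. Both helper names are hypothetical and do not exist in bili-core.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only.
_SHORT_ALIAS = re.compile(r"claude-sonnet-(\d+)-(\d+)$")
_DATED_ID = re.compile(r"claude-sonnet-(\d+)-\d{8}-v\d+:\d+$")


def expected_version(model_id: str) -> Optional[str]:
    """Guess the Sonnet version a Bedrock model ID refers to, or None if unknown."""
    m = _SHORT_ALIAS.search(model_id)
    if m:
        return f"{m.group(1)}.{m.group(2)}"  # e.g. "4-6" is read as "4.6"
    m = _DATED_ID.search(model_id)
    if m:
        return m.group(1)  # e.g. "4-20250514-v1:0" is read as "4"
    return None


def display_name_matches(model_id: str, display_name: str) -> bool:
    """True if the display name mentions the version implied by the model ID."""
    version = expected_version(model_id)
    if version is None:
        return True  # unrecognized ID format: nothing to compare against
    # "Sonnet 4" must not be accepted when the text actually says "Sonnet 4.6".
    pattern = rf"Sonnet {re.escape(version)}(?!\.?\d)"
    return re.search(pattern, display_name) is not None
```

With the values quoted above, `display_name_matches("us.anthropic.claude-sonnet-4-6", "Anthropic Claude Sonnet 4")` is `False`, which is the inconsistency this observation flags.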

2. ℹ️ Model-ID format convention in llm_config.py

The registry now mixes ID formats for Anthropic Bedrock models:

us.anthropic.claude-3-5-haiku-20241022-v1:0 (dated + versioned)
us.anthropic.claude-3-7-sonnet-20250219-v1:0 (dated + versioned)
us.anthropic.claude-sonnet-4-6 (short alias)

Both forms are valid cross-region inference profile identifiers and AWS recommends the short alias going forward, so this is a stylistic observation, not a defect. Worth noting in case there's an opinion on registry consistency for a future pass.
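
If registry consistency does come up in a future pass, the two formats are easy to tell apart by suffix. This is only a sketch: `id_format` and `formats_in_use` are hypothetical names, and the suffix pattern is inferred from the three IDs listed above rather than from any AWS specification.

```python
import re

# Dated + versioned IDs end in "-YYYYMMDD-vN:M" (inferred from the examples above).
_DATED_SUFFIX = re.compile(r"-\d{8}-v\d+:\d+$")


def id_format(model_id: str) -> str:
    """Classify a model ID as 'dated+versioned' or 'short alias'."""
    return "dated+versioned" if _DATED_SUFFIX.search(model_id) else "short alias"


def formats_in_use(model_ids) -> set:
    """Return the set of ID formats present in a collection of model IDs."""
    return {id_format(model_id) for model_id in model_ids}
```

Passing the three IDs above to `formats_in_use` yields both labels, which is the mix described here; a registry using a single convention would yield a one-element set.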

3. ✅ YAML quoting preserved

Each AETHER example preserved its original quoting style (quoted in demo_*.yaml, unquoted in thesis_*.yaml). Good — no unnecessary stylistic churn.

File-by-file

| File | Change | Verdict |
| --- | --- | --- |
| `bili/iris/config/llm_config.py` | model_id on entry at line 298 | ✅ Correct |
| `bili/aegis/evaluator/evaluator_config.py` | `PRIMARY_EVALUATOR_MODEL` constant | ✅ Correct |
| `bili/aegis/suites/cross_model/run_cross_model_suite.py` | docstring table + `MODEL_MATRIX` | ✅ Correct |
| `bili/aegis/docs/testing-injection.md` | Tier 3 primary model reference | ✅ Correct |
| `bili/aether/config/examples/demo_code_review.yaml` | 2 agents (code_reviewer, code_quality_reviewer) | ✅ Correct |
| `bili/aether/config/examples/demo_research_analysis.yaml` | 2 agents | ✅ Correct |
| `bili/aether/config/examples/thesis_consensus_network.yaml` | final-verdict agent | ✅ Correct |
| `bili/aether/config/examples/thesis_deliberative_consensus.yaml` | final-decision agent | ✅ Correct |
| `bili/aether/config/examples/thesis_hierarchical_voting.yaml` | tier-1 agent | ✅ Correct |
| `bili/aether/config/examples/thesis_supervisor_hub.yaml` | supervisor agent | ✅ Correct |

Test-plan checklist (from your PR body)

✅ No remaining claude-sonnet-4-20250514 references — confirmed via grep.
✅ us.anthropic.claude-sonnet-4-6 present in all 10 files — confirmed.
⏳ AEGIS suite pass (1539/0) — cannot run in this review environment (no langchain_core / Bedrock credentials).
⏳ Manual Bedrock console invocation of us.anthropic.claude-sonnet-4-6 — out of scope for a static review.

Update all human-readable display strings that still said "Claude Sonnet 4" to "Claude Sonnet 4.6", consistent with the us.anthropic.claude-sonnet-4-6 model ID used throughout. Covers llm_config.py, evaluator_config.py (constant + module docstring), run_cross_model_suite.py MODEL_MATRIX, and the attack_page.py UI caption.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-04-17T19:52:27Z

Claude finished @daniel-pittman's task in 1m 28s

Claude Code Review — PR #146 (follow-up on `470daf7`)

Gather latest branch state and new commit
Verify display-name updates from previous review
Re-run grep for deprecated model ID and verify replacements
Check for any remaining stale "Sonnet 4" display strings
Post final updated review

Recommendation

✅ Approve with one small follow-up: commit 470daf7 cleanly addresses the display-name concern from my prior review. Two stale references remain in a single doc file; both are trivial to fix in this branch or a follow-up PR.

What `470daf7` does well

The commit updated "Claude Sonnet 4" → "Claude Sonnet 4.6" in the four files that matter for labeled output / UI:

| File | Line | Change |
| --- | --- | --- |
| `bili/iris/config/llm_config.py` | 297 | `model_name` → `"Anthropic Claude Sonnet 4.6"` ✅ |
| `bili/aegis/evaluator/evaluator_config.py` | 3, 30 | module docstring + `PRIMARY_EVALUATOR_MODEL_DISPLAY` ✅ |
| `bili/aegis/suites/cross_model/run_cross_model_suite.py` | 157 | `MODEL_MATRIX` display name ✅ |
| `bili/aether/ui/attack_page.py` | 311 | `cross_model` suite caption ✅ (nice extra catch; I missed this one) |

Benchmark CSVs / JSON result files produced by run_cross_model_suite.py and SemanticEvaluator will now label runs as "Claude Sonnet 4.6 (Bedrock)", which keeps historical-vs-current comparisons honest.

🟡 One straggler — `security-testing-quickstart.md`

grep -E "Sonnet 4([^.]|$)" finds two unversioned references in bili/aegis/docs/security-testing-quickstart.md that describe the same judge LLM whose display name was just bumped to 4.6:

Line 134: judge LLM (Claude Sonnet 4 on Bedrock by default, with Gemini 2.5 Flash as fallback).
Line 341: The judge uses Claude Sonnet 4 on Bedrock as its primary model (temperature 0.0 for

For consistency with evaluator_config.py's module docstring (which was updated), both should read Claude Sonnet 4.6 on Bedrock. Not a blocker — the PRIMARY_EVALUATOR_MODEL constant is the source of truth at runtime — but easy to sweep up.
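
To show why the grep pattern above isolates these two lines, here is a small Python equivalent applied line by line. The pattern is copied from the `grep -E` command; the helper name `stale_lines` is made up for this illustration. One caveat: because `[^.]` accepts any character other than a dot, the pattern would also flag a string such as "Sonnet 45", which is harmless for this repo but worth knowing.

```python
import re

# Same pattern as: grep -E "Sonnet 4([^.]|$)"
STALE_LABEL = re.compile(r"Sonnet 4([^.]|$)")


def stale_lines(text: str) -> list:
    """Return (line number, line) pairs where 'Sonnet 4' is not followed by a dot."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if STALE_LABEL.search(line)
    ]
```

A line containing "Claude Sonnet 4 on Bedrock" is reported, while "Claude Sonnet 4.6 on Bedrock" is not, so the check goes quiet once both doc lines are bumped.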

Verification

git grep "claude-sonnet-4-20250514" → 0 matches (deprecated ID fully removed)
git grep "us.anthropic.claude-sonnet-4-6" → 13 matches across the 10 listed files (matches the 13-line delta reported by git diff --stat)
git grep -E "Sonnet 4([^.]|$)" → 2 matches, both in the doc file called out above
git log --oneline origin/develop..HEAD → 4 commits, all on-topic

Previous observations still standing

Model-ID format mix in llm_config.py (short alias ...sonnet-4-6 vs. dated/versioned ...-20250219-v1:0 for older Anthropic Bedrock entries) remains stylistic and out of scope.
Test-plan items ⏳ AEGIS suite run + manual Bedrock console invocation — cannot be verified from a static review.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-17T21:42:15Z

Claude finished @daniel-pittman's task in 58s

Claude Code Review — PR #146 (follow-up on `6c2c4a8`)

Gather latest branch state and new commit
Verify security-testing-quickstart.md straggler fix
Re-run grep for deprecated model ID and remaining stale "Sonnet 4" references
Post final updated review

Recommendation

✅ Approve — ready to merge. Commit 6c2c4a8 cleanly addresses the security-testing-quickstart.md straggler flagged in my prior review, and nothing else regressed.

What `6c2c4a8` does

Two targeted string swaps in bili/aegis/docs/security-testing-quickstart.md:

| Line | Change |
| --- | --- |
| 134 | `judge LLM (Claude Sonnet 4 on Bedrock …)` → `Claude Sonnet 4.6 on Bedrock` ✅ |
| 341 | `The judge uses Claude Sonnet 4 on Bedrock …` → `Claude Sonnet 4.6 on Bedrock` ✅ |

Both now match the source-of-truth PRIMARY_EVALUATOR_MODEL_DISPLAY constant in bili/aegis/evaluator/evaluator_config.py:30. Scope is minimal — 1 file, 4 lines, no other churn.

Final verification

| Check | Result |
| --- | --- |
| `git grep "claude-sonnet-4-20250514"` | 0 matches; deprecated ID fully removed |
| `git grep "us.anthropic.claude-sonnet-4-6"` | 10 files (matches PR body inventory) |
| `git grep -E "Sonnet 4([^.]\|$)"` | 0 matches; no stale unversioned "Sonnet 4" display strings anywhere in the repo |
| `git diff origin/develop...HEAD --stat` | 12 files, 20 insertions / 20 deletions; clean, no formatter leakage |

Commit history on branch

6c2c4a8 docs: bump Sonnet 4 -> 4.6 in security-testing-quickstart
470daf7 fix: update Sonnet 4 display names to reflect 4.6 inference profile
bb657aa fix: replace remaining deprecated Sonnet 4 model IDs and run formatters
553be67 fix: replace deprecated Bedrock model ID in remaining AETHER example configs
cb7fea4 fix: replace deprecated Bedrock model ID in AETHER example configs

Five well-scoped commits, each reviewed in this or a prior round; no remaining concerns.

Observations still standing (non-blocking)

Model-ID format mix in llm_config.py (short alias …sonnet-4-6 vs. dated/versioned …-20250219-v1:0 for older Anthropic Bedrock entries) — stylistic only, AWS supports both. Out of scope for this PR.
Test-plan items ⏳ AEGIS 1539-test suite run + manual Bedrock console invocation — cannot be verified from static review; relies on author confirmation.

daniel-pittman and others added 3 commits April 17, 2026 12:27

docs: bump Sonnet 4 -> 4.6 in security-testing-quickstart

6c2c4a8

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

daniel-pittman merged commit c7eb675 into develop Apr 17, 2026
2 checks passed

daniel-pittman deleted the fix/deprecated-sonnet-4-model-id branch April 17, 2026 21:48

daniel-pittman merged 5 commits into develop from fix/deprecated-sonnet-4-model-id
