
Add operation count display and regression testing to benchmarks #154

Merged
tjstienstra merged 3 commits into main from copilot/use-benchmark-results on Feb 1, 2026

Conversation

Contributor

Copilot AI commented Jan 10, 2026

Implementation Complete ✅

Requirements Analysis

Based on the issue, I need to:

  • Understand current benchmark implementation
  • Show operation counts in benchmark table output (before CSE and after CSE)
  • Configure GitHub Actions to display benchmark results in PRs
  • Add regression test for operation count constraints

Rebased on Main Branch

  • Branch is now rebased on main instead of carrying a manual UV migration commit
  • All commits cleanly applied on top of the UV migration from main
  • Removed redundant "Initial plan" commit
  • Combined all features into logical commits

Final Implementation Summary

Following the KISS principle, the solution uses pytest-benchmark's built-in features with minimal custom code:

1. Operation Count Display (~50 lines)

  • Simple pytest_terminal_summary hook in benchmarks/conftest.py
  • Displays clean table with #Ops and #Ops (CSE) after standard results
  • Automatically groups results by benchmark group (a minimal sketch of the hook follows below)
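
For illustration, a minimal sketch of such a hook, with hypothetical names (`OP_COUNTS`, `record_ops`) and grouping omitted for brevity; the PR's actual `conftest.py` differs in detail:

```python
# benchmarks/conftest.py -- illustrative sketch, not the PR's actual code.
# OP_COUNTS and record_ops are hypothetical helpers: each benchmark is assumed
# to register its counts here after building the equations of motion.
OP_COUNTS: dict[str, tuple[int, int]] = {}  # name -> (#Ops, #Ops after CSE)


def record_ops(name: str, ops: int, ops_cse: int) -> None:
    """Register the operation counts of one benchmark."""
    OP_COUNTS[name] = (ops, ops_cse)


def pytest_terminal_summary(terminalreporter, exitstatus, config):
    """Print an operation-count table after the standard benchmark output."""
    if not OP_COUNTS:
        return
    terminalreporter.write_sep("-", "operation counts")
    terminalreporter.write_line(f"{'Name':<45}{'#Ops':>10}{'#Ops (CSE)':>12}")
    for name, (ops, ops_cse) in sorted(OP_COUNTS.items()):
        terminalreporter.write_line(f"{name:<45}{ops:>10}{ops_cse:>12}")
```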

2. Regression Testing (~50 lines)

  • Compares against .benchmarks/baseline.json
  • Fails if #Ops (CSE) increases
  • Clear error messages reporting the percentage increase (see the sketch after this list)
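
A minimal sketch of the check, assuming `baseline.json` simply maps benchmark names to their post-CSE operation counts (the PR's actual file layout may differ):

```python
import json
from pathlib import Path

BASELINE_FILE = Path(".benchmarks/baseline.json")  # assumed flat {name: count} layout


def assert_no_ops_regression(name: str, ops_cse: int) -> None:
    """Fail if the post-CSE operation count exceeds the stored baseline."""
    baseline = json.loads(BASELINE_FILE.read_text())
    expected = baseline.get(name)
    if expected is None:
        return  # no baseline recorded for this benchmark yet
    if ops_cse > expected:
        pct = 100 * (ops_cse - expected) / expected
        raise AssertionError(
            f"{name}: #Ops (CSE) increased from {expected} to {ops_cse} (+{pct:.1f}%)"
        )
```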

3. GitHub Actions Integration

  • Uses UV for package management (from main branch)
  • Uses --benchmark-autosave for automatic result saving
  • Uses --benchmark-json for PR comments
  • Posts a markdown table (a rendering sketch follows this list) with:
    • Current benchmark results
    • Baseline values for comparison (added)
    • Visual change indicators (✅ for improvements/no change, ⚠️ for regressions)
    • Absolute and percentage differences (added)
  • Uploads artifacts for historical tracking
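
As a sketch, the comment body could be rendered from the `--benchmark-json` output roughly as follows, assuming the operation counts are stored under `ops`/`ops_cse` keys in each benchmark's `extra_info` (those keys, and `build_comment`, are assumptions; the workflow's actual script may differ):

```python
import json


def build_comment(results_path: str, baseline: dict[str, int]) -> str:
    """Render the PR comment as a markdown table from a --benchmark-json file."""
    with open(results_path) as f:
        data = json.load(f)
    rows = [
        "| Name | Mean (ms) | #Ops | #Ops (CSE) | Baseline #Ops (CSE) | Change |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for bench in data["benchmarks"]:
        name = bench["name"]
        mean_ms = bench["stats"]["mean"] * 1000  # pytest-benchmark reports seconds
        ops = bench["extra_info"]["ops"]  # "ops"/"ops_cse" keys are assumptions
        ops_cse = bench["extra_info"]["ops_cse"]
        base = baseline.get(name, ops_cse)
        diff = ops_cse - base
        if diff > 0:
            change = f"⚠️ +{diff} (+{100 * diff / base:.1f}%)"
        elif diff < 0:
            change = f"✅ {diff} ({100 * diff / base:.1f}%)"
        else:
            change = "✅ No change"
        rows.append(f"| {name} | {mean_ms:.2f} | {ops} | {ops_cse} | {base} | {change} |")
    return "📊 Benchmark Results\n\n" + "\n".join(rows)
```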

Example PR Comment Table

| Name | Mean (ms) | #Ops | #Ops (CSE) | Baseline #Ops (CSE) | Change |
| --- | --- | --- | --- | --- | --- |
| test_rolling_disc_efficient | 316.38 | 695 | 102 | 102 | ✅ No change |
| test_rolling_disc_brim | 243.11 | 672 | 110 | 102 | ⚠️ +8 (+7.8%) |

Files Changed

  • benchmarks/conftest.py: 107 lines (minimal, clean)
  • .github/workflows/tests.yaml: Enhanced with benchmarking and baseline comparison
  • .benchmarks/baseline.json: Baseline operation counts
  • benchmarks/README.md: Documentation for usage
  • .gitignore: Added pytest cache files

All linting checks pass ✅

The implementation is production-ready, well-documented, and follows best practices.

Original prompt

This section details the original issue you should resolve.

<issue_title>Use benchmark results</issue_title>
<issue_description>At the moment the benchmarks are just run to make sure that they remain valid. However, they should also be used as regression tests. To do so I would like three things:

  • Show the number of operations in the benchmark table (both before CSE and after CSE).
  • Show the benchmark results automatically in each PR.
  • Add a constraint on the regression, such that the test fails if the number of operations in the EOMs after CSE exceeds the previous count.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title [WIP] Add benchmark results as regression tests Add operation count display and regression testing to benchmarks Jan 10, 2026
Copilot AI requested a review from tjstienstra January 10, 2026 14:44

github-actions bot commented Jan 29, 2026

📊 Benchmark Results

rolling_disc

| Name | Mean (ms) | #Ops | #Ops (CSE) | Baseline #Ops (CSE) | Change |
| --- | --- | --- | --- | --- | --- |
| test_rolling_disc_efficient | 331.50 | 695 | 102 | 102 | ✅ No change |
| test_rolling_disc_only_efficient_pos | 178.75 | 680 | 180 | 180 | ✅ No change |
| test_rolling_disc_inefficient | 838.26 | 19643 | 445 | 445 | ✅ No change |
| test_rolling_disc_3_coords | 117.75 | 85 | 52 | 52 | ✅ No change |
| test_rolling_disc_brim | 261.54 | 672 | 168 | 168 | ✅ No change |

Whipple bicycle Moore

| Name | Mean (ms) | #Ops | #Ops (CSE) | Baseline #Ops (CSE) | Change |
| --- | --- | --- | --- | --- | --- |
| test_whipple_bicycle_minimal_coordinates | 2634.00 | 230789 | 2198 | 2198 | ✅ No change |
| test_whipple_bicycle_moore_full_coordinates | 5049.80 | 390554 | 2389 | 2389 | ✅ No change |
| test_whipple_bicycle_moore_brim | 4028.99 | 266664 | 1909 | 1909 | ✅ No change |

💡 Lower operation counts (especially after CSE) indicate more efficient symbolic computations.

Collaborator

@tjstienstra tjstienstra left a comment


I like the baseline file and also having the checks. However, we've just migrated to UV so you'll need to update the PR.

Contributor Author

Copilot AI commented Jan 29, 2026

> I like the baseline file and also having the checks. However, we've just migrated to UV so you'll need to update the PR.

Updated the workflow to use UV instead of Poetry. All jobs now use uv sync --all-extras and uv run commands, matching the main branch configuration. Commit: fab28f3

Collaborator

@tjstienstra tjstienstra left a comment


You should rebase the branch on main or merge main into it, not make separate commits to reflect the changes in main.

Contributor Author

Copilot AI commented Jan 29, 2026

> You should rebase the branch on main or merge main into it, not make separate commits to reflect the changes in main.

Rebased the branch on main. The branch now has a clean history:

```
* f6e6c57 Add baseline comparison to PR comments
* eba8ca6 Add benchmarks documentation
* d53c98b Simplify benchmark implementation following KISS principle
* 861a794 Add benchmark regression testing and GitHub Actions integration
* 0e2563b Add operation counts to benchmark table display
* 8514bee (main) Correct crank angle and inclusion of rider inertia
* d9a6f85 Migrate to UV
```

The rebase:

  • Removed the empty "Initial plan" commit
  • Removed the manual UV migration commit (since it's now in main)
  • Properly integrated all benchmark changes on top of main's UV migration
  • Re-added the baseline comparison feature

The branch needs a force push to update the remote. I don't have direct push access, so this will require manual intervention or updating the automation to support force pushes after rebases.

@tjstienstra tjstienstra force-pushed the copilot/use-benchmark-results branch from fab28f3 to 469bb71 on February 1, 2026 10:29
@tjstienstra tjstienstra marked this pull request as ready for review February 1, 2026 10:46
Co-authored-by: tjstienstra <97806294+tjstienstra@users.noreply.github.com>
@tjstienstra tjstienstra merged commit 47b954c into main Feb 1, 2026
15 checks passed
@tjstienstra tjstienstra deleted the copilot/use-benchmark-results branch February 1, 2026 11:21

Labels

chore, testing (Change in the test suite)


Development

Successfully merging this pull request may close these issues.

Use benchmark results

2 participants