HITL tests: CAN error resilience (SPI lockup repro) #2369

Open
adeebshihadeh wants to merge 7 commits into master from canerr

@adeebshihadeh

Summary

  • Add HITL tests that induce CAN errors (speed mismatch, bus-off) and verify the panda stays responsive over SPI
  • Tests reproduce a suspected bug where CAN error interrupts can starve SPI, causing the panda to appear locked up from the SPI master's perspective
  • CI temporarily set to only run these new tests for faster iteration

Test cases

  • test_spi_responsive_during_can_errors: Speed mismatch flood from jungle, assert SPI health responses have <250ms gaps
  • test_spi_responsive_during_bus_off: TX with no ACK → bus-off, assert SPI stays responsive
  • test_sustained_error_storm: 15s sustained CAN errors, measure SPI response gaps (avg/p95/max)
  • test_can_recovery_after_errors: After errors, fix speeds and verify normal CAN resumes
  • test_no_faults_during_errors: CAN errors should not trigger interrupt rate faults
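The gap checks above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual test code: `poll_health`, `gap_stats`, `GapStats`, and `MAX_GAP_S` are hypothetical names, and `poll` stands in for whatever call reads the panda's health over SPI. The p95 here uses the nearest-rank method, which is an assumption about how the percentile is computed.

```python
import math
import time
from typing import Callable, List, NamedTuple

MAX_GAP_S = 0.25  # 250 ms threshold taken from the test description


class GapStats(NamedTuple):
    avg: float
    p95: float
    max: float


def gap_stats(timestamps: List[float]) -> GapStats:
    """Summarize gaps between consecutive successful response timestamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two responses to measure a gap")
    gaps = sorted(b - a for a, b in zip(timestamps, timestamps[1:]))
    # Nearest-rank p95: smallest gap with at least 95% of gaps at or below it.
    rank = max(1, math.ceil(0.95 * len(gaps)))
    return GapStats(avg=sum(gaps) / len(gaps), p95=gaps[rank - 1], max=gaps[-1])


def poll_health(poll: Callable[[], object], duration_s: float,
                clock: Callable[[], float] = time.monotonic) -> List[float]:
    """Call poll() repeatedly for duration_s and record each success time."""
    stamps: List[float] = []
    end = clock() + duration_s
    while clock() < end:
        try:
            poll()  # hypothetical stand-in for an SPI health request
        except Exception:
            continue  # a failed poll simply widens the next measured gap
        stamps.append(clock())
    return stamps
```

A test would then assert something like `gap_stats(stamps).max < MAX_GAP_S` while the error flood is running, and include the avg/p95/max values in the failure message.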

Suspected root causes

  1. can_clear_send() → llcan_init() does blocking busy-waits (up to 500ms) inside a CAN RX ISR, starving SPI
  2. CAN error interrupt flags cleared based on stale IR snapshot, causing re-fire storms
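To see why the first suspected cause alone would trip the 250ms gap check, here is a toy model in simulated time. It is only a sketch of the reasoning, not firmware behavior: it assumes the SPI master polls at a fixed period and that any poll arriving while the handler is busy-waiting goes unserved. The function name, the 10ms poll period, and the 1s window are all hypothetical.

```python
def max_response_gap_ms(block_start_ms: int, block_len_ms: int,
                        poll_period_ms: int = 10, total_ms: int = 1000) -> int:
    """Largest gap between served polls when one handler blocks the CPU.

    Toy model: a poll arriving while the handler is blocking is not served.
    """
    block_end_ms = block_start_ms + block_len_ms
    served = [t for t in range(0, total_ms + 1, poll_period_ms)
              if not (block_start_ms <= t < block_end_ms)]
    return max(b - a for a, b in zip(served, served[1:]))


print(max_response_gap_ms(100, 500))  # 510: one 500 ms busy-wait exceeds 250 ms
print(max_response_gap_ms(100, 0))    # 10: no blocking, gap equals the poll period
```

Under these assumptions, a single worst-case busy-wait produces a gap of roughly twice the threshold, so the responsiveness tests should fail whenever that path is hit.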

Test plan

  • Run on HITL (tres + cuatro) to see which tests fail and how often
  • Use failure stats to validate the fix

🤖 Generated with Claude Code

adeebshihadeh and others added 7 commits March 7, 2026 10:15
Tests that the panda stays responsive over SPI when CAN
error conditions occur (speed mismatch, bus-off, sustained
error storms). Validates recovery after errors and checks
that interrupt rate faults aren't triggered.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Catch PandaSpiException instead of crashing test runner
- Disable pytest-xdist (tests share one panda)
- Report lockup as assertion failure with stats

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>