Skip to content

fix: launch-readiness pass — bug fixes, hardening, CI, and tests#2

Open
changhsiuwei wants to merge 5 commits intosancliffe:mainfrom
changhsiuwei:claude/peaceful-goldwasser-420c25
Open

fix: launch-readiness pass — bug fixes, hardening, CI, and tests#2
changhsiuwei wants to merge 5 commits intosancliffe:mainfrom
changhsiuwei:claude/peaceful-goldwasser-420c25

Conversation

@changhsiuwei
Copy link
Copy Markdown

Summary

A four-commit launch-readiness pass on the codebase:

  • 9653a81 P0 bug fixes — broken tests, default config path mismatch, Docker build, exit-on-voice-command
  • d0a9f7d P1 hardening — dynamic TTS sample rate, transcriber thread leak fix, thread-safe retry, LLM rollback safety, extracted WakewordDetector with 7 unit tests, dependency upper bounds
  • 18229e9 polish — CI workflow (ruff + pytest on Python 3.10/3.11/3.12), --version flag, README troubleshooting, repo cleanup
  • 000fca7 test repairs — fixed four flaky tests uncovered by full pytest run

Test plan

  • python -m pytest — 25 passed, 4 skipped (Ollama integration), 0 failed
  • ruff check src tests — all checks pass
  • python -m py_compile — all source files compile
  • Manual end-to-end voice pipeline test (requires hardware + Ollama)

t3andy1025 and others added 5 commits May 4, 2026 09:42
- tests/test_transcriber.py: fix `bad_logprob_.pysegment` typo that made
  the entire test suite NameError on import.
- audio_utils.py: replace `\n` literals in device-listing output with
  real newlines so `--list-devices` is actually readable.
- audio_utils.py: correct DEFAULT_SETTINGS wakeword path
  (`hey_jarvis_v2.onnx` -> `jarvis_v2.onnx`) to match the file shipped in
  models/, so the assistant can start without a config.ini.
- Dockerfile: copy `src/` and `README.md` before `pip install .` so the
  package is actually installable in the image; move runtime assets
  (config + models) after the install layer.
- voice_assistant.py: replace bare `exit(0)` on the spoken "exit" /
  "goodbye" command with `raise KeyboardInterrupt`, so the cleanup path
  in assistant.py's finally block runs and closes audio/TTS threads.
- voice_assistant.py: raise a clear RuntimeError if openWakeWord loads
  zero models, instead of an opaque IndexError.
- tests/test_synthesizer.py: stop relying on string-path patches against
  modules that were already substituted with MagicMock; patch
  synth_module.PiperVoice and synth_module.sd.OutputStream directly
  and drain the queue with queue.join() instead of time.sleep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctor

Quality fixes uncovered during the launch-readiness review.

- synthesizer.py: query the configured output device's default sample
  rate at worker startup instead of hard-coding 48000Hz; fall back to
  the model's native rate if the device cannot be queried, so machines
  without 48kHz output stop crashing on TTS init.
- transcriber.py:
  - Re-use a single ThreadPoolExecutor for the whole Transcriber
    lifetime instead of constructing one per call (was leaking workers
    on long sessions).
  - `transcribe()` now accepts explicit threshold args and no longer
    reads them from `self.args`, so the retry loop in
    voice_assistant.py can pass per-attempt thresholds without
    monkey-patching shared state.
  - close() shuts the executor down with cancel_futures.
- voice_assistant.py: `_transcribe_with_retry` no longer mutates
  `self.args.whisper_*`; it computes per-attempt thresholds and passes
  them through. Eliminates a subtle race where an exception or
  early-return left thresholds permanently relaxed.
- llm_handler.py: snapshot the message list before mutating it, and
  restore the snapshot on streaming failure. The previous "pop the last
  user message" rollback was unsafe because `_prune_history()` may
  already have discarded older turns.
- audio_input.py: drop the orphan `task_done()` call. Nothing joins on
  the buffer queue and pairing them invites future deadlocks; replaced
  with a comment documenting the choice.
- wakeword.py (new) + test_wakeword.py (new): extract the
  cooldown/consecutive/moving-average debouncing into a pure
  WakewordDetector class with 7 unit tests. The main loop in
  voice_assistant.py is now a thin adapter over this class and is far
  easier to reason about.
- config_manager.py: only treat `system_prompt` as a file path when it
  *looks* like one (short, no newlines, .txt/.md/.prompt suffix or a
  path separator). Previously, prompts containing characters that
  sanitize_file_path() rejected (e.g. literal `..`) caused the assistant
  to refuse to start.
- pyproject.toml: add upper bounds to every dependency so a future
  major release of numpy/scipy/etc. cannot silently break installs.
  Add ruff to the [test] extra.
- tests/__init__.py: refresh the stale `app/`-era comments and add the
  `src/` directory to sys.path so modern import patterns work.
- cleanup.py: stop trying to delete the long-removed `app/` directory
  and `requirements.txt`; clean `.ruff_cache` / `.mypy_cache` instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lower-priority improvements identified during the launch-readiness
review. These are not bug fixes; they are productisation work.

- .github/workflows/ci.yml (new): run ruff and pytest on Python
  3.10/3.11/3.12 for every push and PR. Installs portaudio + ffmpeg so
  the native audio deps actually compile in CI.
- pyproject.toml:
  - configure ruff (E/F/I/B/UP rules, line length 110, py310 target)
    with per-file-ignores for tests/* (stubbing sys.modules before
    imports is intentional).
  - configure pytest's testpaths.
  - add ruff to the [test] extra.
- src/voice_assistant/__init__.py: expose `__version__ = "0.1.0"`.
- config_manager.py: add `--version` to argparse, sourced from the
  package metadata.
- README.md: add a Troubleshooting section covering portaudio,
  Ollama connectivity, mic permissions, wake-word tuning, clipping,
  Whisper accuracy, and the C-toolchain requirement for webrtcvad.
  Also document the new `--version` flag and the test command.
- sample-conversation.txt -> docs/sample-conversation.log: move the
  567-line debug capture out of the project root.
- ruff sweep across the codebase: import sorting, replace `IOError`
  alias with `OSError`, `type == X` -> `type is X`, raise-from in
  except clauses, drop unused locals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Running the full suite locally surfaced four pre-existing fragile spots
in the tests that were not visible from static analysis. None of the
production code is wrong; the tests were lying.

- test_transcriber.py:
  Stop stuffing a MagicMock into `sys.modules['torch']` at import time.
  faster-whisper now pulls in transformers transitively, and
  transformers inspects `torch.__spec__` during import — a naive
  MagicMock fails that check and the entire test module fails to load.
  Switch to per-test `patch.object(transcriber_module, 'torch', ...)`.
  Also give the mock segments numeric `start`/`end` attributes so the
  new f-string segment log (`{segment.start:.2f}s-...`) doesn't blow up
  formatting MagicMock.

- test_config_manager.py:
  The mocked `parse_args()` Namespace was missing every attribute the
  function reads after parse_args returns (wakeword_model_path,
  piper_model_path, …). It only worked previously because some of those
  attributes were never accessed in the code path. Build a complete
  Namespace from DEFAULT_SETTINGS so the test matches what argparse
  actually produces.

- test_synthesizer.py:
  `_resolve_target_sample_rate` calls `sd.query_devices()` which in
  these tests returns a MagicMock; converting that to int yields a
  garbage rate that then triggers a divide-by-zero inside scipy's
  `resample`. Patch `_resolve_target_sample_rate` to return a known
  16000Hz so the worker exercises the real write path. Also use real
  zero-filled int16 bytes (not `b''`) for test_interrupt so
  `stream.write` actually has something to write before the interrupt.

- test_ollama_connection.py:
  Skip the entire integration class via `setUpClass` when the Ollama
  HTTP probe fails, instead of failing offline runs. Online runs still
  exercise the real assertions.

Result: 25 passed, 4 skipped (Ollama integration), 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For users who don't want to remember terminal commands every time, ship
three click-to-run launchers in the project root:

- start.bat
    The main launcher. cd's to the project directory, picks an
    interpreter (preferring a project-local .venv\ or venv\, falling
    back to `python` on PATH), pings http://localhost:11434/api/tags
    to warn early if Ollama is not running, then exec's `run.py`. The
    window pauses on exit so users can read errors instead of the
    window slamming shut.
- start-debug.bat
    Same flow with --debug appended. For wake-word / VAD tuning.
- list-devices.bat
    --list-devices + --list-output-devices, so users can copy the
    right indices into config.ini.

README gains a "double-click" section explaining each launcher and
how to pin one to the desktop via Send to -> Desktop (create shortcut).
The pre-existing Python and Docker run sections are renamed B/C.

We deliberately did NOT ship a frozen .exe (PyInstaller / Nuitka): the
project pulls in faster-whisper, piper-tts, sounddevice and torch via
ctranslate2, which makes single-file freezes 800MB+ and brittle on
Windows. A .bat that drives the existing entry point is faster, easier
to debug, and trivial to update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants