fix: launch-readiness pass — bug fixes, hardening, CI, and tests#2
Open
changhsiuwei wants to merge 5 commits intosancliffe:mainfrom
Open
fix: launch-readiness pass — bug fixes, hardening, CI, and tests#2changhsiuwei wants to merge 5 commits intosancliffe:mainfrom
changhsiuwei wants to merge 5 commits intosancliffe:mainfrom
Conversation
- tests/test_transcriber.py: fix `bad_logprob_.pysegment` typo that made the entire test suite NameError on import. - audio_utils.py: replace `\n` literals in device-listing output with real newlines so `--list-devices` is actually readable. - audio_utils.py: correct DEFAULT_SETTINGS wakeword path (`hey_jarvis_v2.onnx` -> `jarvis_v2.onnx`) to match the file shipped in models/, so the assistant can start without a config.ini. - Dockerfile: copy `src/` and `README.md` before `pip install .` so the package is actually installable in the image; move runtime assets (config + models) after the install layer. - voice_assistant.py: replace bare `exit(0)` on the spoken "exit" / "goodbye" command with `raise KeyboardInterrupt`, so the cleanup path in assistant.py's finally block runs and closes audio/TTS threads. - voice_assistant.py: raise a clear RuntimeError if openWakeWord loads zero models, instead of an opaque IndexError. - tests/test_synthesizer.py: stop relying on string-path patches against modules that were already substituted with MagicMock; patch synth_module.PiperVoice and synth_module.sd.OutputStream directly and drain the queue with queue.join() instead of time.sleep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctor
Quality fixes uncovered during the launch-readiness review.
- synthesizer.py: query the configured output device's default sample
rate at worker startup instead of hard-coding 48000Hz; fall back to
the model's native rate if the device cannot be queried, so machines
without 48kHz output stop crashing on TTS init.
- transcriber.py:
- Re-use a single ThreadPoolExecutor for the whole Transcriber
lifetime instead of constructing one per call (was leaking workers
on long sessions).
- `transcribe()` now accepts explicit threshold args and no longer
reads them from `self.args`, so the retry loop in
voice_assistant.py can pass per-attempt thresholds without
monkey-patching shared state.
- close() shuts the executor down with cancel_futures.
- voice_assistant.py: `_transcribe_with_retry` no longer mutates
`self.args.whisper_*`; it computes per-attempt thresholds and passes
them through. Eliminates a subtle race where an exception or
early-return left thresholds permanently relaxed.
- llm_handler.py: snapshot the message list before mutating it, and
restore the snapshot on streaming failure. The previous "pop the last
user message" rollback was unsafe because `_prune_history()` may
already have discarded older turns.
- audio_input.py: drop the orphan `task_done()` call. Nothing joins on
the buffer queue and pairing them invites future deadlocks; replaced
with a comment documenting the choice.
- wakeword.py (new) + test_wakeword.py (new): extract the
cooldown/consecutive/moving-average debouncing into a pure
WakewordDetector class with 7 unit tests. The main loop in
voice_assistant.py is now a thin adapter over this class and is far
easier to reason about.
- config_manager.py: only treat `system_prompt` as a file path when it
*looks* like one (short, no newlines, .txt/.md/.prompt suffix or a
path separator). Previously, prompts containing characters that
sanitize_file_path() rejected (e.g. literal `..`) caused the assistant
to refuse to start.
- pyproject.toml: add upper bounds to every dependency so a future
major release of numpy/scipy/etc. cannot silently break installs.
Add ruff to the [test] extra.
- tests/__init__.py: refresh the stale `app/`-era comments and add the
`src/` directory to sys.path so modern import patterns work.
- cleanup.py: stop trying to delete the long-removed `app/` directory
and `requirements.txt`; clean `.ruff_cache` / `.mypy_cache` instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lower-priority improvements identified during the launch-readiness
review. These are not bug fixes; they are productisation work.
- .github/workflows/ci.yml (new): run ruff and pytest on Python
3.10/3.11/3.12 for every push and PR. Installs portaudio + ffmpeg so
the native audio deps actually compile in CI.
- pyproject.toml:
- configure ruff (E/F/I/B/UP rules, line length 110, py310 target)
with per-file-ignores for tests/* (stubbing sys.modules before
imports is intentional).
- configure pytest's testpaths.
- add ruff to the [test] extra.
- src/voice_assistant/__init__.py: expose `__version__ = "0.1.0"`.
- config_manager.py: add `--version` to argparse, sourced from the
package metadata.
- README.md: add a Troubleshooting section covering portaudio,
Ollama connectivity, mic permissions, wake-word tuning, clipping,
Whisper accuracy, and the C-toolchain requirement for webrtcvad.
Also document the new `--version` flag and the test command.
- sample-conversation.txt -> docs/sample-conversation.log: move the
567-line debug capture out of the project root.
- ruff sweep across the codebase: import sorting, replace `IOError`
alias with `OSError`, `type == X` -> `type is X`, raise-from in
except clauses, drop unused locals.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Running the full suite locally surfaced four pre-existing fragile spots
in the tests that were not visible from static analysis. None of the
production code is wrong; the tests were lying.
- test_transcriber.py:
Stop stuffing a MagicMock into `sys.modules['torch']` at import time.
faster-whisper now pulls in transformers transitively, and
transformers inspects `torch.__spec__` during import — a naive
MagicMock fails that check and the entire test module fails to load.
Switch to per-test `patch.object(transcriber_module, 'torch', ...)`.
Also give the mock segments numeric `start`/`end` attributes so the
new f-string segment log (`{segment.start:.2f}s-...`) doesn't blow up
formatting MagicMock.
- test_config_manager.py:
The mocked `parse_args()` Namespace was missing every attribute the
function reads after parse_args returns (wakeword_model_path,
piper_model_path, …). It only worked previously because some of those
attributes were never accessed in the code path. Build a complete
Namespace from DEFAULT_SETTINGS so the test matches what argparse
actually produces.
- test_synthesizer.py:
`_resolve_target_sample_rate` calls `sd.query_devices()` which in
these tests returns a MagicMock; converting that to int yields a
garbage rate that then triggers a divide-by-zero inside scipy's
`resample`. Patch `_resolve_target_sample_rate` to return a known
16000Hz so the worker exercises the real write path. Also use real
zero-filled int16 bytes (not `b''`) for test_interrupt so
`stream.write` actually has something to write before the interrupt.
- test_ollama_connection.py:
Skip the entire integration class via `setUpClass` when the Ollama
HTTP probe fails, instead of failing offline runs. Online runs still
exercise the real assertions.
Result: 25 passed, 4 skipped (Ollama integration), 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For users who don't want to remember terminal commands every time, ship
three click-to-run launchers in the project root:
- start.bat
The main launcher. cd's to the project directory, picks an
interpreter (preferring a project-local .venv\ or venv\, falling
back to `python` on PATH), pings http://localhost:11434/api/tags
to warn early if Ollama is not running, then exec's `run.py`. The
window pauses on exit so users can read errors instead of the
window slamming shut.
- start-debug.bat
Same flow with --debug appended. For wake-word / VAD tuning.
- list-devices.bat
--list-devices + --list-output-devices, so users can copy the
right indices into config.ini.
README gains a "double-click" section explaining each launcher and
how to pin one to the desktop via Send to -> Desktop (create shortcut).
The pre-existing Python and Docker run sections are renamed B/C.
We deliberately did NOT ship a frozen .exe (PyInstaller / Nuitka): the
project pulls in faster-whisper, piper-tts, sounddevice and torch via
ctranslate2, which makes single-file freezes 800MB+ and brittle on
Windows. A .bat that drives the existing entry point is faster, easier
to debug, and trivial to update.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A four-commit launch-readiness pass on the codebase:
WakewordDetectorwith 7 unit tests, dependency upper bounds--versionflag, README troubleshooting, repo cleanupTest plan
python -m pytest— 25 passed, 4 skipped (Ollama integration), 0 failedruff check src tests— all checks passpython -m py_compile— all source files compile