These rules apply to Codex when working in this repository.
- Install the pre-commit hook (required): `ln -s -f ../../scripts/pre-commit .git/hooks/pre-commit`. Formatting is enforced by CI.
- Mobile app setup: `cd app && bash setup.sh ios` (or `android`).
- Never kill, stop, or restart the production macOS app (`/Applications/omi.app`, bundle ID `com.omi.computer-macos`) during local development or testing.
- Development scripts and commands must target only dev app processes (for example `Omi Dev.app` / `com.omi.desktop-dev`), never production.
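A dev script can refuse to act on the production app before it kills or restarts anything. This is a minimal Python sketch, not an existing repo helper: `assert_dev_target` and `ProductionTargetError` are hypothetical names, and the check only knows the two production identifiers listed above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Production identifiers from the rule above; anything else is treated as a dev target.
PROD_BUNDLE_ID = "com.omi.computer-macos"
PROD_APP_PATH = PurePosixPath("/Applications/omi.app")


class ProductionTargetError(RuntimeError):
    """Hypothetical error raised when a dev script would touch the production app."""


def assert_dev_target(bundle_id: str, app_path: Optional[str] = None) -> str:
    """Return bundle_id if it is safe to kill/stop/restart; raise for the production app."""
    if bundle_id == PROD_BUNDLE_ID:
        raise ProductionTargetError(f"refusing to act on production bundle {bundle_id}")
    # PurePosixPath drops trailing slashes, so "/Applications/omi.app/" also matches.
    if app_path is not None and PurePosixPath(app_path) == PROD_APP_PATH:
        raise ProductionTargetError(f"refusing to act on production app {app_path}")
    return bundle_id
```

A restart script would call `assert_dev_target("com.omi.desktop-dev")` before issuing any kill command, so a typo in the bundle ID fails fast instead of hitting production.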
- No in-function imports. All imports must be at the module top level.
- Follow the module hierarchy when importing. Higher-level modules import from lower-level modules, never the reverse.
Module hierarchy (lowest to highest):
`database/` → `utils/` → `routers/` → `main.py`
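Both import rules can be checked mechanically with the standard-library `ast` module. The sketch below is a hypothetical lint, not the repo's actual tooling: it assumes a module's layer is its top-level package name (`database`, `utils`, `routers`) or `main` for `main.py`, and it ignores relative imports.

```python
import ast
from typing import Optional

# Layers from lowest to highest, following the hierarchy above.
LAYERS = ["database", "utils", "routers", "main"]


def layer_of(module: str) -> Optional[int]:
    """Return the layer index of a dotted module name, or None if it is outside the hierarchy."""
    top = module.split(".")[0]
    return LAYERS.index(top) if top in LAYERS else None


def upward_imports(source: str, current_module: str) -> list[str]:
    """List absolute imports in `source` that reach a higher layer than `current_module`."""
    own = layer_of(current_module)
    if own is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # not an import, or a relative import this sketch skips
        for name in names:
            target = layer_of(name)
            if target is not None and target > own:
                violations.append(name)
    return violations


def nested_imports(source: str) -> list[int]:
    """Return line numbers of imports that appear inside any function body."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, (ast.Import, ast.ImportFrom)):
                    lines.add(inner.lineno)
    return sorted(lines)  # a set avoids double-counting imports in nested functions
```

For example, `upward_imports(open("utils/foo.py").read(), "utils.foo")` would report any `routers` import made from a `utils` module.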
- Memory management: free large objects immediately after use, e.g. `del` byte arrays after processing and call `.clear()` on dicts/lists holding data.
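A minimal sketch of the pattern, assuming a hypothetical helper that receives buffered audio chunks. Note that `del` only removes a name; CPython frees the buffer once nothing else references it, which is why the list holding the chunks is also cleared.

```python
def process_chunks(chunks: list[bytearray]) -> int:
    """Join buffered audio, release every large buffer, and return a small result."""
    audio = b"".join(chunks)  # one contiguous copy of the buffered audio
    chunks.clear()  # the list no longer references the per-chunk byte arrays
    sample_count = len(audio) // 2  # stand-in for real processing (assumes 16-bit PCM)
    del audio  # drop the joined copy before doing any further work
    return sample_count
```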
Shared: Firestore, Redis
backend (main.py)
├── ws ──► pusher (pusher/)
├── ──────► diarizer (diarizer/)
├── ──────► vad (modal/)
└── ──────► deepgram (self-hosted or cloud)
pusher
├── ──────► diarizer (diarizer/)
└── ──────► deepgram (cloud)
agent-proxy (agent-proxy/main.py)
└── ws ──► user agent VM (private IP, port 8080)
notifications-job (modal/job.py) [cron]
Helm charts: backend/charts/{backend-listen,pusher,diarizer,vad,deepgram-self-hosted,agent-proxy}/
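For tooling or review checks, the same map can be mirrored as data so questions like "who calls diarizer?" have a mechanical answer. This is a hypothetical Python structure, not something the repo ships; it transcribes the call edges from the diagram and leaves out the shared Firestore/Redis stores.

```python
# caller -> services it calls, transcribed from the diagram above.
SERVICE_CALLS: dict[str, set[str]] = {
    "backend": {"pusher", "diarizer", "vad", "deepgram"},
    "pusher": {"diarizer", "deepgram"},
    "agent-proxy": {"user agent VM"},
    "notifications-job": set(),  # only reads Firestore/Redis and sends push notifications
}


def callers_of(service: str) -> list[str]:
    """Return every service that calls `service`, sorted by name."""
    return sorted(caller for caller, callees in SERVICE_CALLS.items() if service in callees)
```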
- backend (`main.py`): REST API. Streams audio to pusher via WebSocket (`utils/pusher.py`). Calls diarizer for speaker embeddings (`utils/stt/speaker_embedding.py`). Calls vad for voice activity detection and speaker identification (`utils/stt/vad.py`, `utils/stt/speech_profile.py`). Calls deepgram for STT (`utils/stt/streaming.py`).
- pusher (`pusher/main.py`): Receives audio via a binary WebSocket protocol. Calls diarizer and deepgram for speaker sample extraction (`utils/speaker_identification.py` → `utils/speaker_sample.py`).
- agent-proxy (`agent-proxy/main.py`): Runs on GKE. WebSocket proxy at `wss://agent.omi.me/v1/agent/ws`. Validates the Firebase ID token, looks up `agentVm` in Firestore, and proxies bidirectionally to the VM's `ws://<ip>:8080/ws`. VM credentials never leave the server.
- diarizer (`diarizer/main.py`): GPU. Speaker embeddings at `/v2/embedding`. Called by backend and pusher (`HOSTED_SPEAKER_EMBEDDING_API_URL`).
- vad (`modal/main.py`): GPU. `/v1/vad` (voice activity detection) and `/v1/speaker-identification` (speaker matching). Called by backend only (`HOSTED_VAD_API_URL`, `HOSTED_SPEECH_PROFILE_API_URL`).
- deepgram: STT. Streaming uses self-hosted (`DEEPGRAM_SELF_HOSTED_URL`) or cloud, based on `DEEPGRAM_SELF_HOSTED_ENABLED` (`utils/stt/streaming.py`). Pre-recorded always uses Deepgram cloud (`utils/stt/pre_recorded.py`). Called by backend and pusher.
- notifications-job (`modal/job.py`): Cron job; reads Firestore/Redis and sends push notifications.
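The streaming/pre-recorded split for deepgram can be summarized as a small selection function. This is a sketch only: `stt_target` is a hypothetical name, and the accepted truthy values for `DEEPGRAM_SELF_HOSTED_ENABLED` and the error on a missing URL are assumptions; check `utils/stt/streaming.py` for the real behavior.

```python
from typing import Mapping, Optional, Tuple

TRUTHY = {"1", "true", "yes"}  # assumption: how the enable flag is spelled


def stt_target(env: Mapping[str, str], pre_recorded: bool = False) -> Tuple[str, Optional[str]]:
    """Return ("self-hosted", url) or ("cloud", None) for a transcription request."""
    if pre_recorded:
        return ("cloud", None)  # pre-recorded transcription always goes to Deepgram cloud
    enabled = env.get("DEEPGRAM_SELF_HOSTED_ENABLED", "").strip().lower() in TRUTHY
    if not enabled:
        return ("cloud", None)
    url = env.get("DEEPGRAM_SELF_HOSTED_URL", "")
    if not url:
        # Fail loudly instead of silently falling back to cloud (a design choice of this sketch).
        raise ValueError("DEEPGRAM_SELF_HOSTED_ENABLED is set but DEEPGRAM_SELF_HOSTED_URL is empty")
    return ("self-hosted", url)
```

Passing `os.environ` as `env` works because it is a `Mapping`; taking the mapping as a parameter keeps the function testable without touching the real environment.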
Keep this map up to date. When adding, removing, or changing inter-service calls, update this section and the matching section in CLAUDE.md.
If a PR changes how audio streaming, transcription, conversation lifecycle, speaker identification, or the listen/pusher WebSocket protocol works — update docs/doc/developer/backend/listen_pusher_pipeline.mdx in the same PR. This includes changes to timeouts, event types, processing flow, or inter-service communication between listen and pusher.
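A CI or pre-push script could flag PRs that touch the pipeline without touching the pipeline doc. In this hypothetical sketch the path prefixes are assumptions derived from the service map above (with a `backend/` prefix assumed); a path match is only a reminder, since it cannot tell whether behavior actually changed.

```python
PIPELINE_DOC = "docs/doc/developer/backend/listen_pusher_pipeline.mdx"

# Assumed repo-relative prefixes for listen/pusher pipeline code; adjust to the real layout.
PIPELINE_PREFIXES = (
    "backend/pusher/",
    "backend/utils/pusher.py",
    "backend/utils/stt/",
    "backend/utils/speaker_identification.py",
    "backend/utils/speaker_sample.py",
)


def pipeline_doc_missing(changed_files: list[str]) -> bool:
    """True when pipeline code changed but the pipeline doc is not part of the change set."""
    touches_pipeline = any(path.startswith(PIPELINE_PREFIXES) for path in changed_files)
    return touches_pipeline and PIPELINE_DOC not in changed_files
```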
- All user-facing strings must use l10n (`context.l10n.keyName`). Add keys to ARB files using `jq` to avoid reading large files.
- When adding new l10n keys, translate them into all 33 non-English locales; never leave English text in non-English ARB files. Use the `omi-add-missing-language-keys-l10n` skill for translations. Ensure `{parameter}` placeholders match the English ARB exactly.
- After modifying ARB files in `app/lib/l10n/`, regenerate localizations: `cd app && flutter gen-l10n`
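Missing keys, leftover English text, and placeholder drift can all be checked on parsed ARB files (for example, loaded with `json.load`). A minimal sketch with hypothetical helper names; the `{name}` regex handles simple placeholders only, and ICU plural/select messages would need a real ICU parser.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")  # simple {name} placeholders only


def placeholders(message: str) -> set[str]:
    """Return the set of simple placeholder names used in an ARB message."""
    return set(PLACEHOLDER.findall(message))


def missing_keys(english: dict, translated: dict) -> list[str]:
    """English message keys (ignoring @-metadata) that the translated ARB lacks."""
    return [key for key in english if not key.startswith("@") and key not in translated]


def untranslated_keys(english: dict, translated: dict) -> list[str]:
    """Keys whose translated value is identical to English (some, like "OK", may be intentional)."""
    return [k for k, m in english.items() if not k.startswith("@") and translated.get(k) == m]


def placeholder_mismatches(english: dict, translated: dict) -> dict[str, tuple[set[str], set[str]]]:
    """Map each shared key whose placeholders differ to (english_set, translated_set)."""
    problems = {}
    for key, message in english.items():
        if key.startswith("@") or key not in translated:
            continue  # skip metadata entries and keys reported by missing_keys
        expected, actual = placeholders(message), placeholders(translated[key])
        if expected != actual:
            problems[key] = (expected, actual)
    return problems
```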
After any Flutter UI edit, verify programmatically with agent-flutter. Marionette is already integrated in debug builds. Install once: npm install -g agent-flutter-cli.
Edit → Verify → Evidence loop:
- Edit code, hot restart: `kill -SIGUSR2 $(pgrep -f "flutter run" | head -1)`
- Connect: `AGENT_FLUTTER_LOG=/tmp/flutter-run.log agent-flutter connect`
- Verify: `agent-flutter snapshot -i` (see widgets on screen)
- Interact: `agent-flutter press @e3` / `press 540 1200` (coordinates) / `find type button press` / `fill @e5 "text"` / `dismiss` (system dialogs)
- Evidence: `agent-flutter screenshot /tmp/evidence.png`
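When a single restart-and-capture pass is enough, the loop can be scripted. This hypothetical Python helper only strings together the commands listed above; it assumes `flutter run` is already writing its stdout to `/tmp/flutter-run.log`, and it leaves interaction steps out because refs must be read from each fresh snapshot.

```python
import shlex
import subprocess

FLUTTER_LOG = "/tmp/flutter-run.log"  # assumption: where `flutter run` stdout is captured


def flutter_verify_commands(screenshot: str = "/tmp/evidence.png") -> list[str]:
    """Shell commands for one Edit -> Verify -> Evidence pass, in the required order."""
    return [
        'kill -SIGUSR2 $(pgrep -f "flutter run" | head -1)',  # hot restart
        f"AGENT_FLUTTER_LOG={FLUTTER_LOG} agent-flutter connect",  # restart killed the VM Service session
        "agent-flutter snapshot -i",  # refs from before the restart are stale
        f"agent-flutter screenshot {shlex.quote(screenshot)}",
    ]


def run_pass(dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the pass; shell=True is needed for the $(...) substitution."""
    commands = flutter_verify_commands()
    for command in commands:
        if dry_run:
            print(command)
        else:
            subprocess.run(command, shell=True, check=True)
    return commands
```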
Key rules:
- Must reconnect after every hot restart (kills VM Service session).
- Refs go stale frequently (Flutter rebuilds aggressively); always re-snapshot before every interaction. Use `press x y` as a fallback.
- Set `AGENT_FLUTTER_LOG` to the `flutter run` stdout log (not logcat) for auto-detection.
- Prefer `find type X` or `find key "name"` over hardcoded `@ref`s for stability.
- When adding interactive widgets, use `Key('descriptive_name')` for agent discoverability.
- App flows & exploration skill: see `app/e2e/SKILL.md` for navigation architecture, widget patterns, and reference flows.
- Full command reference: `agent-flutter schema` or `agent-flutter --help`.
After any Swift UI edit, verify programmatically with agent-swift. No app-side instrumentation needed — uses macOS Accessibility API. Install once: brew install beastoin/tap/agent-swift.
Requires: Accessibility permission for Terminal.app (System Settings → Privacy & Security → Accessibility).
Edit → Verify → Evidence loop:
- Edit code, rebuild: `cd desktop && ./run.sh`
- Connect: `agent-swift connect --bundle-id com.omi.desktop-dev`
- Verify: `agent-swift snapshot -i` (interactive elements only)
- Interact: `agent-swift click @e3` / `fill @e5 "text"` / `find role button click`
- Assert: `agent-swift is exists @e3` / `wait text "Settings"`
- Evidence: `agent-swift screenshot /tmp/evidence.png`
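Because refs go stale after every `click`/`press`/`fill`/`scroll`, any scripted pass needs a fresh `snapshot -i` after each such step. A hypothetical Python sketch that only builds the command sequence (it runs nothing); the steps and refs passed in are illustrative, and in practice each ref should be read from the latest snapshot.

```python
import shlex

DEV_BUNDLE_ID = "com.omi.desktop-dev"
STALE_AFTER = {"click", "press", "fill", "scroll"}  # actions that invalidate refs


def mutates(step: str) -> bool:
    """True if an agent-swift step performs one of the ref-invalidating actions."""
    tokens = shlex.split(step)
    if tokens and tokens[0] == "find":
        # find <locator> <value> [action], e.g. "find role button click"
        return len(tokens) > 3 and tokens[3] in STALE_AFTER
    return bool(tokens) and tokens[0] in STALE_AFTER


def swift_session(steps: list[str], screenshot: str = "/tmp/evidence.png") -> list[str]:
    """Build the loop's command list, re-snapshotting after every ref-invalidating step."""
    commands = [
        "cd desktop && ./run.sh",
        f"agent-swift connect --bundle-id {DEV_BUNDLE_ID}",
        "agent-swift snapshot -i",
    ]
    for step in steps:
        commands.append(f"agent-swift {step}")
        if mutates(step):
            commands.append("agent-swift snapshot -i")  # refresh refs before the next step
    commands.append(f"agent-swift screenshot {shlex.quote(screenshot)}")
    return commands
```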
Key rules:
- `agent-swift doctor` verifies the Accessibility permission and the target app.
- Prefer `click` over `press` for SwiftUI: `click` sends CGEvent clicks (which trigger `NavigationLink`), while `press` sends AXPress (AppKit only).
- Refs go stale after `click`/`press`/`fill`/`scroll`; re-snapshot before the next interaction.
- Always use `snapshot -i`; full snapshots of complex apps are very verbose.
- Argument order: `get <property> <ref>`, `is <condition> <ref>`, `wait <condition> [<target>]`, `find <locator> <value>`.
- JSON output: the `--json` flag, the `AGENT_SWIFT_JSON=1` env var, or pipe the output to auto-detect.
- 15 commands: `doctor`, `connect`, `disconnect`, `status`, `snapshot`, `press`, `click`, `fill`, `get`, `find`, `screenshot`, `is`, `wait`, `scroll`, `schema`.
- Works with any macOS app (SwiftUI, AppKit, Electron) with zero app-side setup.
- Dev bundle ID: `com.omi.desktop-dev`. Prod: `com.omi.computer-macos`.
- If you launch a custom-named desktop test build, keep the bundle suffix and app name identical so auth callbacks reopen the correct app. For example, `1233.app` should use `com.omi.1233` and `search.app` should use `com.omi.search`; mismatches like `1233.app` with `com.omi.desktop-dev` are not allowed.
- App flows & exploration skill: see `desktop/e2e/SKILL.md` for navigation architecture, interaction patterns, and reference flows.
- Full command reference: `agent-swift --help` or `agent-swift schema`.
- When asked to build or rebuild the desktop app for testing, don't stop at a successful compile: launch the dev app, interact with it programmatically to confirm it actually runs, and report any environment blocker if full interaction is impossible.
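The naming rule for custom-named test builds can be checked before launch. A minimal sketch with hypothetical helper names; it covers only custom-named builds (`<name>.app` → `com.omi.<name>`), not the standard `Omi Dev.app` / `com.omi.desktop-dev` pairing.

```python
from pathlib import PurePosixPath


def expected_bundle_id(app_path: str) -> str:
    """Derive the bundle ID a custom-named test build should use: <name>.app -> com.omi.<name>."""
    name = PurePosixPath(app_path).name
    if not name.endswith(".app") or name == ".app":
        raise ValueError(f"not an app bundle name: {app_path!r}")
    return "com.omi." + name[: -len(".app")]


def bundle_id_matches(app_path: str, bundle_id: str) -> bool:
    """True when the bundle suffix equals the app name, so auth callbacks reopen this app."""
    return bundle_id == expected_bundle_id(app_path)
```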
Always format code after making changes. The pre-commit hook handles this automatically, but you can also run manually:
- Dart (`app/`): `dart format --line-length 120 <files>`
  - Files ending in `.gen.dart` or `.g.dart` are auto-generated and should not be formatted manually.
- Python (`backend/`): `black --line-length 120 --skip-string-normalization <files>`
- C/C++ (firmware: `omi/`, `omiGlass/`): `clang-format -i <files>`
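The pre-commit hook already runs these formatters; to run them by hand on a set of changed files, the path-to-formatter mapping can be sketched in Python. The function name and the set of C/C++ extensions are assumptions; each returned argv list can be passed to `subprocess.run(cmd, check=True)`.

```python
from pathlib import PurePosixPath

GENERATED_DART_SUFFIXES = (".gen.dart", ".g.dart")  # auto-generated, never format by hand
C_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".hpp"}  # assumption: firmware source extensions


def formatter_commands(changed_files: list[str]) -> list[list[str]]:
    """Group repo-relative paths by formatter and return one argv list per formatter."""
    dart, python, c_like = [], [], []
    for path in changed_files:
        p = PurePosixPath(path)
        top = p.parts[0] if p.parts else ""
        if top == "app" and p.suffix == ".dart" and not path.endswith(GENERATED_DART_SUFFIXES):
            dart.append(path)
        elif top == "backend" and p.suffix == ".py":
            python.append(path)
        elif top in {"omi", "omiGlass"} and p.suffix in C_SUFFIXES:
            c_like.append(path)
    commands = []
    if dart:
        commands.append(["dart", "format", "--line-length", "120", *dart])
    if python:
        commands.append(["black", "--line-length", "120", "--skip-string-normalization", *python])
    if c_like:
        commands.append(["clang-format", "-i", *c_like])
    return commands
```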
- Never push directly to `main`.
- Never merge directly from a local branch. Land changes through a PR only.
- When a change should go remote, create or use a feature branch, commit there, open/update a PR, and merge via the PR.
- Always work in a git worktree for code changes. Use `EnterWorktree` at the start of a task to isolate your work.
- Update this file and `CLAUDE.md` in the same commit when rules change.
- For architecture or core flow changes, update the Mintlify docs (`docs/doc/developer/`) in the same PR.
- Always run tests before committing:
  - Backend changes: run `backend/test.sh`
  - App changes: run `app/test.sh`
- Run `backend/test-preflight.sh` first to verify that tools, packages, and env vars are ready.
- Backend unit tests need: `python3`, `pytest`, packages from `requirements.txt`, and `ENCRYPTION_SECRET` (set by `test.sh`).
- Integration tests optionally need: `OPENAI_API_KEY`, `DEEPGRAM_API_KEY`, `ADMIN_KEY`, Redis connectivity, `GOOGLE_APPLICATION_CREDENTIALS`.
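The tool and env-var requirements above can be summarized in a small report. This hypothetical sketch is not `backend/test-preflight.sh`: it skips the `requirements.txt` package check and Redis connectivity, and it flags `ENCRYPTION_SECRET` only as a reminder for running `pytest` directly, since `test.sh` sets it.

```python
import shutil
from typing import Callable, Mapping, Optional

REQUIRED_TOOLS = ("python3", "pytest")
# test.sh sets ENCRYPTION_SECRET; it only has to be exported when running pytest directly.
UNIT_ENV = ("ENCRYPTION_SECRET",)
INTEGRATION_ENV = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "ADMIN_KEY", "GOOGLE_APPLICATION_CREDENTIALS")


def preflight_report(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, list[str]]:
    """Report missing tools and env vars; Redis and requirements.txt packages are not checked."""
    return {
        "missing_tools": [tool for tool in REQUIRED_TOOLS if which(tool) is None],
        "missing_unit_env": [name for name in UNIT_ENV if not env.get(name)],
        # Integration tests only optionally need these, so treat this list as a warning.
        "missing_integration_env": [name for name in INTEGRATION_ENV if not env.get(name)],
    }
```

Calling `preflight_report(os.environ)` uses the real `PATH` lookup; injecting `which` keeps the function testable without installing anything.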