A menu bar helper for macOS that watches your cursor in the background, captures contextual screenshots, and pipes them to an AI that replies with playful, sarcastic commentary. Commentary is spoken aloud in real time (using AVSpeechSynthesizer) and mirrored via macOS notifications. A global hotkey lets you pause or resume without touching the UI.
- Background cursor polling combined with movement/time heuristics to throttle captures.
- Focused screenshot capture: crops around the cursor so the AI sees the relevant context, with optional fallback to full-screen.
- Pluggable AI client: OpenAI (`gpt-4o-mini` by default) or an offline mock when no API key is present.
- Real-time voice playback plus banner notifications (with customizable voice/rate/pitch).
- A floating "fairy" overlay that glides to self-chosen spots, highlights windows/text it finds interesting, and whispers the same quip (toggleable from the menu bar).
- Menu bar controls and a `⌃⌥⌘P` global hotkey to pause/resume the pipeline.
- Permission guidance for Screen Recording & Accessibility.
- Mode selector (Casual, Focus, Accessibility) that reshapes prompts, cadence, and visuals to match how you want to listen.
- macOS 13 or newer (AppKit, AVFoundation, UserNotifications).
- Screen Recording and Accessibility permissions (granted via System Settings → Privacy & Security).
- Swift toolchain installed. When running inside a sandboxed environment, point `CLANG_MODULE_CACHE_PATH` to a writable folder (see below).
Set your OpenAI credentials once:
bash scripts/set_api_key.sh
# Optional: verify what the app sees
bash scripts/check_api_key.sh

The setup script also lets you pick an Apple speech voice (e.g. `com.apple.ttsbundle.Samantha-premium`) and tweak rate/pitch for a more natural delivery.
The script saves `config.json` under `~/Library/Application Support/CursorCompanion/`. Subsequent launches (even via the `.app` bundle) pick up the key, model, and optional base URL automatically. You can still override these values via environment variables if you prefer.
Without a valid API key, the app stays functional but falls back to a local mock that narrates generic hints.
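The lookup order described above (saved file first, environment overrides on top, mock fallback when no key is found) could be sketched roughly as follows. This is only an illustration: the `CompanionConfig` field names and the environment variable names are assumptions, not the app's confirmed schema.

```swift
import Foundation

// Hypothetical shape of config.json; the real file's keys may differ.
struct CompanionConfig: Codable, Equatable {
    var apiKey: String?
    var model: String?
    var baseURL: String?
}

/// Merges the saved file with environment overrides (environment wins).
/// The variable names below are assumptions for illustration.
func resolveConfig(fileData: Data?, environment: [String: String]) -> CompanionConfig {
    var config = CompanionConfig()
    if let data = fileData,
       let saved = try? JSONDecoder().decode(CompanionConfig.self, from: data) {
        config = saved
    }
    if let key = environment["OPENAI_API_KEY"], !key.isEmpty { config.apiKey = key }
    if let model = environment["OPENAI_MODEL"], !model.isEmpty { config.model = model }
    if let url = environment["OPENAI_BASE_URL"], !url.isEmpty { config.baseURL = url }
    // Default model taken from the feature list above.
    if config.model == nil { config.model = "gpt-4o-mini" }
    return config
}

/// True when no usable key exists and the offline mock should be used.
func shouldUseMock(_ config: CompanionConfig) -> Bool {
    (config.apiKey ?? "").trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
}
```

Passing the environment in as a dictionary (for example `ProcessInfo.processInfo.environment`) rather than reading it inside the function keeps the merge logic easy to test.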
From the repository root:
# Optional: redirect the module cache if the default (~/.cache/clang) is blocked
export CLANG_MODULE_CACHE_PATH="$(pwd)/.build/ModuleCache"
mkdir -p "$CLANG_MODULE_CACHE_PATH"
# Option A: run from Xcode (`⌘R`) or `swift run`
swift run
# Option B: package a reusable .app bundle (stable permissions)
bash scripts/package_app.sh # builds Release & installs ~/Applications/CursorCompanion.app
open ~/Applications/CursorCompanion.app

If `swift build` fails due to sandbox restrictions (e.g., when caches under `~/Library` are inaccessible), ensure the module cache export above is set or run outside the sandbox. Once the build succeeds, launch the generated binary or keep it running via `swift run`.
- On first launch, macOS will prompt for Accessibility and Screen Recording consent. Approve both in System Settings.
- A bolt icon appears in the menu bar. Use it to pause/resume or quit.
- While active, every ~12 seconds (or after sizable mouse movement) the app captures a cropped region around the cursor and sends it to the configured AI provider.
- Responses arrive as spoken commentary plus a notification banner. The speech engine interrupts previous utterances for fresh takes.
- Toggle the fairy overlay directly from the ⚡️ menu if you need a quieter desk.
- Toggle the pipeline from anywhere with `⌃⌥⌘P`.
- For consistent macOS permissions, launch the packaged app at `~/Applications/CursorCompanion.app`. Its bundle path stays constant across rebuilds, so Screen Recording and Accessibility approvals stick.
- Switch interaction modes from ⚡️ → Mode to choose between Casual (friendly banter), Focus (directive tips), or Accessibility (spoken descriptions with the overlay tucked away).
- Casual: the existing laid-back tone with lively overlay bubbles and quick voice commentary.
- Focus: slows the capture cadence and nudges the assistant to surface one actionable observation so it stays out of your way.
- Accessibility: prioritises descriptive narration, disables the fairy overlay, and keeps the experience voice-first for screen-reader flows.
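The capture rule described above (roughly every 12 seconds, or sooner after sizable mouse movement, with Focus mode slowing the cadence) might look something like the sketch below. The type names, the 400-point movement threshold, and the 30-second Focus interval are placeholders chosen for illustration; the app's actual values are not stated in this README.

```swift
import Foundation

// Hypothetical mode type; only the ~12 s default comes from this README.
enum InteractionMode {
    case casual, focus, accessibility

    var captureInterval: TimeInterval {
        switch self {
        case .casual, .accessibility: return 12
        case .focus: return 30  // assumed value for the "slower cadence"
        }
    }
}

struct Point {
    var x: Double
    var y: Double
}

struct CaptureThrottle {
    var mode: InteractionMode
    var movementThreshold: Double  // in points; assumed value
    private var lastCaptureTime: TimeInterval?
    private var lastCapturePoint: Point?

    init(mode: InteractionMode = .casual, movementThreshold: Double = 400) {
        self.mode = mode
        self.movementThreshold = movementThreshold
    }

    /// Returns true when a capture should happen now, and records it.
    mutating func shouldCapture(at point: Point, now: TimeInterval) -> Bool {
        guard let lastTime = lastCaptureTime, let lastPoint = lastCapturePoint else {
            lastCaptureTime = now
            lastCapturePoint = point
            return true  // the first sample always captures
        }
        let dx = point.x - lastPoint.x
        let dy = point.y - lastPoint.y
        let moved = (dx * dx + dy * dy).squareRoot()
        let elapsed = now - lastTime
        if elapsed >= mode.captureInterval || moved >= movementThreshold {
            lastCaptureTime = now
            lastCapturePoint = point
            return true
        }
        return false
    }
}
```

Taking `now` as a parameter instead of reading the clock inside the method keeps the heuristic deterministic, so it can be tested without waiting on real timers.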
- Swap in a different provider by adding a new `AIProvider` implementation (e.g., pointing to a local multimodal server) and injecting it through `AppController`.
- Introduce cropped captures (around the cursor) by extending `ScreenshotCapturer` to create sub-images before encoding.
- Persist transcripts by logging `AIResponse` values to disk.
- Add a proper preferences window for tuning capture cadence, hotkey, or voice settings.
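As a rough illustration of the provider swap mentioned above, the sketch below defines a stand-in protocol and an offline implementation. The project's real `AIProvider`, `AIResponse`, and `AppController` signatures are not shown in this README, so every shape here (including the `Pipeline` class standing in for the injection point) is hypothetical.

```swift
import Foundation

// Hypothetical shapes: the project's real AIProvider and AIResponse may differ.
struct AIResponse: Equatable {
    let text: String
}

protocol AIProvider {
    /// Turns an encoded screenshot plus a prompt into a short comment.
    func comment(onImage imageData: Data,
                 prompt: String,
                 completion: @escaping (Result<AIResponse, Error>) -> Void)
}

/// Offline provider that cycles through canned lines; no network involved.
final class CannedProvider: AIProvider {
    private let lines: [String]
    private var index = 0

    init(lines: [String]) {
        precondition(!lines.isEmpty, "need at least one line")
        self.lines = lines
    }

    func comment(onImage imageData: Data,
                 prompt: String,
                 completion: @escaping (Result<AIResponse, Error>) -> Void) {
        let line = lines[index % lines.count]
        index += 1
        completion(.success(AIResponse(text: line)))
    }
}

/// Stand-in for the injection point; the real AppController is not shown here.
final class Pipeline {
    private let provider: AIProvider

    init(provider: AIProvider) {
        self.provider = provider
    }

    func handleCapture(_ imageData: Data, speak: @escaping (String) -> Void) {
        provider.comment(onImage: imageData, prompt: "Describe what the cursor is near.") { result in
            switch result {
            case .success(let response):
                speak(response.text)
            case .failure(let error):
                speak("Provider error: \(error.localizedDescription)")
            }
        }
    }
}
```

Because the pipeline only depends on the protocol, a provider that talks to a local multimodal server could be dropped in the same way as the canned one.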
- Continuous screenshots raise privacy considerations; consider adding an allow/deny app list before production use.
- Error handling currently surfaces via notifications; you might want backoff/retry logic for transient network issues.
- Unit/UI tests are not wired up yet. The capture/AI layers should be factored for dependency injection to simplify testing.
- Building via `swift build` may require additional toolchain setup if the installed SDK and compiler versions mismatch.
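For the backoff/retry idea noted above, one possible starting point is a small exponential-backoff helper like the sketch below. The function names, the 0.5-second base delay, the 8-second cap, and the four-attempt limit are all assumptions for illustration, not behaviour the app currently has.

```swift
import Foundation

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
func backoffDelay(attempt: Int, base: TimeInterval = 0.5, cap: TimeInterval = 8) -> TimeInterval {
    min(cap, base * pow(2, Double(attempt)))
}

/// Runs `operation` up to `maxAttempts` times, sleeping between failures.
/// `sleep` is injectable so tests can skip real waiting.
func retrying<T>(maxAttempts: Int = 4,
                 sleep: (TimeInterval) -> Void = { Thread.sleep(forTimeInterval: $0) },
                 operation: () throws -> T) throws -> T {
    precondition(maxAttempts > 0, "maxAttempts must be positive")
    var attempt = 0
    while true {
        do {
            return try operation()
        } catch {
            attempt += 1
            // Give up and surface the last error once the budget is spent.
            if attempt >= maxAttempts { throw error }
            sleep(backoffDelay(attempt: attempt - 1))
        }
    }
}
```

This version blocks the calling thread while it waits, so in the app it would need to run off the main thread or be adapted to async calls. It also retries every error; a real implementation would likely retry only transient network failures.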