feat(screenshot): target a single window via window_name / window_pid#233
feat(screenshot): target a single window via window_name / window_pid#233exalsch wants to merge 13 commits into
Conversation
After every screenshot, draw a glowing orange-red border over the captured area on a transparent always-on-top Tk overlay for ~2.5s. Region captures get a solid border that fades out at the end; full- screen captures show a bell-fading inner border on the union of all monitor rects. The overlay is started *after* capture and cancelled before any subsequent capture so it never appears in the captured image. Set WINDOWS_MCP_DISABLE_FLASH=1/true/yes/on to suppress the effect. The conftest disables the flash for the test suite to avoid Tk threads racing with pytest teardown.
Review Summary by QodoAdd window-targeted screenshot capture and post-capture flash overlay
WalkthroughsDescription• Add window-targeted screenshot capture via window_name (fuzzy match) and window_pid (exact PID) • Implement post-capture visual feedback with orange-red flash overlay on transparent always-on-top window • New window_resolver.py module for window enumeration, resolution, and rect retrieval using DWM extended bounds • Plumb window targeting through Screenshot and Snapshot tools with mutual exclusion validation against display Diagramflowchart LR
A["Screenshot/Snapshot Tool"] -->|window_name or window_pid| B["capture_desktop_state"]
B -->|resolve_window_capture_rect| C["window_resolver"]
C -->|enumerate_visible_windows| D["Win32 EnumWindows"]
C -->|fuzzy/PID match| E["resolve_window"]
E -->|DwmGetWindowAttribute| F["get_window_rect"]
F -->|fallback| G["GetWindowRect"]
B -->|capture_rect| H["Desktop.get_state"]
H -->|get_screenshot| I["flash_overlay.show_capture_flash"]
I -->|Tk daemon thread| J["Visual Confirmation Border"]
File Changes1. src/windows_mcp/desktop/flash_overlay.py
|
Code Review by Qodo
1. Tests missing type hints
|
| @pytest.fixture(autouse=True) | ||
| def _reset_active_overlay(): | ||
| """Each test starts and ends with no overlay registered.""" | ||
| with flash_overlay._lock: | ||
| flash_overlay._active_overlay = None | ||
| yield | ||
| with flash_overlay._lock: | ||
| ov = flash_overlay._active_overlay | ||
| flash_overlay._active_overlay = None | ||
| if ov is not None: | ||
| ov.stop_event.set() | ||
|
|
There was a problem hiding this comment.
1. Tests missing type hints 📘 Rule violation ⚙ Maintainability
New test functions/fixtures are added without parameter and return type annotations. This violates the requirement for fully type-annotated function signatures and reduces static analysis/IDE support.
Agent Prompt
## Issue description
Newly-added test functions/fixtures do not include type annotations for parameters and return types.
## Issue Context
Compliance requires all added/modified function signatures to be fully type-annotated.
## Fix Focus Areas
- tests/test_flash_overlay.py[17-117]
- tests/test_window_resolver.py[18-148]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Replace the single transparent-canvas overlay with stacked thin Toplevel "strip" windows along each side of the rect (top, bottom, left, right). Each strip is a solid orange-red Toplevel with its own -alpha, so per-layer fade is reliable — the previous -alpha + -transparentcolor combination renders inconsistently on Windows. Eight layers per side with quadratic alpha falloff: layer 0 sits on the rect edge at full opacity, outer layers radiate further out (or inward, for full-screen) with progressively lower alpha to produce a soft halo. Time-based modulation still gives a quick fade-in, hold, and slow fade-out for region captures, and a bell-curve fade for full-screen. Tests: tests/test_flash_overlay.py adds TestBuildStripDefs covering strip placement (region outward, full-screen inward), alpha falloff, and the no-room edge case.
ba7cc1e to
1ce1f5d
Compare
The previous strip-Toplevels rewrite hung Tk's mainloop when launched on a non-main thread (creating ~32 Toplevels with overrideredirect froze the loop), so the flash never appeared even though the thread was alive. Switch back to a single transparent Tk window with a Canvas, but produce the halo by drawing concentric border rectangles whose RGB fades from full orange-red toward pure black. The canvas's transparent-colour key is set to pure black, so dim outer layers genuinely become transparent — no -alpha + -transparentcolor combo (which Windows renders unreliably) is needed. The time fade is done by re-rendering each frame with a scaled intensity, so the glow fades in/out smoothly. 12 layers, ~4 px outer halo for region captures, inner glow inset by 4 px from the monitor edge for full-screen captures. Tests: replace TestBuildStripDefs with TestLayerColor covering full intensity, half-intensity blend, and the zero-intensity safeguard (never returns pure black, which would punch through the transparent key).
Two prior rewrites failed in different ways on the user's machine: 1. The single-window transparent canvas approach (-transparentcolor + per-frame redraw) rendered nothing — some Windows configurations refuse to map the layered window when -transparentcolor is set. 2. The multi-layer Toplevel-strip approach hung Tk's mainloop when ~8 or more Toplevels were created back-to-back on a non-main thread (overrideredirect + alpha + topmost). Reduce the halo to a single ring of 4 thick (8 px) opaque Toplevel strips per rect — the only configuration confirmed to actually render and complete cleanly. No transparentcolor, no glow gradient — just a solid orange-red border that fades in (~15% of duration), holds, and fades out (~35%). Drop the test that asserted falling alpha across layers since there is now only one layer; the remaining strip-placement tests still exercise the geometry. Logs the strip count at INFO level so it's possible to verify the overlay actually fired from server logs.
Drop Tk entirely — neither -transparentcolor (rendered nothing on some Windows configs) nor multi-Toplevel strips (hangs Tk's mainloop on a non-main thread once ~6+ Toplevels are created back-to-back) delivered a usable result. The new implementation creates a Win32 layered window (WS_EX_LAYERED | WS_EX_TRANSPARENT | WS_EX_TOPMOST | WS_EX_TOOLWINDOW | WS_EX_NOACTIVATE) and feeds UpdateLayeredWindow a 32-bit BGRA top-down DIB section with premultiplied alpha. The halo itself is rendered with PIL: solid orange-red ring plus a Gaussian-blurred copy composited underneath, giving a real soft glow that fades from the rect edge outward. Time fade is a linear scale of the alpha channel re-pushed each frame (~30 ms cadence), quantised to 32 levels to skip redundant uploads. ctypes argtypes/restypes are set explicitly on every Win32 entry point used because the pointer-sized handle/LRESULT defaults overflow on x64. Tests: replace TestBuildStripDefs with TestIntensityCurve (verifies the bell-curve / hold-then-fade envelopes) and TestPremultipliedBgra (verifies BGRA byte order, premultiplication, and alpha scaling for the per-frame intensity multiplier).
The orange ring was nested inward inside the captured rect, so for window- targeted screenshots (especially borderless Tauri / WebView2 apps where the DWM extended frame matches the content edge) the halo visibly overlapped the window content instead of reading as a surround. Switch the ring direction to outward by default — it now grows into the existing 42px margin around the rect — and keep the inward nesting only for the full-screen inner halo via outward=False. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lity User feedback: the glow was easy to miss for window/region captures — the 4px band against a 14px blur radius wasn't bright enough to register when the bulk of the halo sits outside the captured rect on desktop background. Double the sharp-ring thickness to 8px and bump the total duration to 3.5s so the user has time to spot it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1ce1f5d to
f4a8ff5
Compare
Two review asks on PR CursorTouch#232: JezaChen — desktop/service.py is already large; move the flash setup code into show_capture_flash. Done: show_capture_flash now takes a single uia.Rect (or None for full-desktop) and resolves rects / enumerates monitors internally. get_screenshot collapses to a one-liner. Qodo — show_capture_flash overwrote _active_overlay without signalling the prior one, so a follow-up call would orphan the running overlay and cancel_active_flash() could no longer reach it. Fixed with an atomic swap: save prev under _lock, install new, then signal prev outside the lock so the prior overlay tears down without blocking the new caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rder The two new full-screen tests monkeypatched ``sys.modules['windows_mcp.uia']`` but that is bypassed once another test in the suite imports the real uia — ``import windows_mcp.uia as uia`` then resolves via the cached parent-package attribute rather than sys.modules. Patch ``GetMonitorsRect`` on the real module instead so the override holds regardless of import order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds optional window_name (fuzzy title match) and window_pid (exact process id) parameters to the Screenshot and Snapshot tools so an agent can capture a single window's bounding rectangle without doing a full-desktop screenshot or guessing a display index. The targeted window is brought to the foreground first by default; pass focus_window=False to skip that. window_name/window_pid and display are mutually exclusive. Implementation: - New module desktop/window_resolver.py: enumerate_visible_windows, resolve_window (PID/exact + name/fuzzy), and get_window_rect using DwmGetWindowAttribute(DWMWA_EXTENDED_FRAME_BOUNDS) with a GetWindowRect fallback so DWM drop-shadow doesn't pollute the rect. - Desktop.resolve_window_capture_rect ties resolver + focus + rect read together and reuses the existing bring_window_to_top logic. - Desktop.get_state now accepts a capture_rect override that takes precedence over display_indices (and rejects passing both). - capture_desktop_state (snapshot helper) plumbs window_name / window_pid / focus_window through, validates mutual exclusion with display, and surfaces the resolved title in the Screenshot/Snapshot metadata as "Target Window: ...". Tests: tests/test_window_resolver.py — 12 cases covering enumeration filtering, PID-vs-name precedence, fuzzy cutoff, and the DWM/ GetWindowRect fallback chain.
When window_name/window_pid is supplied with focus_window=False, the resolver now refuses unless the target is actually the foreground window. Otherwise the screenshot would just capture whatever happened to be on top of it, which is misleading. - focus_window=True (default): after bring_window_to_top, verify the window became foreground and log a warning if it didn't (e.g. when Windows blocks SetForegroundWindow for an elevated/UIPI target). - focus_window=False: raise WindowNotFoundError with a clear message if the target is not foreground, in addition to the existing minimized check. Adds is_foreground(hwnd) helper in window_resolver.py and three tests for it.
…ndow is blocked bring_window_to_top relies on SetForegroundWindow, which Windows silently rejects when the calling process didn't own the last input event — even after the AttachThreadInput dance. The targeted window is raised in z-order but the active foreground window stays put, so the screenshot grabs whichever app the user happens to be looking at instead. When the post-focus is_foreground check fails, retry once with SwitchToThisWindow (the undocumented Win32 API the shell uses for Alt-Tab), which bypasses the foreground lock without injecting keyboard input. If that still doesn't work, log the existing warning. Tests cover the new force_foreground helper: it calls SwitchToThisWindow with (hwnd, True) and silently swallows OSError from misbehaving drivers.
Qodo review on PR CursorTouch#233 pointed out that resolve_window_capture_rect relied on catching exceptions from bring_window_to_top, but bring_window_to_top swallows its own exceptions and only logs — so the try/except was never a reliable failure signal. f4a8ff5 added an explicit is_foreground post-condition check plus a SwitchToThisWindow fallback, but on final failure it only warned and then captured whatever was on top anyway — exactly the silent-wrong- content case Qodo flagged. When focus_window=True and both bring_window_to_top + force_foreground fail to bring the target forward, raise WindowNotFoundError with a message that tells the user to focus it manually or pass focus_window=False. focus_window=False still bypasses this path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f4a8ff5 to
eba890d
Compare
Summary
Adds
window_name(fuzzy title match) andwindow_pid(exact process id) parameters to theScreenshotandSnapshottools. When either is set, the tool resolves that window, brings it to the foreground (unlessfocus_window=False), reads its bounding rectangle, and captures only that area.window_name/window_pidanddisplayare mutually exclusive.Why
Right now there is no way to ask the server for "just this window". The only capture controls are
display=[N](whole monitor) or full virtual screen. Agents that want a specific app's screenshot have to guess the right display, capture the whole thing, and trust the LLM to crop visually. Withwindow_name/window_pidthe server does the lookup and returns just the relevant pixels. Combined with #232's flash overlay this also gives the user a clear visual confirmation of which window was just sampled.Implementation
src/windows_mcp/desktop/window_resolver.pyenumerate_visible_windows()—EnumWindowsfiltered to visible, valid HWNDs.resolve_window(name=..., pid=..., windows=...)— PID match takes precedence; name match usesfuzzywuzzy.process.extractOnewith a 70 score cutoff (consistent withDesktop._find_window_by_name). Untitled PID-matched windows are tolerated; titled ones are preferred.get_window_rect(hwnd)—DwmGetWindowAttribute(DWMWA_EXTENDED_FRAME_BOUNDS)first so the DWM drop-shadow on Aero windows doesn't inflate the rect;GetWindowRectfallback.Desktop.resolve_window_capture_rect(name=..., pid=..., focus=True)glues resolver + focus together, reusing the existingbring_window_to_toplogic, and returns(uia.Rect, title). Sleeps 50 ms after focus so DWM has time to repaint before the rect is read.Desktop.get_state(...)gains an optionalcapture_rect: uia.Rect | Noneparameter that overridesdisplay_indicesand is rejected if both are supplied.capture_desktop_stateplumbswindow_name/window_pid/focus_windowthrough, validates the mutual exclusion, and surfaces the resolved title in the response metadata asTarget Window: <name>.ScreenshotandSnapshottool descriptions updated to document the new parameters.Tests
tests/test_window_resolver.py— 12 cases (enumeration filtering, PID prefers titled, PID falls back to first match, PID-not-found error, fuzzy name match, fuzzy cutoff rejection, untitled-only error, PID-takes-precedence-over-name, DWM success path, DWM fallback toGetWindowRect, DWM-raises fallback).test_snapshot_display_filter.pycontinues to pass — the newcapture_rectparameter is purely additive and thedisplay_indices-based flow is unchanged.Test plan
Screenshot(window_name="cotire")returns just that windowScreenshot(window_pid=12345)returns just that PID's titled windowScreenshot(window_name="...", focus_window=False)returns the window without focusing it (errors clearly if minimized)Screenshot(window_name="...", display=[0])rejects with a clear errorSnapshot(use_vision=True, window_name="...")returns the window with the UI tree filtered to that regionpytest tests/test_window_resolver.pypasses (12/12)🤖 Generated with Claude Code