Skip to content

fix(ci): resolve ambiguous repo match instead of erroring#615

Merged
wesm merged 6 commits intoroborev-dev:mainfrom
axonstone:fix/ci-poller-ambiguous-repo-match
Apr 2, 2026
Merged

fix(ci): resolve ambiguous repo match instead of erroring#615
wesm merged 6 commits intoroborev-dev:mainfrom
axonstone:fix/ci-poller-ambiguous-repo-match

Conversation

@axonstone
Copy link
Copy Markdown
Contributor

Summary

  • When multiple repos share the same git remote identity (e.g. an auto-clone AND a user checkout), GetRepoByIdentityCaseInsensitive and findRepoByPartialIdentity now disambiguate instead of returning an error.
  • Auto-clones ({DataDir}/clones/) are preferred since CI manages them and they won't have dirty working tree state. If no auto-clone exists among the matches, the most recently created repo is returned.
  • This fixes the CI poller breaking on every poll cycle when users have their own local checkout registered alongside the auto-clone — the exact error seen in production: ambiguous repo match for "towardsthecloud/cloudburn-nextjs": multiple repos found with identity "git@github.com:towardsthecloud/cloudburn-nextjs.git"

Diagram

flowchart TD
    A[GetRepoByIdentityCaseInsensitive] --> B{How many matches?}
    B -->|0| C[Return nil]
    B -->|1| D[Return match]
    B -->|>1| E[PreferAutoClone]
    E --> F{Any match under\nDataDir/clones/?}
    F -->|Yes| G[Return auto-clone]
    F -->|No| H[Return most recently\ncreated repo]
Loading

When multiple repos share the same git remote identity (e.g. an
auto-clone AND a user checkout), GetRepoByIdentityCaseInsensitive
and findRepoByPartialIdentity now disambiguate instead of returning
an error. Auto-clones ({DataDir}/clones/) are preferred since CI
manages them and they won't have dirty working tree state. If no
auto-clone exists, the most recently created repo is returned.

This fixes the CI poller breaking on every poll cycle when users
have their own local checkout registered alongside the auto-clone.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 1, 2026

roborev: Combined Review (d52f019)

Verdict: Changes introduce one high-severity repo-confusion risk, one high-severity selection bug, and one medium-severity lookup regression.

High

  • Ambiguous repo identities can now map CI events to the wrong local checkout, which is a repository-confusion issue. When multiple local repos share the same owner/name suffix across different hosts/remotes, the new auto-resolution path treats them as interchangeable and lets PreferAutoClone choose one by clone location or recency instead of rejecting the ambiguity. Because the CI poller derives the repo identity from external CI/GitHub data, this can attach polling and review activity to the wrong source tree, potentially using the wrong repo-scoped config/token or publishing output for the wrong repo.
    Location: internal/daemon/ci_poller.go#L1036, internal/storage/sync.go#L251

  • Newest-repo fallback is effectively broken because CreatedAt is never populated on matched repos. The code scans createdAt from SQLite but never parses or assigns it to Repo.CreatedAt, so PreferAutoClone cannot reliably select the most recent checkout and will fall back to an arbitrary match ordering instead. That makes the new ambiguity-resolution behavior incorrect even in the non-security case it is trying to handle.
    Location: internal/storage/sync.go#L242

Medium

  • Sync placeholder rows are still eligible during preferred-repo selection, which can mask a valid local checkout. If a placeholder row has root_path == identity and shares an identity with a real checkout, PreferAutoClone can select the placeholder first, and findLocalRepo only skips that single returned row. The result is CI lookup failing or returning a non-filesystem path even though a usable local repo exists.
    Location: internal/storage/sync.go, internal/daemon/ci_poller.go

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

- Prevent cross-host repo confusion in findRepoByPartialIdentity:
  only auto-resolve when all matches share the same normalized
  identity. Different remotes (e.g. github.com vs ghe.corp.com) are
  genuinely ambiguous and still return an error.

- Fix CreatedAt never populated in findRepoByPartialIdentity by
  selecting and parsing created_at from the query, so the
  newest-repo fallback in PreferAutoClone works correctly.

- Add defensive sync-placeholder filtering in PreferAutoClone
  itself (root_path == identity rows are skipped) so it cannot
  select a non-filesystem path even if a caller omits the check.
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 2, 2026

roborev: Combined Review (e654484)

Summary verdict: PR has 1 High and 1 Medium issue that should be addressed before merge.

High

  • CreatedAt is never populated, so "prefer newest" selection silently fails
    • Location: internal/storage/sync.go#L236
    • Problem: GetRepoByIdentityCaseInsensitive scans createdAt into a local string but never parses and assigns it to r.CreatedAt. As a result, matched repos keep the zero time.Time, and PreferAutoClone cannot actually prefer the newest repo when falling back. In practice this makes the selection degrade to the first returned row instead of the most recent repo.
    • Fix: Parse the scanned createdAt value and assign it to r.CreatedAt before appending to matches, consistent with the parsing logic used in internal/daemon/ci_poller.go.

Medium

  • Partial identity matching still treats SSH and HTTPS remotes for the same repo as different
    • Location: internal/daemon/ci_poller.go#L1044
    • Problem: The new disambiguation only lowercases the raw identity and strips .git before comparing candidates. That does not collapse equivalent remotes such as git@host:owner/repo.git and https://host/owner/repo, so the same repository can still remain incorrectly "ambiguous" on non-GitHub hosts.
    • Fix: Normalize identities to a canonical host-plus-path form before comparison, or reuse a shared remote-normalization helper that treats SSH/HTTPS variants as the same repo.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 4 commits April 2, 2026 07:08
The partial-identity disambiguation compared full identity strings,
so SSH (host:owner/repo) and HTTPS (https://host/owner/repo) remotes
for the same host were treated as different repos. Extract a
scheme-independent host/path key before comparing.

Also fix the SameHostResolved test to actually use mixed URL schemes
(SCP-style vs HTTPS) instead of two identical SSH URLs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strip optional user@ prefix in normalizeIdentityKey so that
git@host:owner/repo identities (as actually persisted) match their
HTTPS counterparts. Use parsed.Host instead of parsed.Hostname() to
preserve explicit ports and avoid false matches across different
services on the same hostname.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strip well-known default ports (443 for HTTPS, 22 for SSH, etc.)
so that host:443/path and host/path normalize to the same key,
while non-default ports like :8443 are preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Always go through parsed.Hostname() to strip brackets uniformly,
then re-add them for IPv6 literals and append only non-default
ports. This ensures https://[::1]/repo and https://[::1]:443/repo
produce the same key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 2, 2026

roborev: Combined Review (cfd506c)

PR has a High severity regression that breaks the new ambiguous-repo fallback behavior; otherwise the reviewed changes look clean.

High

  • internal/storage/sync.go around line 245: GetRepoByIdentityCaseInsensitive scans createdAt from SQLite but never parses or assigns it to r.CreatedAt. That leaves CreatedAt at the zero value for every match, so the fallback in PreferAutoClone cannot actually choose the most recently created repository and will effectively return the first match instead. This breaks the intended disambiguation behavior for ambiguous identities.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Apr 2, 2026

False positive. I'll merge this once CI passes

@wesm wesm merged commit 2c9749e into roborev-dev:main Apr 2, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants