fix(node): dedupe mirror and canonical repo rows on list surfaces (#6) by beardthelion · Pull Request #73 · Gitlawb/node

beardthelion · 2026-06-20T21:22:10Z

Summary

Profile and repo-list surfaces rendered the same logical repo twice when a short-owner peer mirror row and the canonical did:key: row both existed. This collapses them to one card on the surfaces that were missing the dedup.

Motivation & context

Closes #6

node.gitlawb.com showed two nipmod cards on a profile: one from the peer mirror row (owner_did = "z6Mk…", description "mirrored from peer") and one from the canonical row (owner_did = "did:key:z6Mk…"). The paged repo list already deduped these in SQL, but the non-paged GET /api/v1/repos legacy path and list_federated_repos returned every matching row, so both showed up.

Kind of change

What changed

Crate touched: gitlawb-node.

- Added dedupe_canonical_repos in api/repos.rs: groups rows by (normalized owner, name) (the key segment after the last :, so did:key:z6Mk… and the bare z6Mk… mirror row collapse together), keeps the canonical row (non-mirror beats "mirrored from peer", ties broken by earliest created_at), and carries the group's most recent updated_at onto the survivor so a gossip push that only touched the mirror row still floats the repo to the top. This matches the existing SQL dedup in Db::list_all_repos_paged.
- Applied it at the two non-paged surfaces: the legacy list_repos fallback and list_federated_repos. As a side effect the legacy path's X-Total-Count now counts logical repos rather than raw rows, consistent with the paged path.
- Added a repos::tests module covering canonical-wins, distinct-repos-preserved, same-owner-different-repo, and the mirror tie-break.
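The grouping and survivor rules described above can be sketched in memory as follows. This is a minimal illustration, not the helper from the PR: the `RepoRow` struct, its integer timestamps, and the final sort order are assumptions made for the example, and the mirror check uses the description string as this first revision of the PR did.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for the node's repo record.
#[derive(Debug, Clone, PartialEq)]
struct RepoRow {
    owner_did: String,
    name: String,
    description: String,
    created_at: i64, // Unix seconds, for illustration only
    updated_at: i64,
}

/// Owner key as described above: the segment after the last ':'.
fn owner_key(owner_did: &str) -> &str {
    owner_did.rsplit(':').next().unwrap_or(owner_did)
}

/// Mirror marker used by the first revision of the PR.
fn is_mirror(row: &RepoRow) -> bool {
    row.description == "mirrored from peer"
}

fn dedupe_canonical_repos(rows: Vec<RepoRow>) -> Vec<RepoRow> {
    let mut groups: HashMap<(String, String), RepoRow> = HashMap::new();
    for row in rows {
        let key = (owner_key(&row.owner_did).to_string(), row.name.clone());
        match groups.entry(key) {
            Entry::Vacant(slot) => {
                slot.insert(row);
            }
            Entry::Occupied(mut slot) => {
                let best = slot.get_mut();
                // The survivor always carries the group's latest activity.
                let latest = best.updated_at.max(row.updated_at);
                // Non-mirror beats mirror; ties go to the earliest created_at.
                if (is_mirror(&row), row.created_at) < (is_mirror(best), best.created_at) {
                    *best = row;
                }
                best.updated_at = latest;
            }
        }
    }
    let mut out: Vec<RepoRow> = groups.into_values().collect();
    // Most recently active first; name as a stable secondary key (assumed).
    out.sort_by(|a, b| b.updated_at.cmp(&a.updated_at).then_with(|| a.name.cmp(&b.name)));
    out
}
```

With a mirror row updated recently and an older canonical row for the same repo, the sketch returns the canonical row with the mirror's newer updated_at.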

How a reviewer can verify

cargo test --bin gitlawb-node repos::tests
# Against a node that has both a mirror and a canonical row for one repo:
curl -fsSL 'http://<node>/api/v1/repos?owner=z6Mkwbud...'  # one record, owner_did = did:key:..., real description

Before you request review

- Scope is one logical change; no unrelated churn
- cargo test --workspace passes locally
- New behavior is covered by tests (required for fixes)
- cargo fmt --all and cargo clippy --workspace --all-targets -- -D warnings are clean
- Commit titles use Conventional Commits (feat(...), fix(...), docs(...))
- Docs / .env.example updated if behavior or config changed (or N/A)
- Checked existing PRs so this isn't a duplicate

Notes for reviewers

The dedup logic now lives in two places: the SQL DISTINCT ON in Db::list_all_repos_paged and this Rust helper for the non-paged surfaces. They use the same preference rules and the helper's doc comment flags that they must stay in sync. Consolidating both behind one path is possible later but would change the legacy "return all rows" contract that peer/CLI callers rely on, so I kept it out of scope.

Summary by CodeRabbit

Bug Fixes
- Repository listings now properly deduplicate canonical and mirrored entries, so each logical repository shows once.
- Total counts and “most recent activity” ordering now reflect the deduplicated view (with deterministic tie-breaking).
API Updates
- GraphQL repository results use the deduplicated dataset.
- The /api/v1/stats repos metric now counts logical repositories consistently with deduplication.
Tests
- Added coverage for canonical-vs-mirror selection, deterministic ordering, did-key-aware grouping, and empty-table behavior.

coderabbitai · 2026-06-20T21:22:18Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a466227d-c4bd-406d-b497-e9146fc1a63e

📥 Commits

Reviewing files that changed from the base of the PR and between 5721791 and 9b9b120.

📒 Files selected for processing (2)

crates/gitlawb-node/src/api/repos.rs
crates/gitlawb-node/src/db/mod.rs

🚧 Files skipped from review as they are similar to previous changes (1)

crates/gitlawb-node/src/api/repos.rs

📝 Walkthrough

Walkthrough

Repo listing and counting now deduplicate mirror and canonical rows into one logical repository. The database layer adds shared deduped list/count queries, and the API, GraphQL, stats, and tests use the deduped results.

Changes

Canonical repo deduplication

| Layer / File(s) | Summary |
| --- | --- |
| DB dedup query and methods: `crates/gitlawb-node/src/db/mod.rs` | Defines shared SQL for collapsing canonical and mirror rows, switches paged listing to that SQL, adds unpaged deduped list/count methods, and adds database tests for deduplication and count behavior. |
| API repo list deduplication: `crates/gitlawb-node/src/api/repos.rs` | Updates `list_repos` to deduplicate owned repo rows before building the response and total count, deduplicates the local repo set in `list_federated_repos`, and adds unit tests for selection and tie-breaking rules. |
| GraphQL, stats, and integration tests: `crates/gitlawb-node/src/graphql/query.rs`, `crates/gitlawb-node/src/server.rs`, `crates/gitlawb-node/src/test_support.rs` | Routes GraphQL repos and stats repo counts through the deduped DB methods and adds integration tests that assert the deduped logical repo count on both surfaces. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

kevincodex1

Poem

🐇 I hopped through rows both twin and true,
Found one clear repo shining through.
Canonical first, the mirror tucked away,
Now one neat list greets the day.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and accurately summarizes the main change: deduplicating mirror and canonical repo rows on list surfaces. |
| Description check | ✅ Passed | The description follows the template and covers summary, motivation, change details, verification, and reviewer notes. |
| Linked Issues check | ✅ Passed | The PR addresses `#6` by deduplicating mirrored and canonical repo rows, preferring canonical metadata, and covering the affected list/profile surfaces. |
| Out of Scope Changes check | ✅ Passed | The DB, GraphQL, stats, and migration changes are supporting pieces of the same deduplication fix, not unrelated churn. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

beardthelion · 2026-06-21T17:34:09Z

@coderabbitai review

coderabbitai · 2026-06-21T17:34:17Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

beardthelion · 2026-06-21T18:00:24Z

@coderabbitai full review

coderabbitai · 2026-06-21T18:00:30Z

✅ Action performed

Full review finished.

jatmn

Findings

1. GraphQL `repos` query still returns duplicate mirror/canonical rows

The PR dedups GET /api/v1/repos and /api/v1/repos/federated, but the GraphQL repos query at crates/gitlawb-node/src/graphql/query.rs:12-28 still calls db.list_all_repos() without any dedup. Since that method returns every raw row, a client using /graphql will still see the same logical repo twice when both a mirror row and a canonical row exist.

Evidence from the checkout:

- graphql/query.rs:15-17 calls db.list_all_repos() and maps the result directly to RepoType.
- list_all_repos() in db/mod.rs:826-836 selects all rows from repos with no dedup.
- The REST list path now calls dedupe_canonical_repos on the output of list_all_repos_with_stars() (repos.rs:173 and :1005).

Because the linked issue (#6) asks for "profile and repo-list surfaces" to show one logical repo, and the GraphQL endpoint is a repo-list surface, the PR does not fully close the issue as claimed.

Recommended action: Apply the same dedup to the GraphQL repos resolver, or add a dedicated Db::list_all_repos_deduped() / list_all_repos_with_stars_deduped() method and use it consistently across REST, GraphQL, and stats.

2. `stats` endpoint inflates repo count when mirror rows exist

/api/v1/stats (server.rs:435-450) uses db.list_all_repos().await and returns r.len() as i64. Because this path is not deduped, a node with both a canonical and a mirror row for the same repo counts them as two repos. This value is displayed by gl sync (crates/gl/src/sync.rs:49-55), so it is user-visible.

This is the same underlying issue as the list-surface bug, but applied to a count. The PR changed the legacy X-Total-Count to count logical repos, which makes the inconsistency with /api/v1/stats more noticeable.

Recommended action: Either reuse the dedup helper for the stats count, or move the canonical-dedup logic into the DB layer so all callers get consistent counts automatically.
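To make the count discrepancy concrete, here is a small sketch of counting logical repos instead of raw rows. It is an in-memory illustration only: the `(owner_did, name)` tuple input and the function name are assumptions, and the owner key is the last-segment key the PR used at this point in the review.

```rust
use std::collections::HashSet;

/// Counts distinct (owner key, name) pairs rather than raw rows.
/// `rows` holds hypothetical (owner_did, name) pairs standing in for repo rows.
fn count_logical_repos(rows: &[(&str, &str)]) -> usize {
    rows.iter()
        .map(|&(owner, name)| (owner.rsplit(':').next().unwrap_or(owner), name))
        .collect::<HashSet<_>>()
        .len()
}
```

For a mirror row and a canonical row of the same repo plus one unrelated repo, the raw length is 3 while the logical count is 2, which is the gap a stats consumer such as gl sync would see.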

3. Mirror detection relies on a user-settable description string

dedupe_canonical_repos treats any row whose description == "mirrored from peer" as a mirror. The same string is used in the SQL list_all_repos_paged query. Because description is user-provided at repo creation, a canonical repo created with that exact description would be deprioritized against another canonical row, and a mirror row with a different description would be treated as canonical. This is a pre-existing fragility, but the PR duplicates it in Rust code rather than using a dedicated marker (e.g., a boolean column, the id format, or the machine_id/disk_path pattern used by upsert_mirror_repo).

Recommended action: Consider adding an explicit is_mirror column or other non-user-visible marker and update both the SQL and Rust dedup paths to use it.

The non-paged GET /api/v1/repos legacy path and list_federated_repos returned both the short-owner peer mirror row and the canonical did:key row for the same logical repo, so profiles rendered the repo twice. Only the paged path collapsed them, in SQL. Add a dedupe_canonical_repos helper that groups by (normalized owner, name), keeps the canonical non-mirror row (tie broken by earliest created_at), and carries the group's latest updated_at onto the survivor, matching the paged SQL dedup. Apply it at both non-paged surfaces and cover it with unit tests.

…ion on the repo id

Addresses jatmn's review on #73 (dedup not applied on every reader path; mirror detection keyed on a user-settable string).

- GraphQL repos and /api/v1/stats now collapse mirror+canonical rows, via new Db::list_all_repos_deduped and count_repos_deduped that share the DISTINCT ON CTE with list_all_repos_paged so the dedup rule cannot drift.
- Mirror detection keys on the structural slash-form id (written only by upsert_mirror_repo) instead of the description == 'mirrored from peer' string, in both the SQL paths and dedupe_canonical_repos.
- Deterministic survivor on a full created_at tie (id ASC) in both implementations.
- Legacy REST list and federated keep their method-scoped did_matches owner filter in Rust; it does not compose with the method-blind SQL group key, so those paths intentionally stay on the Rust helper.
- Adds sqlx and unit tests for the new surfaces, the structural marker, and the tiebreak.

…p cases

Follow-ups from code review on the dedup change:

- list_all_repos_deduped/count_repos_deduped: mirror-only group survives, empty table returns empty/0, and count_repos_deduped equals the deduped list length (guards the two independent SQL queries against grouping-key drift).
- Document list_all_repos as the raw, non-deduped enumeration path (object lookup only), so it is not mistaken for a listing-surface method.

beardthelion · 2026-06-24T14:40:11Z

Thanks, all three are addressed. Rebased onto main first (the conflict is gone; mergeable now).

1 & 2 — GraphQL repos and /api/v1/stats no longer show raw rows. Both now go through new DB methods, list_all_repos_deduped and count_repos_deduped, that reuse the same DISTINCT ON (split_part(owner_did,':',-1), name) selection as list_all_repos_paged (factored into a shared DEDUP_CTE const so the three can't drift). count_repos_deduped uses the COUNT(DISTINCT (split_part(...), name)) idiom already in the paged empty-page path. #[sqlx::test]s cover both surfaces, plus mirror-only, empty-table, and a count-equals-list-length guard.

3 — mirror detection no longer keys on the description. Both the SQL paths and dedupe_canonical_repos now classify a mirror by its slash-form id ({owner_short}/{name}), which upsert_mirror_repo is the only writer of; canonical rows use UUID ids and repo names are sanitized, so no other row can carry a slash. A test seeds a canonical row whose description is literally "mirrored from peer" and confirms it still wins. I went with the structural id marker rather than an is_mirror column to keep this migration-free; happy to add the column as a follow-up if you'd prefer the explicit schema signal.

While there I made the dedup survivor deterministic on a full (mirror-status, created_at) tie via an id ASC backstop, in both the SQL and Rust paths.
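The structural mirror marker and the deterministic tie-break can be sketched as one ordering. This is a hedged illustration rather than the PR's code: the `Row` struct and `pick_survivor` are hypothetical names, and timestamps are plain integers for brevity.

```rust
/// Hypothetical minimal row; only the fields the ordering needs.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: String,
    description: String, // deliberately ignored by the classification
    created_at: i64,
}

/// A mirror row is recognized by its slash-form id, e.g. "{owner_short}/{name}".
fn is_mirror(row: &Row) -> bool {
    row.id.contains('/')
}

/// Picks the survivor of one group:
/// canonical before mirror, then earliest created_at, then id ascending.
fn pick_survivor(group: &[Row]) -> Option<&Row> {
    group.iter().min_by(|a, b| {
        (is_mirror(a), a.created_at, a.id.as_str())
            .cmp(&(is_mirror(b), b.created_at, b.id.as_str()))
    })
}
```

Because the description never enters the comparison, a canonical row whose description is literally "mirrored from peer" still outranks a real mirror row, and two canonical rows with the same created_at resolve by id.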

One thing I left out of scope: GraphQL repos and /api/v1/stats don't filter on is_public (the previous list_all_repos they called didn't either, so this PR doesn't change that). If those should be public-only surfaces it's a separate visibility fix; say the word and I'll open an issue.

The legacy non-paged list and federated paths keep their existing did_matches owner filter in Rust rather than moving to SQL: did_matches is DID-method-scoped (it won't match did:key:z6X against did:gitlawb:z6X), and the SQL group key is method-blind, so deduping in SQL before that filter could drop a repo from its own owner's listing. Leaving those paths as-is avoids that.
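A small sketch of why the order of operations matters here. The functions below are approximations written for this example: `key_id_sketch` and `did_matches_sketch` only imitate the method-scoped behavior described above, and `dedupe_method_blind` keeps the first row seen per group rather than applying the real survivor rules.

```rust
use std::collections::HashSet;

/// Approximation: a bare id and `did:key:<id>` name the same key; other methods have none.
fn key_id_sketch(did: &str) -> Option<&str> {
    match did.strip_prefix("did:key:") {
        Some(rest) if !rest.contains(':') => Some(rest),
        Some(_) => None,
        None if !did.contains(':') => Some(did),
        None => None,
    }
}

/// Method-scoped match: exact equality, or the same did:key / bare key id.
fn did_matches_sketch(a: &str, b: &str) -> bool {
    a == b || matches!((key_id_sketch(a), key_id_sketch(b)), (Some(x), Some(y)) if x == y)
}

/// Method-blind group key: the last ':' segment.
fn suffix_key(did: &str) -> &str {
    did.rsplit(':').next().unwrap_or(did)
}

/// Keeps the first owner seen per method-blind key (a simplification).
fn dedupe_method_blind<'a>(owners: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    owners
        .iter()
        .copied()
        .filter(|&o| seen.insert(suffix_key(o)))
        .collect()
}
```

Given same-named repos owned by did:key:z6X and did:gitlawb:z6X, deduping first leaves only the did:key row, so a later filter for did:gitlawb:z6X finds nothing. Filtering first and deduping afterwards keeps that owner's repo.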

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/gitlawb-node/src/db/mod.rs (1)
906-917: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make the SQL owner key did:key-aware instead of suffix-based.

split_part(owner_did, ':', -1) collapses any DID method with the same last segment, and the owner filter is asymmetric: owner=did:key:z6X does not include the bare mirror row z6X, so paged results can disagree with the legacy did_matches path and lose the mirror’s max updated_at. Normalize only did:key:<id> to <id>; keep other DID methods exact.
Suggested shape
- SELECT DISTINCT ON (split_part(owner_did, ':', -1), name)
+ SELECT DISTINCT ON (
+     CASE WHEN owner_did LIKE 'did:key:%' THEN substring(owner_did from 9) ELSE owner_did END,
+     name
+ )
...
- PARTITION BY split_part(owner_did, ':', -1), name
+ PARTITION BY
+     CASE WHEN owner_did LIKE 'did:key:%' THEN substring(owner_did from 9) ELSE owner_did END,
+     name
...
- WHERE ($1::text IS NULL OR owner_did = $1 OR owner_did LIKE '%:' || $1)
+ WHERE (
+     $1::text IS NULL
+     OR owner_did = $1
+     OR ($1 LIKE 'did:key:%' AND owner_did = substring($1 from 9))
+     OR (owner_did LIKE 'did:key:%' AND $1 = substring(owner_did from 9))
+ )
Apply the same normalized key to count_repos_deduped and the empty-page count fallback.
Also applies to: 1011-1014
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/db/mod.rs` around lines 906 - 917, The deduplication
key in the repos query is still suffix-based, so it can merge unrelated DID
methods and miss the bare mirror row for did:key owners. Update the
normalization in the repos selection logic to treat only did:key:<id> as <id>
while leaving other owner_did values exact, and apply the same normalized key
consistently in count_repos_deduped and the empty-page count fallback so paging
and counts stay aligned with did_matches.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 1424-1431: The dedupe key normalization in the rows loop is too
broad because it strips everything after the last colon, which can collide
different DID methods; update the key-building logic around rec.owner_did to
match did_matches behavior by only normalizing did:key:<id> to <id> and leaving
other DID methods unchanged. Use the existing owner comparison semantics as the
reference point, and add a regression test covering did:key:z6Same versus
did:gitlawb:z6Same to verify they remain distinct.

---

Outside diff comments:
In `@crates/gitlawb-node/src/db/mod.rs`:
- Around line 906-917: The deduplication key in the repos query is still
suffix-based, so it can merge unrelated DID methods and miss the bare mirror row
for did:key owners. Update the normalization in the repos selection logic to
treat only did:key:<id> as <id> while leaving other owner_did values exact, and
apply the same normalized key consistently in count_repos_deduped and the
empty-page count fallback so paging and counts stay aligned with did_matches.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e0df3ff1-9b66-4958-8e78-0a70f4c09c04

📥 Commits

Reviewing files that changed from the base of the PR and between dcbad62 and 8e8b74b.

📒 Files selected for processing (5)

crates/gitlawb-node/src/api/repos.rs
crates/gitlawb-node/src/db/mod.rs
crates/gitlawb-node/src/graphql/query.rs
crates/gitlawb-node/src/server.rs
crates/gitlawb-node/src/test_support.rs

jatmn

Findings

1. Required: `cargo fmt` is not clean

Severity: Required
Location: crates/gitlawb-node/src/api/repos.rs, crates/gitlawb-node/src/db/mod.rs, crates/gitlawb-node/src/test_support.rs

cargo fmt --all -- --check reports diffs in the PR’s changed code (long single-line record(...) calls, over-length rec(...) signature, assert_eq! macro invocations that should be multi-line, and a few comment alignments). The PR checklist claims cargo fmt --all is clean, but it is not. This must be fixed before merge.

2. Positive: Logic is consistent across SQL and Rust dedup paths

Severity: No issue

I verified the deduplication rules line up between the new Rust helper and the shared SQL CTE:

- Grouping key: split_part(owner_did, ':', -1) / owner_did.rsplit(':').next().
- Mirror marker: position('/' in id) > 0 / r.id.contains('/').
- Survivor preference: canonical (no slash) over mirror.
- Tie-break: created_at ASC, id ASC in both SQL and Rust.
- Activity timestamp: the survivor inherits the group’s max updated_at (SQL window function in DEDUP_CTE, latest map in Rust).

The GraphQL and stats surfaces now call the deduped methods, and the only remaining raw consumer is the IPFS object scan, which is intentionally documented as the non-listing path.

3. Positive: Tests cover the important edge cases

Severity: No issue

The new tests exercise:

- Canonical wins over mirror.
- Distinct repos are preserved.
- Same owner, different repo does not collapse.
- Tie-breaking by earliest created_at and then id ASC.
- Mirror description on a canonical row does not misclassify it.
- Mirror-only group survives.
- Empty table returns empty / zero.
- count_repos_deduped matches list_all_repos_deduped length.
- Real upsert_mirror_repo row shape is classified correctly.
- GraphQL and REST stats integration.

Unit tests in api::repos::tests pass.

4. Optional / pre-existing: paged owner filter still only handles short-form owner

Severity: Optional observation (not introduced by this PR)

In list_all_repos_paged, the owner filter is:

WHERE ($1::text IS NULL OR owner_did = $1 OR owner_did LIKE '%:' || $1)

If a caller passes the full did:key:z6Mk… form, the LIKE pattern becomes %:did:key:z6Mk…, which will not match the bare z6Mk… mirror row. The non-paged legacy path filters in Rust using did_matches, which handles both forms correctly. Fixing the paged SQL filter to handle full DID is out of scope here, but worth a follow-up to keep the two list paths consistent.
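The mismatch can be reproduced with an in-memory stand-in for the SQL predicate. This is a sketch only: `paged_owner_filter` is a hypothetical function, and it treats `LIKE '%:' || $1` as a plain suffix test, which assumes the parameter contains no LIKE wildcard characters.

```rust
/// In-memory analogue of `owner_did = $1 OR owner_did LIKE '%:' || $1`.
/// Assumes `param` contains no `%` or `_`, so LIKE reduces to a suffix match.
fn paged_owner_filter(owner_did: &str, param: &str) -> bool {
    let suffix = format!(":{}", param);
    owner_did == param || owner_did.ends_with(suffix.as_str())
}
```

A short-form parameter matches both the bare mirror row and the did:key: row, while a full did:key: parameter matches only the canonical row and misses the bare mirror row.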

beardthelion · 2026-06-24T18:31:20Z

Fixed in 5721791. Ran cargo fmt --all over the three files (repos.rs, db/mod.rs, test_support.rs) — the multi-line record/rec calls, single-line assert_eq!s, and the create_repo(seed_repo(...)) reflows. cargo fmt --all -- --check is clean now and clippy is warning-free. Finding 4 (paged owner filter on full DID) is pre-existing and out of scope here; I'll leave it for a follow-up.

jatmn

Findings

1. Deduplication key is method-blind and can collapse distinct DID methods (Major)

Severity: Major / correctness

Location:

crates/gitlawb-node/src/db/mod.rs — DEDUP_CTE (lines 905–925) and count_repos_deduped (lines 1011–1018)
crates/gitlawb-node/src/api/repos.rs — dedupe_canonical_repos (lines 1384–1466)

Issue: The dedup grouping key is the last : segment of owner_did:

split_part(owner_did, ':', -1)

rec.owner_did.rsplit(':').next().unwrap_or(&rec.owner_did)

This is method-blind. A repo owned by did:key:z6MkExample and a repo owned by did:gitlawb:z6MkExample (or any other DID method with the same trailing segment) and having the same name will be collapsed into one logical repo. The codebase already recognizes this risk and explicitly guards against it in crates/gitlawb-node/src/api/mod.rs:

/// Match a presented DID against a stored DID ... never let a bare id match across methods —
/// `did:web` / `did:gitlawb` share the base58 space with `did:key`, so a
/// trailing-segment compare would treat `did:key:X` and `did:gitlawb:X` as equal.
pub(crate) fn did_matches(a: &str, b: &str) -> bool { ... }

The new dedup logic does exactly the trailing-segment comparison that did_matches warns against. It is true that repo creation currently only accepts did:key owners, but the database schema does not enforce that, and the project already treats cross-method collision as a real concern. The dedup key should match the project's own DID-matching semantics.

Recommended action: Make the dedup key did:key-aware (and bare-id-as-did:key). For example:

- Treat did:key:<id> as <id>.
- Treat a bare <id> (no colon) as <id>.
- Leave any other did:<method>:<id> as the full string.

Apply the same normalization in DEDUP_CTE, count_repos_deduped, and dedupe_canonical_repos so the list, count, and legacy paths stay consistent.

2. Paged owner filter is inconsistent with the legacy path for full-DID owner queries (Optional / pre-existing)

Severity: Optional / observation

Location:

crates/gitlawb-node/src/db/mod.rs — list_all_repos_paged WHERE clause (line 917 and empty-page fallback line 972)
crates/gitlawb-node/src/api/repos.rs — legacy filter (lines 233–239)
crates/gitlawb-node/src/api/mod.rs — did_matches (lines 63–74)

Issue: The paged path filters in SQL:

WHERE ($1::text IS NULL OR owner_did = $1 OR owner_did LIKE '%:' || $1)

If a caller passes the full owner did:key:z6MkExample, the LIKE pattern becomes %:did:key:z6MkExample, which will not match the bare mirror row z6MkExample. The legacy path uses did_matches in Rust, which correctly matches both forms. This means a full-DID owner filter returns different results on the paged and legacy surfaces.

This is not introduced by the PR, but the PR touches the paged query and leaves the inconsistency in place. It is worth a follow-up to keep the two list paths aligned.

Recommended action: In the paged SQL, normalize the owner filter the same way the grouping key is normalized (ideally after fixing finding #1), or add an explicit OR branch for the bare mirror form when the filter is a did:key: value.

…ods don't collapse

The dedup grouping key took the last ':' segment of owner_did (split_part / rsplit), so two repos owned by did:key:X and did:gitlawb:X with the same name collapsed into one logical repo on the list, paged, count, stats, and GraphQL surfaces. That is the exact cross-method collision the codebase already guards against in did_matches.

Replace it with a did:key-aware key that strips a did:key: prefix only when the remainder is a bare id (no ':'), otherwise keeps the full DID, reproducing did_matches/key_id as an equivalence relation: did:key:X and a bare mirror X still collapse, while distinct methods never merge. Applied byte-identically across DEDUP_CTE (DISTINCT ON / PARTITION BY / ORDER BY), count_repos_deduped, the empty-page count fallback, and the in-memory dedupe_canonical_repos, so the SQL and Rust paths agree.

The backing index lived in the already-released migration v1, so v1 is left untouched and a new migration v7 drops idx_repos_owner_short_name and builds idx_repos_owner_key_name on the matching expression; the CASE must stay byte-identical to the queries or Postgres stops using it.

Tests cover both the in-memory and SQL paths: distinct methods stay separate, bare-id and did:key forms collapse, and the residual-colon guard keeps a malformed did:key:did:gitlawb:X distinct from the bare method DID. 216 pass.

beardthelion · 2026-06-24T21:49:44Z

Both addressed in 9b9b120.

1 (Major) - method-blind dedup key. Fixed. Replaced the split_part(owner_did, ':', -1) / rsplit(':') last-segment key with a did:key-aware key that strips a did:key: prefix only when the remainder has no :, and keeps the full DID otherwise:

SQL: CASE WHEN owner_did LIKE 'did:key:%' AND position(':' in substr(owner_did, 9)) = 0 THEN substr(owner_did, 9) ELSE owner_did END
Rust: owner_did.strip_prefix("did:key:").filter(|rest| !rest.contains(':'))

That reproduces did_matches/key_id as an equivalence relation, so did:key:z6Mk… and the bare mirror z6Mk… still collapse while did:key:z6Mk… and did:gitlawb:z6Mk… stay distinct. The residual-colon guard matches key_id's !ka.contains(':') check, so even a malformed did:key:did:gitlawb:X keeps its full form rather than collapsing onto the bare method DID.
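The Rust expression quoted above can be wrapped into a small function to check the cases listed. This is a sketch: the function name `owner_key` and the `unwrap_or(owner_did)` fallback are assumptions, since the comment only quotes the `strip_prefix(...).filter(...)` part.

```rust
/// did:key-aware grouping key: strips "did:key:" only when the remainder
/// contains no further ':'; any other value is kept as the full string.
fn owner_key(owner_did: &str) -> &str {
    owner_did
        .strip_prefix("did:key:")
        .filter(|rest| !rest.contains(':'))
        .unwrap_or(owner_did) // assumed fallback: keep the stored value unchanged
}
```

Under this key a did:key: owner and its bare mirror form share one group, a did:gitlawb: owner with the same trailing id keeps its own group, and a did:key:-wrapped full DID is left untouched by the residual-colon guard.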

Applied in every spot the key is read: DEDUP_CTE (DISTINCT ON / PARTITION BY / ORDER BY), count_repos_deduped, the empty-page COUNT(DISTINCT …) fallback in list_all_repos_paged, and dedupe_canonical_repos. The backing index lived in the already-released migration v1, so I left v1 untouched and added migration v7 to drop idx_repos_owner_short_name and build idx_repos_owner_key_name on the matching expression; the CASE is byte-identical across all of them so the planner still uses the index.
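One way to keep such an expression byte-identical across call sites is to define it once and interpolate it everywhere. The sketch below illustrates that idea only: the constant name, both function names, and the exact shape of the count query and index DDL are assumptions, not the statements from the PR. Only the CASE expression and the index name come from the comment above.

```rust
/// The owner-key expression quoted above, defined in exactly one place.
const OWNER_KEY_SQL: &str = "CASE WHEN owner_did LIKE 'did:key:%' \
AND position(':' in substr(owner_did, 9)) = 0 \
THEN substr(owner_did, 9) ELSE owner_did END";

/// Hypothetical count query built from the shared expression.
fn count_query() -> String {
    format!(
        "SELECT COUNT(DISTINCT (({key}), name)) FROM repos",
        key = OWNER_KEY_SQL
    )
}

/// Hypothetical index DDL built from the same expression.
fn index_ddl() -> String {
    format!(
        "CREATE INDEX idx_repos_owner_key_name ON repos (({key}), name)",
        key = OWNER_KEY_SQL
    )
}
```

Since every statement is generated from the same constant, a later edit to the key cannot leave the queries and the index expression out of step.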

Tests cover cross-method distinctness and bare/did:key collapse on both the in-memory and SQL paths, plus the did:key:-wrapped-full-DID and empty-residual boundaries.

2 (Optional / pre-existing) - paged owner filter. Left as-is for now. The owner_did LIKE '%:' || $1 branch still trailing-matches, so a full did:key: owner filter won't hit the bare mirror row, and it now diverges from the grouping key on cross-method ids the same way you noted. It predates this PR and is out of scope for the under-withholding intent, so I'd rather not widen the diff here. Happy to file a follow-up to align the filter with the new key (and with did_matches on the legacy path) if you'd prefer that tracked.

jatmn

Thanks for the contribution. I do not see any actionable issues from my review.

@kevincodex1 LGTM

kevincodex1

LGTM

beardthelion added labels Jun 22, 2026:
- kind:bug (Defect fix: wrong or unsafe behavior)
- crate:node (gitlawb-node: the serving node and REST API)
- subsystem:replication (Mirror, replica, and cross-node sync)
- subsystem:api (Node REST API request/response surface)
- sev:low (Cosmetic, cleanup, or nice-to-have)

jatmn requested changes Jun 24, 2026

beardthelion added 3 commits June 24, 2026 08:31

beardthelion force-pushed the fix/dedup-mirror-rows-canonical-owner branch from dcbad62 to 8e8b74b Compare June 24, 2026 14:37

beardthelion requested a review from jatmn June 24, 2026 14:38

coderabbitai Bot reviewed Jun 24, 2026

Comment thread crates/gitlawb-node/src/api/repos.rs Outdated

beardthelion mentioned this pull request Jun 24, 2026

Private repos (is_public=false) are enumerable via unauthenticated list/stats/GraphQL surfaces #97

Open

jatmn requested changes Jun 24, 2026

style(node): apply cargo fmt to dedup test code

5721791

beardthelion requested a review from jatmn June 24, 2026 18:31

jatmn requested changes Jun 24, 2026

beardthelion requested a review from jatmn June 24, 2026 21:49

jatmn approved these changes Jun 24, 2026

kevincodex1 approved these changes Jun 25, 2026

kevincodex1 merged commit 3e8e333 into main Jun 25, 2026
14 checks passed

beardthelion mentioned this pull request Jun 25, 2026

Paged repo-list owner filter misses bare-owner mirror rows when given a full DID #102

Open
