Summary 'Total' count inflated after removal of cross-file deduplication #2072

@afalhambra-hivemq

Description

After upgrading from lychee v0.21.0 to v0.23.0 (via lychee-action v2.7.0 → v2.8.0), the "Total" count in the summary report jumped from ~3,800 to ~113,000 for the same documentation site (~489 HTML pages).

Root cause

In v0.21.0, collector.rs used a DashSet to deduplicate sources across files:

let seen = Arc::new(DashSet::new());
// ...
.filter_map({
    move |source: Result<String>| {
        let seen = Arc::clone(&seen);
        async move {
            if let Ok(s) = &source {
                if !seen.insert(s.clone()) {
                    return None; // Skip already-seen
                }
            }
            Some(source)
        }
    }
})

This deduplication step was removed between v0.21.0 and v0.23.0. As a result, the "Total" count now reflects every link occurrence across all files rather than the number of unique URLs to check.

For a documentation site with ~489 HTML pages sharing common navigation, footer links, and cross-references, this means the same URLs are counted hundreds of times.
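The gap between the two numbers can be illustrated with a minimal sketch. This is not lychee's code: `count_links` and its `pages` parameter are hypothetical names, and a plain `std::collections::HashSet` stands in for the `DashSet` used in the collector, assuming single-threaded counting is enough for illustration.

```rust
use std::collections::HashSet;

/// Returns `(total link occurrences, unique URLs)` for a set of pages.
///
/// `pages` is a hypothetical stand-in for the links extracted from each
/// file; it is not a type or structure taken from lychee.
pub fn count_links(pages: &[Vec<&str>]) -> (usize, usize) {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut occurrences = 0;

    for links in pages {
        for &url in links {
            // Every appearance counts toward the occurrence total.
            occurrences += 1;
            // `insert` is a no-op for URLs already seen in another file,
            // so the set size is the number of distinct URLs.
            seen.insert(url);
        }
    }

    (occurrences, seen.len())
}
```

With shared navigation links repeated on every page, the first number grows with the page count while the second stays close to the number of distinct targets, which matches the jump reported here.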

Why this matters

  • The summary becomes misleading: users see 113K "Total" and may assume lychee is making 113K network requests (it is not, thanks to the in-memory cache).
  • It makes it harder to compare runs across versions.
  • It inflates other metrics, such as the "Successful" count.

Suggestion

Consider showing both values in the summary:

| 🔍 Total (occurrences) | 113065 |
| 🔗 Unique URLs          | 3847   |

Or alternatively, restore deduplication for the "Total" count in the summary while still tracking per-file occurrences for error reporting.
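One possible shape for the first suggestion is sketched below, assuming a summary type that records both values as links are collected. `LinkSummary` and its methods are hypothetical and do not correspond to lychee's actual stats types; the row labels are copied from the table above.

```rust
use std::collections::HashSet;

/// Hypothetical summary accumulator; not part of lychee's API.
#[derive(Default)]
pub struct LinkSummary {
    total_occurrences: usize,
    unique: HashSet<String>,
}

impl LinkSummary {
    /// Records one link occurrence.
    /// Returns `true` if the URL had not been seen before.
    pub fn record(&mut self, url: &str) -> bool {
        self.total_occurrences += 1;
        self.unique.insert(url.to_owned())
    }

    /// Number of link occurrences across all files.
    pub fn total_occurrences(&self) -> usize {
        self.total_occurrences
    }

    /// Number of distinct URLs.
    pub fn unique_urls(&self) -> usize {
        self.unique.len()
    }

    /// Renders the two rows proposed in this issue.
    pub fn render(&self) -> String {
        format!(
            "| 🔍 Total (occurrences) | {} |\n| 🔗 Unique URLs | {} |",
            self.total_occurrences,
            self.unique.len()
        )
    }
}
```

Keeping both counters in one place would let per-file occurrences remain available for error reporting while the summary shows the unique count alongside them.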

Environment

  • lychee v0.21.0: Total = 3,847
  • lychee v0.23.0: Total = 113,065
  • Same site, same configuration, same number of pages
