Parse Verkko-style TSV files of paths #408

Merged
fedarko merged 36 commits into main from gaf
Apr 16, 2026
Member

@fedarko commented Mar 27, 2026

Closes #336 and closes #357. This has the nice side effect of allowing multi-component paths (across all path filetypes, not just Verkko TSVs).

So, this is actually pretty close to being ready. HOWEVER! It turns
out that there are quite a lot of paths in these files that can span
multiple connected components of the graph, which is crazy to me.
My code expects a path to be constrained to a single connected
component, because ... it's a path.

Anyway, I can definitely adjust things to allow a path to map to
multiple ccs -- this would just mean that we would not consider a
path as "available" unless all of the ccs that it spans are currently
drawn. Definitely feasible.
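
The availability rule described above amounts to a subset check. Here is a minimal sketch, not the code in this branch: `path_to_ccs`, `drawn_ccs`, and `available_paths` are hypothetical names, and component IDs are assumed to be integers.

```python
def available_paths(path_to_ccs, drawn_ccs):
    """Return the names of paths whose components are all currently drawn.

    path_to_ccs: dict mapping path name -> set of component IDs it spans
    drawn_ccs: iterable of component IDs that are currently drawn
    (Both structures are assumptions made for this sketch.)
    """
    drawn = set(drawn_ccs)
    # A path is available only if every component it spans is drawn.
    return sorted(name for name, ccs in path_to_ccs.items() if ccs <= drawn)


path_to_ccs = {
    "path_a": {1},      # single-component path
    "path_b": {1, 2},   # spans two components
    "path_c": {3},
}
print(available_paths(path_to_ccs, {1}))     # ['path_a']
print(available_paths(path_to_ccs, {1, 2}))  # ['path_a', 'path_b']
```

With this rule a single-component path behaves exactly as before, since its set of components has one element.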

This would take some time, though. And in the interest of getting
a release out soon AND in actually testing this functionality, I
think it is best to consign this feature to another-git-branch jail
until I get it sorted out.

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 78.89908% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.73%. Comparing base (88faf40) to head (abecf27).
⚠️ Report is 38 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| metagenomescope/graph/assembly_graph.py | 37.50% | 14 Missing and 1 partial ⚠️ |
| metagenomescope/descs.py | 0.00% | 2 Missing ⚠️ |
| metagenomescope/main.py | 0.00% | 2 Missing ⚠️ |
| metagenomescope/path_utils.py | 97.36% | 1 Missing and 1 partial ⚠️ |
| metagenomescope/_cli.py | 0.00% | 1 Missing ⚠️ |
| metagenomescope/parsers.py | 50.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #408      +/-   ##
==========================================
+ Coverage   65.15%   65.73%   +0.58%     
==========================================
  Files          33       33              
  Lines        3980     4039      +59     
  Branches      975      989      +14     
==========================================
+ Hits         2593     2655      +62     
- Misses       1311     1312       +1     
+ Partials       76       72       -4     

@fedarko changed the title from "Implement tentative GAF parser" to "Parse Verkko-style TSV files of paths" on Apr 11, 2026
fedarko added 8 commits April 15, 2026 20:32
this way, changes to the internal implementation (to allow for
multi-cc paths) have a limited impact
and just like that, verkko paths are pretty much sorted!

Something to think about: should we address #357?
There are some paths left out when we draw all NR ccs (#67).

If we duplicate each path like we do for GFA paths, then this
should fix the problem for single-component paths. But what about
paths that traverse multiple components? Ugh, things get complicated.
Especially if there is jank like a path that traverses BOTH a
component and its twin, in which case we can never represent this
path when drawing all NR ccs!!! oh no!!!!
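
The problematic case above (a path touching both a component and its twin) can be detected with a small check. This is a minimal sketch, assuming each component ID maps to its twin's ID in a dict. `twin_of` and `spans_twin_pair` are hypothetical names, not part of this codebase, and reading "drawing all NR ccs" as drawing only one component per twin pair is also an assumption.

```python
def spans_twin_pair(path_ccs, twin_of):
    """Return True if the path touches a component and also that component's twin.

    path_ccs: set of component IDs the path spans
    twin_of: dict mapping component ID -> ID of its twin component
    (Both structures are assumptions made for this sketch.)
    """
    for cc in path_ccs:
        twin = twin_of.get(cc)
        # Ignore components with no recorded twin or that are their own twin.
        if twin is not None and twin != cc and twin in path_ccs:
            return True
    return False


twin_of = {1: 2, 2: 1, 3: 4, 4: 3}
print(spans_twin_pair({1, 3}, twin_of))  # False
print(spans_twin_pair({1, 2}, twin_of))  # True
```

Under that assumption, a path for which this returns True can only become available when all components are drawn.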

The trouble with duplicating paths is that - at least from my
perspective - it gets you thinking about "okay, 20 / 98 paths are
available, how many of those missing 78 paths are 'real ones'?"
Like, the user doesn't care about missing paths if they are just
perfect RCs of the currently available paths. And how do you
communicate that through just a number...?
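
To make the "perfect RC" idea concrete, here is a minimal sketch of reverse-complementing a path and counting paths up to reverse complement. The representation of a path as a list of `(node, orientation)` tuples is an assumption for illustration only, and `reverse_complement` and `count_up_to_rc` are hypothetical helpers, not functions in this codebase.

```python
def reverse_complement(path):
    """Reverse the node order and flip each node's orientation."""
    flip = {"+": "-", "-": "+"}
    return [(node, flip[orient]) for node, orient in reversed(path)]


def count_up_to_rc(paths):
    """Count paths, treating a path and its perfect RC as the same path."""
    seen = set()
    for path in paths:
        fwd = tuple(path)
        rc = tuple(reverse_complement(path))
        # Use the smaller of the two tuples as a canonical key.
        seen.add(min(fwd, rc))
    return len(seen)


p1 = [("a", "+"), ("b", "-")]
p2 = [("b", "+"), ("a", "-")]  # perfect RC of p1
p3 = [("c", "+")]
print(reverse_complement(p1) == p2)  # True
print(count_up_to_rc([p1, p2, p3]))  # 2
```

A count like this is what a "20 / 98 paths available" number would need to account for if paths were duplicated.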

After writing all of this out, I think a better solution to #357 is
just NOT duplicating paths at all - even for GFA files. This way,
if the user draws some components and sees that some paths are not
available, then it is clear that all the missing paths have some
inherent meaning (and are not perfect RCs of the available paths).
Some paths may be missing even when drawing NR components, in which
case the user can fix the problem by drawing all components. I think
this is very reasonable?

We could even maybe go crazy and try to set which component is the
drawn one for NR based on how many paths cross it, but... that seems
not worth it. Just keep things simple................
Closes #357. See message of commit e6a483a for discussion.
@fedarko merged commit 609f3af into main on Apr 16, 2026
16 checks passed
Development

Successfully merging this pull request may close these issues.

Standardize treatment of reverse-complementary paths
Support parsing paths in Verkko TSV format
