Refactor: Make generate_final_output() data-driven #146

@cmkobel

Summary

generate_final_output() in workflow/Snakefile (lines ~323–388) uses three similar expand() blocks with conditional logic based on genome count N:

  • N≥1: singular analyses
  • N≥2: pairwise comparisons
  • N≥3: phylogenetics

Each block manually lists rules and their expand patterns. Adding a new rule requires finding the right block and duplicating the expand boilerplate.

Proposed approach

Use a data-driven approach: a list of (rule_output_pattern, min_N) tuples and a single loop that expands every rule where N >= min_N:

_analyses = [
    ("results/{sample}/annotation/...", 1),
    ("results/batch/panaroo/...", 2),
    ("results/batch/fasttree/...", 3),
]

def generate_final_output(N):
    return [expand(pattern, ...) for pattern, min_n in _analyses if N >= min_n]

This makes it trivial to add/remove analyses and see the N-threshold for each at a glance.
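
To make the idea concrete outside of Snakemake, here is a minimal, self-contained sketch. It rests on several assumptions that are not in the Snakefile: the file names in `ANALYSES` are hypothetical placeholders (the real patterns are elided above), `expand_pattern()` is a simplified stand-in for Snakemake's `expand()` rather than the real function, and `generate_final_output()` takes the sample list and derives N from it instead of receiving N directly. Using `extend()` keeps the result a flat list instead of a list of lists.

```python
from itertools import product
from string import Formatter

# (output pattern, minimum genome count N) pairs.
# The file names are hypothetical; the real patterns are elided in this issue.
ANALYSES = [
    ("results/{sample}/annotation/{sample}.gff", 1),  # singular analyses
    ("results/batch/panaroo/summary.txt", 2),         # pairwise comparisons
    ("results/batch/fasttree/tree.newick", 3),        # phylogenetics
]


def expand_pattern(pattern, **wildcards):
    """Simplified stand-in for Snakemake's expand() (an assumption, not the real API).

    Only the wildcards that actually occur in the pattern are combined,
    so batch-level patterns without {sample} yield exactly one path.
    """
    fields = sorted({name for _, name, _, _ in Formatter().parse(pattern) if name})
    combos = product(*(wildcards[field] for field in fields))
    return [pattern.format(**dict(zip(fields, combo))) for combo in combos]


def generate_final_output(samples):
    """Return a flat list of targets for every analysis whose threshold is met."""
    n = len(samples)
    targets = []
    for pattern, min_n in ANALYSES:
        if n >= min_n:
            targets.extend(expand_pattern(pattern, sample=samples))
    return targets


if __name__ == "__main__":
    for path in generate_final_output(["a", "b"]):
        print(path)
```

With two samples, this sketch yields the two per-sample annotation targets plus the panaroo target, and skips fasttree because N < 3. Adding an analysis then means appending one tuple to `ANALYSES`.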

Files to touch

  • workflow/Snakefile (lines ~323–388)
