
Update buscar's pipelines using the Lazy Dataframe object #43

@axiomcura

Description


Would it make sense to try a lazy interpretation of this?

    import polars as pl

    # Lazily aggregate ratio-weighted on/off scores per treatment,
    # then sort and materialize the result with a single collect().
    (
        scores_df.lazy()
        .group_by("treatment")
        .agg(
            (
                (pl.col("on_score") * pl.col("ratio")).sum()
                + (pl.col("off_score") * pl.col("ratio")).sum()
            ).alias("compound_score")
        )
        .sort("compound_score")
        .collect()
    )

Originally posted by @d33bs in #42 (comment)

Eager DataFrame pipelines in Polars can consume large amounts of memory because each step materializes its intermediate result, especially when chaining transformations such as sorts, joins, and group-bys. In multi-step data-wrangling code this leads to memory pressure and reduced performance. Using the lazy API instead lets Polars optimize and fuse operations, push down filters, and avoid unnecessary materialization, improving both speed and memory efficiency. Eager execution remains appropriate for small, one-off operations, but for large datasets or multi-step pipelines a practical pattern is to have functions accept and return LazyFrame objects and invoke .collect() only at the pipeline boundary, as sketched below.
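
Below is a minimal sketch of that pattern, assuming a Parquet input and the column names from the snippet above; the function names (add_compound_score, rank_by_compound_score) and the scores.parquet path are hypothetical, not part of the current pipeline.

    import polars as pl

    def add_compound_score(lf: pl.LazyFrame) -> pl.LazyFrame:
        """Aggregate a ratio-weighted compound score per treatment; stays lazy."""
        return lf.group_by("treatment").agg(
            (
                (pl.col("on_score") * pl.col("ratio")).sum()
                + (pl.col("off_score") * pl.col("ratio")).sum()
            ).alias("compound_score")
        )

    def rank_by_compound_score(lf: pl.LazyFrame) -> pl.LazyFrame:
        """Order treatments by compound score; still lazy."""
        return lf.sort("compound_score")

    # Pipeline boundary: the query plan is optimized and materialized once here.
    scores_lf = pl.scan_parquet("scores.parquet")  # hypothetical input file
    ranked = rank_by_compound_score(add_compound_score(scores_lf)).collect()

Because both helpers accept and return LazyFrame, they compose freely and Polars only builds and optimizes the full query plan when .collect() is called at the end.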

A really good explanation of LazyFrames in Polars:
https://stackoverflow.com/a/76612637

Labels: enhancement (New feature or request)
