feat: Allow scalars in horizontal expressions#3435
Conversation
narwhals/_arrow/namespace.py
Outdated
| pc.min_element_wise, [s.native for s in series], init_series.native | ||
| series = tuple(chain.from_iterable(expr(df) for expr in exprs)) | ||
| result = reduce( | ||
| lambda s1, s2: s1._with_binary(pc.min_element_wise, s2), series |
There was a problem hiding this comment.
Let _with_binary take care of the broadcasting for scalars
There was a problem hiding this comment.
pc.min_element_wise doesn't need broadcasting btw
There was a problem hiding this comment.
Given a scalar input, by the time we get here, it's a length one array, not a scalar anymore, and we get a shape mismatch
There was a problem hiding this comment.
Oh I see 🤦
I was thinking of this behavior:
(Scalar, Scalar) - > Scalar
(Scalar, Array) -> Array
(Array, Array) -> ArrayWhich is what broadcasting is, but I forgot the Scalar preservation is only in #2572
There was a problem hiding this comment.
0cad6c1 should reduce the overhead of both "align"-ing first, and then reduce-ing
| expr_results = [s for _expr in exprs for s in _expr(df)] | ||
| series = align_series_full_broadcast(df, *(s.fillna(0) for s in expr_results)) | ||
| non_na = align_series_full_broadcast( | ||
| df, *(1 - s.isna() for s in expr_results) | ||
| expr_results = align_series_full_broadcast( | ||
| df, *[s for _expr in exprs for s in _expr(df)] | ||
| ) | ||
| series = (s.fillna(0) for s in expr_results) | ||
| non_na = (1 - s.isna() for s in expr_results) |
There was a problem hiding this comment.
First broadcast (once) so that we ensure to have series (i.e. no more dask_expr.Scalar's), then it's possible to perform fillna() and isna() safely
| series = list(chain.from_iterable(expr(df) for expr in exprs)) | ||
| series = self._series._align_full_broadcast( | ||
| *chain.from_iterable(expr(df) for expr in exprs) | ||
| ) |
There was a problem hiding this comment.
Need to broadcast scalars first
There was a problem hiding this comment.
@dangotbanned unlike the arrow case, here we need to align the series for how we implement the operation after
|
|
||
| def sum_horizontal(*exprs: IntoExpr | Iterable[IntoExpr]) -> Expr: | ||
| def sum_horizontal( | ||
| *exprs: PythonLiteral | IntoExpr | Iterable[PythonLiteral | IntoExpr], |
There was a problem hiding this comment.
I will comment here to exemplify the behavior. Polars definition of IntoExpr is already including python literal. They distinguish between:
# Inputs that can convert into a `col` expression
IntoExprColumn: TypeAlias = Union["Expr", "Series", str]
# Inputs that can convert into an expression
IntoExpr: TypeAlias = PythonLiteral | IntoExprColumn | NoneI am not sure if it's reasonable for us to eventually align with those definition and distinction
There was a problem hiding this comment.
I like this in polars too.
I think an issue in narwhals could be the special-casing for pandas non-str column "names".
But I'm not sure how far that support extends, e.g. do we actually support it everywhere?
See #3435 (comment) Also - tweaked some typing - added docs - included `coalesce` in the same path
Description
Some tiny extra gymnastic to allow this for pandas, pyarrow and dask. All other backends where solved by some refactor
What type of PR is this? (check all applicable)
Related issues
sum_horizontal#1868Checklist