chore: Add DType.__call__ #3393

Draft
dangotbanned wants to merge 2 commits into main from dtypes/dtype-__call__
@dangotbanned
Member

@dangotbanned dangotbanned commented Jan 7, 2026

Description

As part of #3386, @FBruzzesi and I identified an issue with simply reusing IntoDType.__eq__ to compare DTypes.

When we start nesting DTypes, there are cases like this where a change from a default makes the Schema incompatible:

import datetime as dt

import polars as pl

data = {
    "a": [1, 2, 3],
    "b": ["hello", None, "its me again"],
    "c": [
        dt.datetime(2000, 1, 1, 12, 0, 0),
        dt.datetime(2000, 1, 1, 13, 0, 0),
        dt.datetime(2000, 2, 1, 9, 0, 0),
    ],
}

base = pl.DataFrame(data, schema_overrides={"c": pl.Datetime("us")})  # the default, but for visibility
collect_struct = pl.struct(pl.all()).alias("struct")
base_struct = base.select(collect_struct)
c = pl.col("c")

diff_time_unit = base.with_columns(c.dt.cast_time_unit("ms")).select(collect_struct)     # ✔️ 
diff_time_zone = base.with_columns(c.dt.convert_time_zone("UTC")).select(collect_struct) # ❌

Here, different time_units are allowed and we should just pick the less-precise one:

pl.concat([base_struct, diff_time_unit], how="vertical_relaxed").schema
Schema([('struct',
         Struct({'a': Int64, 'b': String, 'c': Datetime(time_unit='ms', time_zone=None)}))])

But if the difference is on time_zones - we should raise:

pl.concat([base_struct, diff_time_zone], how="vertical_relaxed")
SchemaError: failed to determine supertype of struct[3] and struct[3]
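The rule described above (keep the less-precise time_unit, refuse mismatched time_zones) could be sketched roughly as follows. This is only an illustration: the `Datetime` dataclass, the `_PRECISION` table and `datetime_supertype` are hypothetical stand-ins, not the narwhals or polars implementations, and the sketch returns None where polars raises a SchemaError.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ranking of time units, from least to most precise.
_PRECISION = {"ms": 0, "us": 1, "ns": 2}


@dataclass(frozen=True)
class Datetime:
    """Toy stand-in for a datetime dtype (assumption, not a real library class)."""

    time_unit: str = "us"
    time_zone: str | None = None


def datetime_supertype(left: Datetime, right: Datetime) -> Datetime | None:
    """Return a common dtype, or None when the two cannot be reconciled."""
    # Differing time zones (including naive vs. aware) have no supertype.
    if left.time_zone != right.time_zone:
        return None
    # Otherwise keep the less-precise of the two time units.
    unit = min(left.time_unit, right.time_unit, key=_PRECISION.__getitem__)
    return Datetime(unit, left.time_zone)
```

A caller that wants the polars-like behaviour would raise when None comes back. A nested type such as Struct would apply the same check field by field, which is where the repeated class-or-instance handling discussed below starts to add up.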

This commit shows how many isinstance checks and imports adding DType.__call__ saves.
The cost might seem small, but it adds up once you consider that concat needs to do it for:

  • a pairwise comparison of many Schemas
  • each field may contain any number of nested DTypes
  • those DTypes can also be nested 😳

Caching will help, but since this problem can scale in multiple directions - it's probably best we minimize anything we can 😄

What type of PR is this? (check all applicable)

  • 🔧 Optimization
  • 🐳 Other

Related issues

Needed to see if this broke downstream ci, but all unrelated:

Need to see if this also passes downstream ci

Related (bdbb1c5)
Comment on lines +178 to +204
def __call__(self) -> Self:
    # NOTE: (Internal doc)
    # But, why?
    #
    # Let's say we have a function that looks like this:
    #
    # >>> import pyarrow as pa
    # >>> from narwhals.dtypes import DType
    # >>> from narwhals.typing import NonNestedDType
    #
    # >>> def convert(dtype: DType | type[NonNestedDType]) -> pa.DataType: ...
    #
    #
    # If we need to resolve the union to a *class*, we can use:
    #
    # >>> always_class = dtype.base_type()
    #
    # But what if instead, we need an *instance*?:
    #
    # >>> always_instance_1 = dtype if isinstance(dtype, DType) else dtype()
    # >>> always_instance_2 = dtype() if isinstance(dtype, type) else dtype
    #
    # By defining `__call__`, we can save an instance check and keep things readable:
    #
    # >>> always_instance = dtype()
    # >>> always_class = dtype.base_type()
    return self
Member Author

I'm intentionally not making this public API, but want there to be a reminder on how it helps 🙂

Member

100% agree to avoid making this public 👌🏼

@dangotbanned dangotbanned marked this pull request as ready for review January 7, 2026 17:34
@dangotbanned dangotbanned changed the title from "chore(ci-experiment): Add DType.__call__" to "chore: Add DType.__call__" Jan 7, 2026
Comment on lines +595 to +596
def _resolve_base_type_dtype(into: IntoDType, /) -> tuple[type[DType], DType]:
    return into.base_type(), into()
Member Author

I suppose this is a TL;DR

We can extract the class and instance from IntoDType - only paying the cost of initialization if we needed to anyway
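To make the "only paying the cost of initialization if we needed to anyway" point concrete, here is a small self-contained sketch. The `DType` and `Int64` classes below are toy stand-ins rather than the real narwhals classes, `resolve` only mirrors the shape of `_resolve_base_type_dtype`, and `init_count` is a hypothetical counter added purely to observe when construction happens.

```python
from __future__ import annotations


class DType:
    """Toy stand-in for a dtype base class (assumption, not narwhals.dtypes.DType)."""

    init_count = 0  # hypothetical counter, for illustration only

    def __init__(self) -> None:
        DType.init_count += 1

    @classmethod
    def base_type(cls) -> type[DType]:
        # Resolves to the class whether called on the class or on an instance.
        return cls

    def __call__(self) -> DType:
        # Calling an instance is a no-op that hands back the same object.
        return self


class Int64(DType):
    pass


def resolve(into: DType | type[DType]) -> tuple[type[DType], DType]:
    # `into()` constructs when given a class and returns `self` when given an instance.
    return into.base_type(), into()
```

Passing an existing `Int64()` to `resolve` returns that same object without incrementing `init_count`, while passing the bare `Int64` class constructs exactly one new instance. No isinstance check appears at the call site in either case.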

@camriddell
Member

I'm not a huge fan of relying on __call__ for this behavior because it becomes very hard to tell, by reading the code alone, whether you are instantiating an object or producing a no-op. Instead, we could use a descriptor attached to a semantically meaningful name, which makes the intent of the code clearer at the call site.

# NOTE: `class Instance[T]` uses PEP 695 syntax, which requires Python 3.12+
from __future__ import annotations
from typing import ClassVar

class Instance[T]:
    def __get__(self, instance: T | None, owner: type[T]) -> T:
        # Accessed on the class: there is no instance yet, so construct one
        if instance is None:
            return owner()
        # Accessed on an instance: return it unchanged
        return instance

class Dtype:
    instance: ClassVar[Instance[Dtype]] = Instance()


print(Dtype().instance) # <__main__.Dtype object at 0x7512677f5940>
print(Dtype.instance)   # <__main__.Dtype object at 0x751267771450>

@MarcoGorelli
Member

thanks @camriddell - yeah i think i prefer that too

@dangotbanned dangotbanned marked this pull request as draft January 9, 2026 17:46
@dangotbanned
Member Author

#3393 (comment)

Thanks for the input @camriddell! 🙂

I'm having trouble writing up something concise.
For now I just wanna zoom into what the two look like side-by-side, in the context that I'm using DType.__call__:

def list_supertype_current(left: List, right: List, version: Version) -> List | None:
    if inner := get_supertype(left.inner(), right.inner(), version=version):
        return List(inner)
    return None


def list_supertype_proposed(left: List, right: List, version: Version) -> List | None:
    if inner := get_supertype(left.inner.instance, right.inner.instance, version=version):
        return List(inner)
    return None

in relation to ...

I'm not a huge fan of relying on __call__ for this behavior
because it becomes very hard to tell, by reading the code alone, whether you are instantiating an object or producing a no-op.

I think I understand your concern, but this swings too far in the opposite direction IMO.

Let's pretend we don't know what the descriptor protocol is for a moment.
Which version looks like a call to a constructor?

left.inner()
left.inner.instance

From reading the code, the second looks like an attribute access and is hiding that it could be constructing a DType.

If we could only highlight doing work vs not doing work - surely a silent no-op is preferable to the inverse, right?

@FBruzzesi
Member

FBruzzesi commented Jan 9, 2026

Let me chime in on the discussion as well, since I kind of started both:

  • The check for left_inner = left.inner if isinstance(left.inner, DType) else left.inner()
  • The discussion with @camriddell to find a different solution than implementing __call__

Theory

In theory, nested types allow for the inner type to be any IntoDType

narwhals/narwhals/typing.py

Lines 278 to 279 in d7bb050

IntoDType: TypeAlias = "dtypes.DType | type[NonNestedDType]"
"""Anything that can be converted into a Narwhals DType.

which means that classes, not just instances, are also allowed.

This aligns with what polars does - I assume to improve ergonomics for users - for example in Expr.cast it's possible to pass {pl, nw}.Int8 or {pl, nw}.List(inner={pl, nw}.Int8).

Practice

In practice, if our only use case for get_supertype is to promote datatypes in concat(..., how={"vertical_relaxed", "diagonal_relaxed"}), or more generally for internal use only, then the inner type of a nested type should never be uninitialized. In fact, all the native_to_narwhals_dtype(dtype: NativeDType, ...) -> DType functions return an instance and never the class, and that's what we use in the collect_schema methods.

In other words, the only case in which we can get a class rather than an instance is when a user passes it.

In conclusion

I mostly wanted to state the obvious and draw attention to the fact that we are debating a solution to a non-problem, as we may not need the inner() call in the first place. The main reason to have it is type safety, i.e. keeping the type checker happy.

@dangotbanned dangotbanned mentioned this pull request Jan 10, 2026
32 tasks