chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]#3266
chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]#3266
IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]#3266Conversation
IntoSchema typing, allow Sequence[tuple[str, IntoDType]IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]
| # TODO @dangotbanned: fix this? | ||
| # Constructor allows tuples, but we don't support that *everywhere* yet | ||
| IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema" | ||
| IntoSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema" | ||
| ) |
There was a problem hiding this comment.
🥳
But now to ruin your day slightly ...
We got our typing for Schema.__init__ from polars, but even this is too strict.
Here's what typeshed has for dict, which we support (except **kwargs)
Luckily we don't need to use any overloads, but we should have this in IntoSchema somewhere:
Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]There was a problem hiding this comment.
Interesting timing though, I just pushed a test where I passed a non-Mapping object to nw.Schema 😂
narwhals/tests/plan/schema_test.py
Lines 18 to 20 in 1579b32
There was a problem hiding this comment.
Thanks @dangotbanned
Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
SupportsKeysAndGetItemwhat the hell is this 😂 how's Mapping not enough anymore?- Regarding
Iterable, I always have a hard time being comfortable with it, since it can be infinite (and lazy).- How does infinite converts to a dict?
- Lazy makes it a bit more of a headache when we want to check it's content (e.g. first element is a string or a tuple), but that's alright
- This is the reason why I went with
Sequence, although polarsSchema.__init__signature is with theIterable
There was a problem hiding this comment.
We got our typing for Schema.init from polars, but even this is too strict.
can we resolve that later and just get the type[Dtype] / Dtype part sorted out in this one?
There was a problem hiding this comment.
SupportsKeysAndGetItemwhat the hell is this 😂 how's Mapping not enough anymore?
Lol, it is an object that supports those methods ofc 😉
just import it in a TYPE_CHECKING block from here
from _typeshed import SupportsKeysAndGetItemThere was a problem hiding this comment.
Regarding Iterable, I always have a hard time being comfortable with it, since it can be infinite
If someone passes an infinite stream to an initializer - they need to learn the lesson that the call will never complete
(and lazy).
This is a desirable property for the caller and I think if we want to validate things, then that is our responsibility
Reducing scope
How about aligning just Schema.__init__ with dict.__init__?
We are only checking for None there, so the issues you have seem to not apply there
There was a problem hiding this comment.
@dangotbanned attempt in 9ce61fe, which now fails for mypy when the dict has mix of initialized and un-initialized dtypes, which was the initial purpose of the PR 🤔
Edit: By changing Iterable into Sequence there are no issue (see 0e0d7a8). I am personally not interested in diving deeper than this.
Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a if TYPE_CHEKING: block
There was a problem hiding this comment.
Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a
if TYPE_CHEKING:block
tubular uses a runtime type checker (beartype).
I ran into a similar problem the last time I changed the definition of Schema:
There was a problem hiding this comment.
Regarding (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483418646), (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483400506)
This is my bad, but we should keep that part and my original TODO as a todo
Lines 307 to 308 in d7bb050
These things are related, but the timing of #3252 complicated both this PR and my TODO 😞
There was a problem hiding this comment.
Randomly removing some initialization to test the typechecking
narwhals/typing.py
Outdated
| IntoNullableSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType | None] | Sequence[tuple[str, IntoDType | None]]" | ||
| ) | ||
| """Schema specification with possible None values.""" |
There was a problem hiding this comment.
TODO: Please let's come up with a better description
| return (value,) * n_match if isinstance(value, bool) else tuple(value) | ||
|
|
||
|
|
||
| class NullableSchema(OrderedDict[str, "IntoDType | None"]): |
There was a problem hiding this comment.
Reason for this class is mostly two folded:
- Use as a utility to convert a
Sequence[tuple[str, DType {|None}]]into a mapping. Hence make it easy to use the same API (key, dtype in obj.items()). - Easily set a flag to know if any value passed is None
There was a problem hiding this comment.
The name Nullable* is making me think this is related to #3176 (comment) again 🫣
I'm not opposed to having classes to make our internal API cleaner btw 👍
There was a problem hiding this comment.
The name Nullable* is making me think this is related to #3176 (comment) again 🫣
Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.
I'm not opposed to having classes to make our internal API cleaner btw 👍
Any preference in having it prefixed by _?
There was a problem hiding this comment.
Any preference in having it prefixed by
_?
It is already in _utils, so no need for a prefix 🙂
The current _ names are an artifact from when we had everything in utils.py
There was a problem hiding this comment.
Not sure I see how it's related. Passing
Noneseems more like to be a free card, "live and let live" kind of behavior.
I'll try to give a more complete explanation later, but I was hinting at Nullable, Null, None being a bit overloaded.
Linking to (#3176 (comment)) was supposed to show that I made this mistake already 😂
There was a problem hiding this comment.
Not sure I see how it's related. Passing
Noneseems more like to be a free card, "live and let live" kind of behavior.I'll try to give a more complete explanation later, but I was hinting at
Nullable,Null,Nonebeing a bit overloaded.
I guess now still counts as later 😂
Somebody wrote docs about the term nullable, and this usage is not to do with that
|
Regarding CI failures: the only one related to this PR seem to be tubular 👀 |
IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
narwhals/typing.py
Outdated
| "IntoFrameT", | ||
| "IntoLazyFrame", | ||
| "IntoLazyFrameT", | ||
| "IntoNullableSchema", |
There was a problem hiding this comment.
Are we sure about making this a public type?
Is it possible to only focus on Dtype -> intodtype changes here without introducing new public types?
There was a problem hiding this comment.
Ok I honestly think there might be something wrong in the PR to begin with. If I refocus on the basic changes to fix #3257, then passing a dictionary with mixed initialized and un-initialized dtypes does still complain (pyright is ok with it, mypy is not)
|
I will keep this PR open for a bit longer - if anyone is interested to bringing to the finish line, I am happy to hand it over 🙈 |
There was a problem hiding this comment.
Thanks for the PR (and reminding me about it 🙏) @FBruzzesi
I think there are parts of this that are ready to go!
But the combination of (#3262 and #3252) + my suggestion took the PR in a direction that you couldn't have anticipated when opening #3262
It seems like I made a suggestion on that PR that may have been lost (#3262 (comment)):
@FBruzzesi I think if you were looking to clean things up following #3252, then #2486 may be what we're missing?
This issue is still open and we've both mentioned it recently, maybe there's something to that?
| return Version.MAIN.dataframe.from_dict(data, schema, backend=backend) | ||
|
|
There was a problem hiding this comment.
I like this change and think we can split those like it into a PR that makes a dent into #2786
| f" nw.from_numpy(arr, backend='pyarrow').lazy('{implementation}')" | ||
| ) | ||
| raise ValueError(msg) | ||
| return Version.MAIN.dataframe.from_numpy(data, schema, backend=backend) |
There was a problem hiding this comment.
(Same comment as from_dict)
| # TODO @dangotbanned: fix this? | ||
| # Constructor allows tuples, but we don't support that *everywhere* yet | ||
| IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema" | ||
| IntoSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema" | ||
| ) |
There was a problem hiding this comment.
Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a
if TYPE_CHEKING:block
tubular uses a runtime type checker (beartype).
I ran into a similar problem the last time I changed the definition of Schema:
| return list(self.keys()) | ||
|
|
||
| def dtypes(self) -> list[DType]: | ||
| def dtypes(self) -> list[IntoDType]: |
There was a problem hiding this comment.
I think this may be difficult to work with.
list[DType] is invariant but names a single concrete type.
OTOH, list[IntoDType] will expand to this:
list[DType | type[NumericType | TemporalType | String | Boolean | Binary | Categorical | Unknown | Object]And the problem there is that type would be rejected by another annotation which used list[DType] 😢
| # TODO @dangotbanned: fix this? | ||
| # Constructor allows tuples, but we don't support that *everywhere* yet | ||
| IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema" | ||
| IntoSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema" | ||
| ) |
There was a problem hiding this comment.
Regarding (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483418646), (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483400506)
This is my bad, but we should keep that part and my original TODO as a todo
Lines 307 to 308 in d7bb050
These things are related, but the timing of #3252 complicated both this PR and my TODO 😞
| return (value,) * n_match if isinstance(value, bool) else tuple(value) | ||
|
|
||
|
|
||
| class NullableSchema(OrderedDict[str, "IntoDType | None"]): |
There was a problem hiding this comment.
Not sure I see how it's related. Passing
Noneseems more like to be a free card, "live and let live" kind of behavior.I'll try to give a more complete explanation later, but I was hinting at
Nullable,Null,Nonebeing a bit overloaded.
I guess now still counts as later 😂
Somebody wrote docs about the term nullable, and this usage is not to do with that
What type of PR is this? (check all applicable)
Related issues
typing.IntoSchematoo narrow #3257Checklist
If you have comments or can explain your changes, please do so below