feat: Use _repr_html_ when native supports it#2776
dangotbanned wants to merge 21 commits into `main`
- Related #1702
- https://ipython.readthedocs.io/en/stable/config/integrating.html#rich-display
- https://github.com/pandas-dev/pandas/blob/22f12fc5d3f7fda3f198760204e7c13150c78581/pandas/core/frame.py#L1189-L1232
- https://github.com/pola-rs/polars/blob/8011fa34e0c5f1270ef52e2d3b0b2946bb2faa72/py-polars/polars/dataframe/frame.py#L1580-L1605
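IPython's rich display protocol (linked above) looks up a `_repr_html_` method on the displayed object; if that method returns `None`, IPython treats the HTML representation as unavailable and uses another one, such as the plain `__repr__`. A minimal sketch of delegating to the native object only when it supports the method. `Wrapper`, `NativeWithHtml` and `NativeWithoutHtml` are hypothetical stand-ins, not Narwhals or backend classes:

```python
from __future__ import annotations

from typing import Any


class NativeWithHtml:
    """Hypothetical native object that supports rich HTML display."""

    def _repr_html_(self) -> str:
        return "<table><tr><td>1</td></tr></table>"


class NativeWithoutHtml:
    """Hypothetical native object with only a plain-text repr."""

    def __repr__(self) -> str:
        return "NativeWithoutHtml()"


class Wrapper:
    """Hypothetical stand-in for a Narwhals DataFrame/LazyFrame/Series."""

    def __init__(self, native: Any) -> None:
        self._native = native

    def _repr_html_(self) -> str | None:
        # Only delegate if the native object implements the protocol method.
        native_repr_html = getattr(self._native, "_repr_html_", None)
        if native_repr_html is None:
            # Returning None lets IPython fall back to another representation.
            return None
        return native_repr_html()
```

Returning `None` rather than raising keeps the fallback inside IPython's own display machinery.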
```python
style_css = (
    ".dataframe caption { "
    "caption-side: bottom; "
    "text-align: center; "
    "font-weight: bold; "
    "padding-top: 8px;"
    "}"
)
```
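The CSS above only has an effect once it is embedded next to the table. A minimal sketch of wrapping `style_css` in a `<style>` element and inserting it as the first child of the root, assuming the native HTML has a single `<div>` root (the markup below is made up, not real `polars` output):

```python
import xml.etree.ElementTree as ET

style_css = (
    ".dataframe caption { "
    "caption-side: bottom; "
    "text-align: center; "
    "font-weight: bold; "
    "padding-top: 8px;"
    "}"
)

# Made-up native HTML: a single <div> root wrapping the table.
native_html = '<div><table class="dataframe"><tr><td>1</td></tr></table></div>'

root = ET.fromstring(native_html)
style = ET.Element("style")
style.text = style_css
# Index 0 places the <style> before the table inside the root <div>.
root.insert(0, style)

html = ET.tostring(root, encoding="unicode", method="html")
```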
If anyone has any suggestions for styling, feel free to experiment/comment 🙂
The only decision I've made so far was putting the `<caption>` below the table.
When it was above, the default polars formatting placed it between the table and the shape tuple, which I thought looked odd.
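In HTML, a `<caption>` must be the first child of its `<table>`; where it is drawn is then controlled by the CSS `caption-side` property, so `caption-side: bottom` moves it below the table without reordering the markup. A minimal sketch, using a made-up table and the `"Narwhals DataFrame"` label:

```python
import xml.etree.ElementTree as ET

# Made-up native table markup (not real polars output).
table = ET.fromstring(
    '<table class="dataframe"><thead><tr><th>a</th></tr></thead>'
    "<tbody><tr><td>1</td></tr></tbody></table>"
)

caption = ET.Element("caption")
caption.text = "Narwhals DataFrame"
# HTML requires <caption> to be the first child of <table>;
# the CSS rule `caption-side: bottom` decides where it is rendered.
table.insert(0, caption)

html = ET.tostring(table, encoding="unicode", method="html")
```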
- `pandas` reuses the eager version
- `pyarrow` doesn't support it
- `ibis` requires changing global config, so I'm skipping that
- `dask` does have a `_repr_html_`, but it doesn't parse well
narwhals/_utils.py
```python
if header == "Narwhals LazyFrame" and "LazyFrame" in native_html:
    html = native_html.replace("LazyFrame", "LazyFrame.to_native()")
    return f"{html}<p><b>{header}</b></p>"
```
I had to add this branch for `pl.LazyFrame`, as it wasn't parsing with my naive wrapper:

```python
>>> import io
>>> import xml.etree.ElementTree as ET
>>> import polars as pl
>>> data = {"a": [1, 2, 3], "b": ["fdaf", "fda", "cf"]}
>>> ldf = pl.LazyFrame(data)
>>> ET.parse(io.StringIO(ldf._repr_html_()))
ParseError: junk after document element: line 1, column 25
```

It seems to fail on the first `<p>` in https://github.com/pola-rs/polars/blob/dfa5efe71156c654a1ba3a54b865eae723a818e9/py-polars/polars/lazyframe/frame.py#L783
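The error reflects XML's single-root rule: `ElementTree` accepts exactly one top-level element, and anything after the first one is reported as "junk". A minimal sketch of one possible workaround, wrapping the fragment in a `<div>` before parsing. The fragment below is a simplified stand-in, not the exact `polars` output, and real output may contain other constructs an XML parser rejects:

```python
import xml.etree.ElementTree as ET


def parses(html: str) -> bool:
    """Return True if ElementTree accepts `html` as a single XML document."""
    try:
        ET.fromstring(html)
    except ET.ParseError:
        return False
    return True


# Simplified stand-in for an HTML fragment with two top-level elements.
fragment = "<h4>Query plan</h4><p>run <b>explain()</b> for details</p>"

# Wrapping the siblings in one container gives the parser a single root.
wrapped = ET.fromstring(f"<div>{fragment}</div>")
top_level_tags = [child.tag for child in wrapped]
```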
- `pandas` only supports it for `pd.DataFrame`
Possible follow-ups

Just some loose ideas, nothing I'm planning to work on any time soon 😅
Thought I'd do one last check before closing this one; here are a few options to choose from:
No worries if we don't want it 🙂
FBruzzesi
left a comment
@dangotbanned thanks to your ping, I was reminded that I once started looking at this and never finished. I am not against having good support, but I am neither very useful nor very opinionated here.
I would aim for a Pareto optimum that balances usefulness and maintainability 😂
narwhals/_utils.py
```python
    header: Literal["Narwhals DataFrame", "Narwhals LazyFrame", "Narwhals Series"],
    native_html: str,
) -> str | None:  # pragma: no cover
    if header == "Narwhals LazyFrame" and "LazyFrame" in native_html:
        html = native_html.replace("LazyFrame", "LazyFrame.to_native()")
        return f"{html}<p><b>{header}</b></p>"
```
I am mostly nitpicking here but... isn't the header actually a footer? 😂
You're quite right 😂
It started as a header until I ran into (#2776 (comment)).
I should've renamed it to `footer` or `caption`.
```python
tree.getroot().insert(0, style)
buf = io.BytesIO()
tree.write(buf, "utf-8", method="html")
return buf.getvalue().decode()
```
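The `method="html"` argument matters for this kind of output. With the default `"xml"` method, `ElementTree` escapes `>` in text (which would break CSS child selectors such as `.dataframe > thead` inside a `<style>` block) and writes void elements as `<br />`; the `"html"` method leaves `<style>`/`<script>` text unescaped and writes `<br>`. A small comparison on made-up markup, reusing the `BytesIO` pattern from the snippet above:

```python
import io
import xml.etree.ElementTree as ET

# Made-up markup: a stylesheet using a CSS child combinator (">") and a void element.
root = ET.fromstring("<div><style>.dataframe > thead { color: red; }</style><br/></div>")
tree = ET.ElementTree(root)


def serialize(tree: ET.ElementTree, method: str) -> str:
    """Write `tree` to an in-memory buffer and decode it, as in the snippet above."""
    buf = io.BytesIO()
    tree.write(buf, "utf-8", method=method)
    return buf.getvalue().decode()


as_xml = serialize(tree, "xml")
as_html = serialize(tree, "html")
```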
Everything else in this function is a new language to me, so I am not very helpful here.
Ah yeah, `xml.etree.ElementTree` is a bit of a strange one.
I had to learn a bit of `lxml` once to fix a particularly broken file.
Its API is based on this stdlib module, but it was more ergonomic than this mess 😄
To simplify this:
- `Element`: an HTML element
- `Tree`: refers to a document/webpage, but in this case it is just a table

So I'm essentially doing a fancy find/replace, while trying to preserve the structure of the document.
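To make the `Element`/`Tree` distinction concrete: an `ElementTree` wraps the whole document, `getroot()` returns its top `Element`, and `find`/`iter` locate nested elements that can be edited in place while everything around them is preserved. A minimal sketch on made-up markup; the extra `"narwhals"` class name is hypothetical and only there to show an in-place edit:

```python
import io
import xml.etree.ElementTree as ET

# Made-up markup: a root <div> holding a shape note and a table.
native_html = (
    "<div><small>shape: (1, 1)</small>"
    '<table class="dataframe"><tr><td>1</td></tr></table></div>'
)

# The tree represents the whole document; its root is the outer <div>.
tree = ET.parse(io.StringIO(native_html))
root = tree.getroot()

# ".//table" searches every descendant, not only direct children.
table = root.find(".//table")
if table is None:
    raise ValueError("expected a <table> somewhere in the markup")

# Editing the found element in place leaves its siblings untouched
# ("narwhals" is a hypothetical extra class name for illustration).
table.set("class", "dataframe narwhals")

tags = [elem.tag for elem in root.iter()]
```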
I've started #2925 and came up against the

```python
>>> import pyarrow as pa
>>> import narwhals as nw
>>> nw.Series.from_iterable("a", [4, 1, 3, 2], dtype=nw.UInt32, backend=pa)
┌───────────────────────────────────────────────────────┐
| Narwhals Series |
|-------------------------------------------------------|
|<pyarrow.lib.ChunkedArray object at 0x0000017129497880>|
|[ |
| [ |
| 4, |
| 1, |
| 3, |
| 2 |
| ] |
|] |
└───────────────────────────────────────────────────────┘
```

Even if we don't go ahead with The

```python
>>> nw.Series.from_iterable("a", [4, 1, 3, 2], dtype=nw.UInt32, backend="polars")
┌─────────────────┐
| Narwhals Series |
|-----------------|
|shape: (4,) |
|Series: 'a' [u32]|
|[ |
| 4 |
| 1 |
| 3 |
| 2 |
|] |
└─────────────────┘
```
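The boxes above wrap the native `repr` line by line: a top border, a centred header, a `-` divider, the native lines padded to a common width, and a bottom border. A sketch that reproduces that layout; `boxed_repr` is a hypothetical name and this is not Narwhals' actual implementation:

```python
def boxed_repr(header: str, native_repr: str) -> str:
    """Hypothetical helper mimicking the box layout shown above."""
    # Tabs would break the width calculation, so expand them first.
    lines = native_repr.expandtabs().splitlines()
    width = max([len(header), *(len(line) for line in lines)])
    # Split the leftover space around the header, giving any odd space to the right.
    extra = width - len(header)
    left, right = extra // 2, extra - extra // 2
    out = [
        f"┌{'─' * width}┐",
        f"|{' ' * left}{header}{' ' * right}|",
        f"|{'-' * width}|",
    ]
    # Pad every native line on the right so the closing border lines up.
    out += [f"|{line.ljust(width)}|" for line in lines]
    out.append(f"└{'─' * width}┘")
    return "\n".join(out)
```

For example, `boxed_repr("Narwhals Series", "shape: (4,)\nSeries: 'a' [u32]")` produces a box 17 characters wide, matching the first lines of the `polars` output above.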
If we just wanted

```
shape: (365,)
dtype: Datetime(time_unit='us', time_zone=None)
name: 'time series'
nw.Series[pyarrow]
[
2009-01-02 00:00:00
2009-01-03 00:00:00
2009-01-04 00:00:00
2009-01-05 00:00:00
2009-01-06 00:00:00
…
2009-12-28 00:00:00
2009-12-29 00:00:00
2009-12-30 00:00:00
2009-12-31 00:00:00
2010-01-01 00:00:00
]

shape: (30,)
dtype: UInt32
name: 'lower max rows'
nw.Series[pyarrow]
[
0
1
2
…
27
28
29
]

shape: (30,)
dtype: Int16
name: 'oh pandas too???'
nw.Series[pandas]
[
29
28
27
26
25
24
…
5
4
3
2
1
0
]
```

Would be nicer-er if we used the short type codes from
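The truncated layouts above follow a simple rule: a header block (shape, dtype, name, backend), then the first and last few values with `…` in between once the length exceeds a row limit. A minimal sketch of that rule. `compact_series_repr`, its parameters, and the head/tail split (`ceil(max_rows / 2)` leading values, the rest trailing) are assumptions chosen to match the examples, not an existing Narwhals API:

```python
from __future__ import annotations

from collections.abc import Sequence


def compact_series_repr(
    values: Sequence[object],
    *,
    dtype: str,
    name: str,
    backend: str,
    max_rows: int = 10,
) -> str:
    """Hypothetical helper reproducing the proposed text layout above."""
    lines = [
        f"shape: ({len(values)},)",
        f"dtype: {dtype}",
        f"name: {name!r}",
        f"nw.Series[{backend}]",
        "[",
    ]
    if len(values) <= max_rows:
        lines += [str(v) for v in values]
    else:
        head = (max_rows + 1) // 2  # ceil(max_rows / 2) leading values
        tail = max_rows // 2  # remaining values are taken from the end
        lines += [str(v) for v in values[:head]]
        lines.append("…")
        # Slice from an explicit start so tail == 0 yields no values.
        lines += [str(v) for v in values[len(values) - tail :]]
    lines.append("]")
    return "\n".join(lines)
```

With `max_rows=6`, a 30-element `UInt32` series named `'lower max rows'` reproduces the second example above.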



I discovered this method in (#2572), when I was trying to work out why `polars.Expr` looked so much better than what I had 😅

Thinking we can get more immediate benefits now by allowing this option when a backend supports it, for:
- `DataFrame` (`pandas`, `polars`)
- `LazyFrame` (`pandas`, `polars`)
- `Series` (~~`pandas`~~, `polars`)