feat(skore): Support skrub dataops in reports #2635

Draft
jeromedockes wants to merge 20 commits into probabl-ai:main from jeromedockes:skrub-dataop-support

@jeromedockes
Collaborator

closes #2133

Still a rough draft. The different reports can now be initialized with a SkrubLearner rather than a regular scikit-learn estimator. This could:

  • let users benefit from both skrub and skore
  • capture data wrangling / preprocessing operations that would otherwise be invisible to skore
  • perhaps exploit skrub's built-in previews and subsampling, and maybe one day caching
  • maybe open up opportunities for reporting and diagnostics through inspection of the computation graph and intermediate results
  • enable reporting on hyperparameter search, since skrub makes it easy to specify hyperparameter ranges (or other kinds of tunable choices) and can run the search backed by either scikit-learn or Optuna

@glemaitre
Member

I'm pretty much fine with what is going on. I'm wondering if we could have some sort of abstraction to validate both the data and the estimator in __init__. But it might just be that things need to go into 2 dedicated functions, which would make the rest clearer.
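
To make the "2 dedicated functions" idea concrete, here is a minimal sketch of what such a split could look like. Everything in it is hypothetical: the names `validate_estimator`, `validate_data`, and `SketchReport`, the `data` keyword, and the class-name check are assumptions made for illustration, not code from this PR or from skore. It only shows __init__ delegating to one function per concern.

```python
from typing import Any, Optional


def validate_estimator(estimator: Any) -> bool:
    """Check the estimator and report whether it looks like a skrub learner.

    Assumption: detection is by class name so the sketch runs without skrub
    installed; a real check would more likely use isinstance.
    """
    if not callable(getattr(estimator, "fit", None)):
        raise TypeError(f"{type(estimator).__name__} has no fit() method")
    return type(estimator).__name__ == "SkrubLearner"


def validate_data(
    is_skrub_learner: bool,
    X: Optional[Any] = None,
    y: Optional[Any] = None,
    data: Optional[dict] = None,
) -> dict:
    """Normalize the inputs into a single dict, whichever estimator kind is used."""
    if is_skrub_learner:
        # Hypothetical convention: a skrub learner receives a dict of named inputs.
        if data is None or X is not None or y is not None:
            raise ValueError("pass `data` (and not X/y) with a SkrubLearner")
        return dict(data)
    if X is None or data is not None:
        raise ValueError("pass X (and optionally y) with a scikit-learn estimator")
    return {"X": X, "y": y}


class SketchReport:
    """Hypothetical report whose __init__ only delegates to the two validators."""

    def __init__(self, estimator, *, X=None, y=None, data=None):
        self.is_skrub_learner_ = validate_estimator(estimator)
        self.data_ = validate_data(self.is_skrub_learner_, X=X, y=y, data=data)
        self.estimator_ = estimator
```

With this shape, the rest of the report class can rely on `data_` always being a dict and branch on `is_skrub_learner_` only where the two kinds of estimator genuinely differ.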

@github-actions
Contributor

github-actions bot commented Mar 25, 2026

Documentation preview @ 81e33ef

@github-actions
Contributor

github-actions bot commented Mar 26, 2026

Coverage

Coverage Report for skore/
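| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| skore/src/skore | | | | |
| __init__.py | 28 | 0 | 100% | |
| _config.py | 44 | 1 | 97% | 57 |
| exceptions.py | 4 | 4 | 0% | 4, 15, 19, 23 |
| skore/src/skore/_project | | | | |
| __init__.py | 0 | 0 | 100% | |
| _summary.py | 80 | 1 | 98% | 121 |
| _widget.py | 195 | 0 | 100% | |
| login.py | 13 | 2 | 84% | 65–66 |
| plugin.py | 12 | 0 | 100% | |
| project.py | 54 | 2 | 96% | 131, 140 |
| types.py | 3 | 0 | 100% | |
| skore/src/skore/_sklearn | | | | |
| __init__.py | 8 | 0 | 100% | |
| _base.py | 54 | 0 | 100% | |
| compare.py | 5 | 0 | 100% | |
| evaluate.py | 27 | 1 | 96% | 133 |
| feature_names.py | 26 | 0 | 100% | |
| find_ml_task.py | 61 | 0 | 100% | |
| types.py | 21 | 1 | 95% | 29 |
| skore/src/skore/_sklearn/_comparison | | | | |
| __init__.py | 7 | 0 | 100% | |
| inspection_accessor.py | 26 | 1 | 96% | 352 |
| metrics_accessor.py | 97 | 3 | 96% | 214, 822, 989 |
| report.py | 123 | 7 | 94% | 479, 486, 489, 495, 536–538 |
| skore/src/skore/_sklearn/_cross_validation | | | | |
| __init__.py | 9 | 0 | 100% | |
| data_accessor.py | 36 | 2 | 94% | 48, 74 |
| inspection_accessor.py | 26 | 1 | 96% | 324 |
| metrics_accessor.py | 93 | 2 | 97% | 821, 965 |
| report.py | 154 | 21 | 86% | 71–73, 77–79, 83, 194, 222, 433, 461, 464–465, 469, 490, 509, 512, 518, 583–585 |
| skore/src/skore/_sklearn/_estimator | | | | |
| __init__.py | 9 | 0 | 100% | |
| data_accessor.py | 48 | 1 | 97% | 177 |
| inspection_accessor.py | 34 | 0 | 100% | |
| metrics_accessor.py | 260 | 2 | 99% | 310, 1105 |
| report.py | 210 | 25 | 88% | 44–46, 51, 54, 60, 207–211, 397, 429, 432–433, 477, 484, 487, 506, 508, 514–515, 582–584 |
| skore/src/skore/_sklearn/_plot | | | | |
| __init__.py | 3 | 0 | 100% | |
| base.py | 61 | 2 | 96% | 61–62 |
| utils.py | 142 | 3 | 97% | 273–274, 453 |
| skore/src/skore/_sklearn/_plot/data | | | | |
| __init__.py | 2 | 0 | 100% | |
| table_report.py | 177 | 1 | 99% | 670 |
| skore/src/skore/_sklearn/_plot/inspection | | | | |
| __init__.py | 0 | 0 | 100% | |
| coefficients.py | 181 | 0 | 100% | |
| impurity_decrease.py | 103 | 2 | 98% | 423, 467 |
| permutation_importance.py | 196 | 1 | 99% | 583 |
| utils.py | 32 | 0 | 100% | |
| skore/src/skore/_sklearn/_plot/metrics | | | | |
| __init__.py | 6 | 0 | 100% | |
| confusion_matrix.py | 165 | 0 | 100% | |
| metrics_summary_display.py | 100 | 0 | 100% | |
| precision_recall_curve.py | 108 | 0 | 100% | |
| prediction_error.py | 166 | 0 | 100% | |
| roc_curve.py | 113 | 0 | 100% | |
| skore/src/skore/_sklearn/train_test_split | | | | |
| __init__.py | 2 | 0 | 100% | |
| train_test_split.py | 71 | 0 | 100% | |
| skore/src/skore/_sklearn/train_test_split/warning | | | | |
| __init__.py | 8 | 0 | 100% | |
| high_class_imbalance_too_few_examples_warning.py | 19 | 1 | 94% | 83 |
| high_class_imbalance_warning.py | 20 | 0 | 100% | |
| random_state_unset_warning.py | 10 | 0 | 100% | |
| shuffle_true_warning.py | 9 | 0 | 100% | |
| stratify_is_set_warning.py | 10 | 0 | 100% | |
| time_based_column_warning.py | 21 | 0 | 100% | |
| train_test_split_warning.py | 3 | 0 | 100% | |
| skore/src/skore/_utils | | | | |
| __init__.py | 6 | 2 | 66% | 8, 13 |
| _accessor.py | 106 | 7 | 93% | 36, 92–94, 164, 218, 238 |
| _cache.py | 37 | 0 | 100% | |
| _cache_key.py | 35 | 5 | 85% | 22, 24, 51, 59, 68 |
| _dataframe.py | 37 | 1 | 97% | 56 |
| _environment.py | 32 | 2 | 93% | 41, 44 |
| _fixes.py | 8 | 0 | 100% | |
| _index.py | 5 | 0 | 100% | |
| _jupyter.py | 8 | 2 | 75% | 13–14 |
| _logger.py | 22 | 4 | 81% | 15–17, 19 |
| _measure_time.py | 10 | 0 | 100% | |
| _parallel.py | 17 | 0 | 100% | |
| _patch.py | 21 | 12 | 42% | 30, 35–39, 42–43, 46–47, 58, 60 |
| _progress_bar.py | 41 | 4 | 90% | 55–56, 66–67 |
| _show_versions.py | 38 | 0 | 100% | |
| _skrub.py | 37 | 2 | 94% | 19, 74 |
| _testing.py | 125 | 13 | 89% | 24, 33, 71–72, 189, 198, 209–214, 216 |
| skore/src/skore/_utils/repr | | | | |
| __init__.py | 2 | 0 | 100% | |
| base.py | 54 | 0 | 100% | |
| data.py | 162 | 0 | 100% | |
| html_repr.py | 40 | 0 | 100% | |
| rich_repr.py | 81 | 0 | 100% | |
| TOTAL | 4419 | 141 | 96% | |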

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2058 | 5 💤 | 0 ❌ | 0 🔥 | 7m 51s ⏱️ |

Development

Successfully merging this pull request may close these issues.

feat(skore): Integration with skrubs.DataOp plan
