
refactor(skore): Make pos_label more consistent #2663

Open
cakedev0 wants to merge 21 commits into main from pos_label_consistency

Conversation

Contributor

@cakedev0 cakedev0 commented Mar 25, 2026

Towards #2642 and #2592

Change description

Go from the behavior table described in #2592 to this:

| labels | pos_label | precision/recall | roc_auc | brier_score | roc/PR/conf. matrix |
|---|---|---|---|---|---|
| [0,1] / ["A","B"] | unset | all classes | scalar | scalar | all classes |
| [0,1] / ["A","B"] | fixed (1/"A") | pos_label=1/"A" | scalar | scalar | pos_label=1/"A" |
| [0,1,2] | unset | all classes | all classes | attribute error | all classes |
| [0,1,2] | fixed (2) | error at init | error at init | error at init | error at init |
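The binary rows of this table map directly onto scikit-learn's metric semantics; here is a minimal illustration with plain scikit-learn calls (not skore's actual accessors):

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# pos_label unset -> report the metric for all classes
per_class = precision_score(y_true, y_pred, average=None)

# pos_label fixed -> a single scalar for that class
pos = precision_score(y_true, y_pred, pos_label=1)

# multiclass labels with a fixed pos_label are ambiguous -> error
y_multi = np.array([0, 1, 2, 1, 0])
try:
    precision_score(y_multi, y_multi, pos_label=2)
except ValueError:
    pass  # scikit-learn rejects pos_label with the default binary average here
```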

This PR also refactors how predictions are made, both to make those changes easier and to prepare the ground for moving pos_label from the init to the arguments of metrics/plots. Indeed, cached predictions no longer depend on pos_label (we adapt for pos_label on the fly when needed).
By doing so, it fixes #2671 (with option 1 described in the issue).
This refactor also fixes #2672.
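The on-the-fly adaptation can be sketched as follows (a hypothetical helper, not skore's actual code): the full predict_proba output is cached once, keyed independently of pos_label, and the relevant column is selected when a metric asks for it.

```python
import numpy as np

def select_pos_label_column(y_proba, classes, pos_label):
    """Hypothetical helper: adapt a cached (n_samples, n_classes)
    predict_proba array to a requested pos_label on the fly, so the
    cache key no longer needs to include pos_label."""
    (idx,) = np.flatnonzero(classes == pos_label)
    return y_proba[:, idx]

classes = np.array(["A", "B"])
y_proba = np.array([[0.2, 0.8], [0.9, 0.1]])  # cached once
col = select_pos_label_column(y_proba, classes, "B")
```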

Contribution checklist

  • Unit tests were added or updated
  • Documentation was added or updated. TODO: check I updated everything that was needed.
  • TODO? A new changelog entry was added to CHANGELOG.rst

AI usage disclosure

For review and tests; not for the core changes, those were a bit too delicate* for AI.

*too delicate = not well-specified enough when I started working on it 😅

@cakedev0 cakedev0 marked this pull request as draft March 25, 2026 09:08
@cakedev0 cakedev0 changed the title from refactor(skore): Make pos_ label more consistent. to refactor(skore): Make pos_ label more consistent Mar 25, 2026
@github-actions
Contributor

github-actions bot commented Mar 25, 2026

Documentation preview @ 9d25578

@github-actions
Contributor

github-actions bot commented Mar 25, 2026

Coverage

Coverage Report for skore/

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| `skore/src/skore` | | | | |
| `__init__.py` | 28 | 0 | 100% | |
| `_config.py` | 44 | 1 | 97% | 57 |
| `exceptions.py` | 4 | 4 | 0% | 4, 15, 19, 23 |
| `skore/src/skore/_project` | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `_summary.py` | 80 | 1 | 98% | 121 |
| `_widget.py` | 195 | 0 | 100% | |
| `login.py` | 13 | 2 | 84% | 65–66 |
| `plugin.py` | 12 | 0 | 100% | |
| `project.py` | 54 | 2 | 96% | 131, 140 |
| `types.py` | 3 | 0 | 100% | |
| `skore/src/skore/_sklearn` | | | | |
| `__init__.py` | 8 | 0 | 100% | |
| `_base.py` | 38 | 0 | 100% | |
| `compare.py` | 5 | 0 | 100% | |
| `evaluate.py` | 27 | 1 | 96% | 133 |
| `feature_names.py` | 26 | 0 | 100% | |
| `find_ml_task.py` | 61 | 0 | 100% | |
| `types.py` | 21 | 1 | 95% | 29 |
| `skore/src/skore/_sklearn/_comparison` | | | | |
| `__init__.py` | 7 | 0 | 100% | |
| `inspection_accessor.py` | 26 | 1 | 96% | 352 |
| `metrics_accessor.py` | 97 | 3 | 96% | 214, 822, 989 |
| `report.py` | 123 | 7 | 94% | 479, 486, 489, 495, 536–538 |
| `skore/src/skore/_sklearn/_cross_validation` | | | | |
| `__init__.py` | 9 | 0 | 100% | |
| `data_accessor.py` | 36 | 2 | 94% | 48, 74 |
| `inspection_accessor.py` | 26 | 1 | 96% | 324 |
| `metrics_accessor.py` | 93 | 2 | 97% | 821, 965 |
| `report.py` | 122 | 6 | 95% | 430, 433, 439, 504–506 |
| `skore/src/skore/_sklearn/_estimator` | | | | |
| `__init__.py` | 9 | 0 | 100% | |
| `data_accessor.py` | 48 | 1 | 97% | 177 |
| `inspection_accessor.py` | 34 | 0 | 100% | |
| `metrics_accessor.py` | 262 | 2 | 99% | 310, 1106 |
| `report.py` | 210 | 13 | 93% | 209, 418, 452, 458, 509, 512, 531, 533, 539–540, 607–609 |
| `skore/src/skore/_sklearn/_plot` | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `base.py` | 61 | 2 | 96% | 61–62 |
| `utils.py` | 149 | 7 | 95% | 63, 65–66, 68, 274–275, 454 |
| `skore/src/skore/_sklearn/_plot/data` | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `table_report.py` | 177 | 1 | 99% | 670 |
| `skore/src/skore/_sklearn/_plot/inspection` | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `coefficients.py` | 181 | 0 | 100% | |
| `impurity_decrease.py` | 103 | 2 | 98% | 423, 467 |
| `permutation_importance.py` | 196 | 1 | 99% | 583 |
| `utils.py` | 32 | 0 | 100% | |
| `skore/src/skore/_sklearn/_plot/metrics` | | | | |
| `__init__.py` | 6 | 0 | 100% | |
| `confusion_matrix.py` | 165 | 0 | 100% | |
| `metrics_summary_display.py` | 100 | 0 | 100% | |
| `precision_recall_curve.py` | 105 | 0 | 100% | |
| `prediction_error.py` | 166 | 0 | 100% | |
| `roc_curve.py` | 110 | 0 | 100% | |
| `skore/src/skore/_sklearn/train_test_split` | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `train_test_split.py` | 71 | 0 | 100% | |
| `skore/src/skore/_sklearn/train_test_split/warning` | | | | |
| `__init__.py` | 8 | 0 | 100% | |
| `high_class_imbalance_too_few_examples_warning.py` | 19 | 1 | 94% | 83 |
| `high_class_imbalance_warning.py` | 20 | 0 | 100% | |
| `random_state_unset_warning.py` | 10 | 0 | 100% | |
| `shuffle_true_warning.py` | 9 | 0 | 100% | |
| `stratify_is_set_warning.py` | 10 | 0 | 100% | |
| `time_based_column_warning.py` | 21 | 0 | 100% | |
| `train_test_split_warning.py` | 3 | 0 | 100% | |
| `skore/src/skore/_utils` | | | | |
| `__init__.py` | 6 | 2 | 66% | 8, 13 |
| `_accessor.py` | 106 | 7 | 93% | 36, 92–94, 164, 218, 238 |
| `_cache.py` | 37 | 0 | 100% | |
| `_cache_key.py` | 35 | 5 | 85% | 22, 24, 51, 59, 68 |
| `_dataframe.py` | 37 | 1 | 97% | 56 |
| `_environment.py` | 32 | 1 | 96% | 44 |
| `_fixes.py` | 8 | 0 | 100% | |
| `_index.py` | 5 | 0 | 100% | |
| `_jupyter.py` | 8 | 2 | 75% | 13–14 |
| `_logger.py` | 22 | 4 | 81% | 15–17, 19 |
| `_measure_time.py` | 10 | 0 | 100% | |
| `_parallel.py` | 17 | 0 | 100% | |
| `_patch.py` | 21 | 12 | 42% | 30, 35–39, 42–43, 46–47, 58, 60 |
| `_progress_bar.py` | 41 | 4 | 90% | 55–56, 66–67 |
| `_show_versions.py` | 38 | 0 | 100% | |
| `_testing.py` | 112 | 11 | 90% | 23, 32, 160, 169, 180–185, 187 |
| `skore/src/skore/_utils/repr` | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `base.py` | 54 | 0 | 100% | |
| `data.py` | 162 | 0 | 100% | |
| `html_repr.py` | 40 | 0 | 100% | |
| `rich_repr.py` | 81 | 0 | 100% | |
| **TOTAL** | 4324 | 113 | 97% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2046 | 5 💤 | 0 ❌ | 0 🔥 | 8m 16s ⏱️ |

@cakedev0
Contributor Author

Tests for 8e22d89 pass, but when pos_label=None (in binary classification) you get really bad plots 😅

image

@glemaitre glemaitre changed the title from refactor(skore): Make pos_ label more consistent to refactor(skore): Make pos_label more consistent Mar 25, 2026
@cakedev0
Contributor Author

Looking good now (fix: e130809), but I had to rewrite mannnny tests 😅 (9ccb49d)

image

@cakedev0 cakedev0 marked this pull request as ready for review March 26, 2026 09:16
Collaborator

@GaetandeCast GaetandeCast left a comment


Looks great, thanks @cakedev0! One suggestion to fix #2671. Also I did not follow the whole discussion but it looks like we don't infer the pos_label anymore?

@@ -215,6 +235,7 @@ def clear_cache(self) -> None:
def cache_predictions(
self,
response_methods: Literal["auto"] | str | list[str] = "auto",
Collaborator


One thing we could do to solve #2671 is to remove the response_methods here and cache all prediction methods in cache_prediction. This means we predict with either predict_proba or decision_function when available and deduce the predictions, or compute them with predict otherwise. This way we only predict once with the most informative function and can access any type of prediction in the cache later.
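The suggestion can be sketched as follows (a minimal illustration, under the assumption that predict agrees with the argmax of predict_proba, which holds for most scikit-learn classifiers):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# Predict once with the most informative response method...
proba = clf.predict_proba(X)
# ...and deduce the hard predictions from it instead of calling predict again
deduced = clf.classes_[np.argmax(proba, axis=1)]

assert np.array_equal(deduced, clf.predict(X))
```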

Contributor Author


Yes, I 100% love this idea. This is a very valuable optimization I think (basically 2x/3x speed-up for predictions-dominated models).

Do you think we can decide to go with that? Or should we gather more opinions first?

Collaborator


I think we can go with it. I don't really see any drawbacks.

Contributor Author


Hum... with DummyClassifier(strategy="uniform"), the argmax of predict_proba is not predict (predictions are random, predict_proba is 1/n_classes everywhere); this makes a bunch of tests break.

More realistic examples are:

  • SVC(probability=True): but for this one, we can use the decision function
  • FixedThresholdClassifier and TunedThresholdClassifierCV (those probably break assumptions made in other places too).
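The DummyClassifier case above can be reproduced in a few lines (a quick repro sketch of the edge case):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((1000, 1))
y = np.array([0, 1] * 500)

clf = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)

# predict_proba is 1/n_classes everywhere, so its argmax is constant...
assert np.allclose(clf.predict_proba(X), 0.5)
# ...while predict draws labels uniformly at random, so both classes appear
assert set(clf.predict(X)) == {0, 1}
```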

I keep running into such small but assumption/abstraction-breaking edge cases these days 😭

Anyway, what do we do?

  1. Ignore that and change the tests to avoid DummyClassifier(strategy="uniform"); but maybe having it in the tests is on purpose.
  2. Implement special treatment for DummyClassifier (and maybe FixedThresholdClassifier, TunedThresholdClassifierCV), and use decision_function first when available (this fixes the SVC(probability=True) case).
  3. Give up on this nice optimization (which also simplifies the code quite a lot, honestly).

I vote for 2 ^^ but I'd like to have @glemaitre's opinion on that.

Member


I think that we can go for the optimization. In short, scikit-learn has common tests checking this assumption, so most classifiers in scikit-learn would benefit from it. However, we cannot provide the same optimization for estimators outside of scikit-learn, because we are not aware of their internals. In an ideal world, maybe the system of tags could help in this direction, but we cannot bet that people implement it.

So in short, we can implement it in another PR where we will have a fallback for *ThresholdClassifier* and non-scikit-learn estimators.

Contributor Author


Great. I opened a draft PR with the refactor/optim: #2677.

In this PR, I went with option 1 for #2671 (i.e. record only the time for predict, and not for other response methods).

@cakedev0
Contributor Author

> we don't infer the pos_label anymore?

Indeed. Inferring it only for the cases {0, 1} and {-1, 1} and crashing otherwise is not a great behavior. Instead, we choose not to crash and to display/return metrics for both labels.



Development

Successfully merging this pull request may close these issues.

  • BUG: Check for decision function shape (OVR vs OVO)
  • Bug: predict time is not stable across calls

3 participants