refactor(skore): Make pos_label more consistent#2663
Conversation
```diff
@@ -215,6 +235,7 @@ def clear_cache(self) -> None:
     def cache_predictions(
         self,
         response_methods: Literal["auto"] | str | list[str] = "auto",
```
One thing we could do to solve #2671 is to remove the `response_methods` argument here and cache all prediction methods in `cache_predictions`. This means we predict with either `predict_proba` or `decision_function` when available and deduce the predictions, or compute them with `predict` otherwise. This way we only predict once with the most informative function and can access any type of prediction in the cache later.
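A minimal sketch of this idea (hypothetical, not skore's actual code), assuming a scikit-learn classifier whose `predict` is the argmax of `predict_proba`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# Call only the most informative response method once,
# then deduce the other kinds of predictions from it.
cache = {}
if hasattr(clf, "predict_proba"):
    cache["predict_proba"] = clf.predict_proba(X)
    # hard predictions deduced from the probabilities
    cache["predict"] = clf.classes_[cache["predict_proba"].argmax(axis=1)]
elif hasattr(clf, "decision_function"):
    cache["decision_function"] = clf.decision_function(X)
    # binary case: positive decision value maps to the second class
    cache["predict"] = clf.classes_[(cache["decision_function"] > 0).astype(int)]
else:
    cache["predict"] = clf.predict(X)
```

For most scikit-learn classifiers this yields the same result as calling `predict` directly, while predicting only once.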
Yes, I 100% love this idea. This is a very valuable optimization, I think (basically a 2x/3x speed-up for prediction-dominated models).
Do you think we can decide to go with that? Or should we gather more opinions first?
I think we can go with it. I don't really see any drawbacks.
Hum... with `DummyClassifier(strategy="uniform")`, the argmax of `predict_proba` is not `predict` (predictions are random while `predict_proba` is 1/n_classes everywhere), and this makes a bunch of tests break.
More realistic examples are:
- `SVC(probability=True)`: but for this one, we can use the decision function.
- `FixedThresholdClassifier` and `TunedThresholdClassifierCV` (those probably break assumptions made in other places too).
I keep running into such small but assumption/abstraction-breaking edge cases these days 😭
Anyway, what do we do?
1. Ignore that and change the tests to avoid `DummyClassifier(strategy="uniform")`, but maybe having it in the tests is on purpose.
2. Implement a special treatment for `DummyClassifier` (and maybe `FixedThresholdClassifier`, `TunedThresholdClassifierCV`), and use `decision_function` first when available (this fixes the `SVC(probability=True)` case).
3. Give up on this nice optimization (which also simplifies the code quite a lot, honestly).
I vote for 2 ^^ but I'd like to have @glemaitre's opinion on that.
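The `DummyClassifier(strategy="uniform")` edge case described above can be reproduced with a small snippet:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.RandomState(0)
X = np.zeros((100, 1))
y = rng.randint(0, 3, size=100)  # three classes

clf = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)
proba = clf.predict_proba(X)  # 1/n_classes everywhere
pred = clf.predict(X)         # drawn uniformly at random

# the argmax trick always picks the first class here
deduced = clf.classes_[proba.argmax(axis=1)]
```

`deduced` disagrees with `pred` on roughly two thirds of the samples, so the argmax trick cannot be applied blindly.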
I think that we can go for the optimization. In short, scikit-learn has common tests that check this assumption, so most of the classifiers in scikit-learn would benefit from it. However, we cannot provide the same optimization for estimators outside of scikit-learn, because we cannot know whether the assumption holds there. In an ideal world, maybe the system of tags could help in this direction, but we cannot bet that people implement it.
So in short, we can implement it in another PR where we will have a fallback for *ThresholdClassifier* and non-scikit-learn estimators.
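Such a fallback could look something like this (names hypothetical, just to illustrate the idea):

```python
from sklearn.linear_model import LogisticRegression


def can_deduce_predict(estimator) -> bool:
    """Whether argmax(predict_proba) can be trusted to equal predict.

    Hypothetical helper: trust scikit-learn estimators except the
    *ThresholdClassifier* family; everything else falls back to a
    plain call to `predict`.
    """
    cls = type(estimator)
    if "ThresholdClassifier" in cls.__name__:
        return False
    return cls.__module__.startswith("sklearn.")


class ThirdPartyClassifier:  # stand-in for a non-scikit-learn estimator
    pass
```

Here `can_deduce_predict(LogisticRegression())` is true, while any third-party estimator takes the safe path.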
Indeed. Inferring it only for the cases {0, 1} and {-1, 1} and crashing otherwise is not a great behavior. Instead, we choose not to crash and to display/return results for both labels.


Towards #2642 and #2592
Change description
Go from the behavior table described in #2592 to this:
This PR also refactors how predictions are made, to make those changes easier and to prepare the ground for moving `pos_label` from the init to the arguments of metrics/plots. Indeed, cached predictions don't depend on `pos_label` anymore (we adapt for `pos_label` on the fly if needed).

By doing so, it fixes #2671 (with option 1 described in the issue).
And this refactor also fixes #2672
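A minimal sketch of the "adapt on the fly" idea (function name hypothetical, not skore's actual API): the cached `predict_proba` output is `pos_label`-agnostic, and the relevant column is selected on demand:

```python
import numpy as np


def scores_for_pos_label(y_proba, classes, pos_label):
    """Slice cached (n_samples, n_classes) probabilities for pos_label."""
    idx = np.flatnonzero(classes == pos_label)[0]
    return y_proba[:, idx]


classes = np.array(["neg", "pos"])
cached_proba = np.array([[0.2, 0.8], [0.9, 0.1]])  # cached once, reused
```

With this, changing `pos_label` never invalidates the cache; only the slicing changes.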
Contribution checklist
AI usage disclosure
For review and tests; not for the core changes, those were a bit too delicate* for AI.
*too delicate = not well-specified enough when I started working on it 😅