[ENH] Support for discrete output distributions and probabilistic classification #1003

@fkiraly

scikit-learn supports probabilistic classification, and thus discrete output distributions, but its interface has two gaps:

  • the predict_proba API is inconsistent with the skpro regression API, as it returns a numpy array and not a distribution object
  • there is no API for multivariate probabilistic classification or for ordinal regression.

@felipeangelimvieira suggested adding a skpro-native probabilistic classifier API that can represent a wider range of return distributions.

Whoever takes this on should first lay out the API design in more detail.

As a starting point, I suggest:

  • a new module classification
  • a base class BaseProbaClassifier with methods fit, predict and predict_proba
  • if ordinal classification is also covered, predict_quantiles and predict_interval may also make sense; in that case, add a capability tag for ordinal classification
  • a distribution DiscreteClass used for the output
  • a scikit-learn adapter that exposes any sklearn probabilistic classifier under the modified API
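
To make the starting point above more concrete, here is a minimal sketch of how the pieces could fit together. None of this exists in skpro yet: the class names BaseProbaClassifier and DiscreteClass come from the list above, while SklearnProbaClassifier, the constructor arguments, and the pmf and mode methods are hypothetical names chosen for illustration. The adapter relies only on the fit, predict_proba and classes_ members that scikit-learn probabilistic classifiers provide.

```python
import numpy as np


class DiscreteClass:
    """Hypothetical distribution object: one categorical distribution per sample."""

    def __init__(self, probs, classes):
        self.probs = np.asarray(probs, dtype=float)  # shape (n_samples, n_classes)
        self.classes = np.asarray(classes)  # shape (n_classes,)
        if self.probs.ndim != 2 or self.probs.shape[1] != len(self.classes):
            raise ValueError("probs must have one column per class")

    def pmf(self, y):
        """Probability that the i-th distribution assigns to label y[i]."""
        y = np.asarray(y)
        match = y[:, None] == self.classes[None, :]
        return (self.probs * match).sum(axis=1)

    def mode(self):
        """Most probable class label for each sample."""
        return self.classes[self.probs.argmax(axis=1)]


class BaseProbaClassifier:
    """Hypothetical base class: predict_proba returns a distribution object."""

    def fit(self, X, y):
        raise NotImplementedError

    def predict_proba(self, X):
        raise NotImplementedError

    def predict(self, X):
        # Default point prediction: the mode of the predictive distribution.
        return self.predict_proba(X).mode()


class SklearnProbaClassifier(BaseProbaClassifier):
    """Hypothetical adapter around a scikit-learn style classifier."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict_proba(self, X):
        # The wrapped estimator returns a plain array whose columns follow
        # classes_; wrap it so callers receive a distribution object instead.
        probs = self.estimator.predict_proba(X)
        return DiscreteClass(probs, self.estimator.classes_)
```

Under these assumptions, wrapping for example sklearn.linear_model.LogisticRegression in SklearnProbaClassifier would give a classifier whose predict_proba returns a DiscreteClass rather than a numpy array. Multivariate and ordinal outputs are deliberately left out of the sketch, since their design is still open.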
