[ENH] Implement outlier detection based on probabilistic regressors by arnavk23 · Pull Request #777 · sktime/skpro

arnavk23 · 2026-02-27T19:51:30Z

Reference Issues/PRs

Fixes #390

What does this implement/fix? Explain your changes.

This PR introduces three strategies that reduce outlier/anomaly detection to probabilistic regression, each with a PyOD-compatible interface:

QuantileOutlierDetector - Detects outliers based on predictive quantile extremity. Samples falling outside the expected quantile range (configurable via alpha parameter) are flagged as outliers. The outlier score is computed as the distance from the nearest quantile bound, normalized by the quantile range.
DensityOutlierDetector - Detects outliers based on probability density (negative log-likelihood). Samples with low probability density under the predictive distribution are flagged as outliers. Supports both log-likelihood and raw likelihood scoring via the use_log parameter.
LossOutlierDetector - Detects outliers based on predictive loss. Supports multiple loss functions:
- log_loss: negative log-likelihood (equivalent to density-based)
- crps: Continuous Ranked Probability Score
- interval_score: interval score with configurable coverage
- Custom loss functions via callable
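
To make the quantile-based score concrete, here is a minimal sketch of "distance from the nearest quantile bound, normalized by the quantile range". It is not the code in `_quantile.py`: the function name `quantile_outlier_score`, the `eps` guard, and the choice to score in-range samples as zero are assumptions made for illustration.

```python
import numpy as np


def quantile_outlier_score(y, lower, upper, eps=1e-12):
    """Distance outside [lower, upper], divided by the interval width.

    Hypothetical helper: samples inside the interval score 0 (an assumption).
    """
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    below = np.maximum(lower - y, 0.0)  # how far y falls under the lower bound
    above = np.maximum(y - upper, 0.0)  # how far y exceeds the upper bound
    width = np.maximum(upper - lower, eps)  # guard against zero-width intervals
    return (below + above) / width


y = np.array([0.0, 2.5, -3.0])
lower = np.array([-1.0, -1.0, -1.0])
upper = np.array([1.0, 1.0, 1.0])
scores = quantile_outlier_score(y, lower, upper)
```

With these inputs the first sample lies inside the interval, while the other two lie 1.5 and 2.0 units outside an interval of width 2.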

Key features:

PyOD-compatible interface with fit(), predict(), and decision_function() methods
Works with any skpro probabilistic regressor
Configurable contamination parameter for automatic threshold determination
Can be used with both conditional and unconditional distribution estimates

Implementation details:

Base class BaseOutlierDetector provides common functionality
All detectors compute outlier scores during training and use percentile-based thresholds
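
The percentile-based thresholding can be sketched as follows. This is an illustration under assumptions, not the logic in `base.py`: the helper names `fit_threshold` and `predict_labels` are hypothetical, and the rule "flag scores strictly above the (1 - contamination) percentile of the training scores" follows the usual PyOD convention rather than anything confirmed by this PR.

```python
import numpy as np


def fit_threshold(train_scores, contamination=0.1):
    """Return the score cut-off that flags roughly `contamination` of the data."""
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must lie in (0, 0.5]")
    return float(np.percentile(train_scores, 100.0 * (1.0 - contamination)))


def predict_labels(scores, threshold):
    """1 = outlier, 0 = inlier (the label convention PyOD uses)."""
    return (np.asarray(scores) > threshold).astype(int)


train_scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.05, 0.22, 5.0])
threshold = fit_threshold(train_scores, contamination=0.1)
labels = predict_labels(train_scores, threshold)
```

Here only the sample with score 5.0 exceeds the threshold, so a single training point is flagged.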

Does your contribution introduce a new dependency? If yes, which one?

No, this implementation uses only existing dependencies (numpy, pandas, scipy - already required by skpro).

What should a reviewer concentrate their feedback on?

Does the PyOD-compatible interface make sense for skpro users?
Are the outlier scoring methods mathematically sound and properly implemented?
Particularly in _quantile.py and _loss.py where we handle array reshaping and multi-output cases
The CRPS implementation in _loss.py uses a Normal distribution approximation - is this adequate or should we use quantile-based approximation instead?
Are the docstrings clear and comprehensive enough?
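
For reference when reviewing the CRPS question above: if the predictive distribution really is Gaussian, CRPS has a well-known closed form, sketched below with the standard library only. The function name `crps_gaussian` is hypothetical and this is not the code in `_loss.py`; for non-Gaussian predictions the closed form is only an approximation.

```python
from math import erf, exp, pi, sqrt


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / sqrt(pi))


score_at_mean = crps_gaussian(0.0, 0.0, 1.0)
score_far_out = crps_gaussian(3.0, 0.0, 1.0)
```

The score is smallest when the observation sits at the predictive mean and grows as the observation moves away, which is what makes it usable as an outlier score.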

Did you add any tests for the change?

Yes, a comprehensive test suite was added in skpro/outlier/tests/test_outliers.py:

Tests for fitting all three detector types
Tests for prediction and decision_function methods
Tests for different loss functions (log_loss, CRPS, interval_score)
Tests for custom loss functions
Tests for error handling (missing y values)
Integration test verifying all detectors work with the same interface
Tests use @run_test_for_class decorator for proper test discovery

Example Output

The implementation includes a comprehensive example that demonstrates all three detector types on synthetic data with known outliers.
The visualization shows:

Outlier scores from QuantileOutlierDetector and DensityOutlierDetector
Detected outliers in feature space
Performance comparison (precision/recall) across different methods

Any other comments?

The implementation is designed to be extensible - users can easily create custom detectors by subclassing BaseOutlierDetector and implementing _compute_decision_scores().
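
The extension pattern might look roughly like the sketch below. Everything here is a hypothetical stand-in: `ToyBaseOutlierDetector` is NOT skpro's `BaseOutlierDetector`, and its constructor, method signatures, and attribute names are assumptions. Only the idea of overriding `_compute_decision_scores()` comes from the description above.

```python
import numpy as np


class ToyBaseOutlierDetector:
    """Hypothetical stand-in for illustration; NOT skpro's BaseOutlierDetector."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def _compute_decision_scores(self, X, y):
        raise NotImplementedError("subclasses define the scoring rule")

    def fit(self, X, y):
        # Score the training data, then derive a percentile-based threshold.
        self.decision_scores_ = self._compute_decision_scores(X, y)
        self.threshold_ = np.percentile(
            self.decision_scores_, 100.0 * (1.0 - self.contamination)
        )
        return self

    def decision_function(self, X, y):
        return self._compute_decision_scores(X, y)

    def predict(self, X, y):
        return (self.decision_function(X, y) > self.threshold_).astype(int)


class AbsResidualDetector(ToyBaseOutlierDetector):
    """Custom detector: score is the absolute residual from a fixed predictor."""

    def __init__(self, predict_mean, contamination=0.1):
        super().__init__(contamination=contamination)
        self.predict_mean = predict_mean

    def _compute_decision_scores(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        return np.abs(y - self.predict_mean(X))


X = np.arange(10, dtype=float)
y = 2.0 * X
y[4] = 30.0  # inject one obvious outlier
detector = AbsResidualDetector(predict_mean=lambda X: 2.0 * X).fit(X, y)
labels = detector.predict(X, y)
```

Only the scoring rule changes between detectors in this sketch; fitting, thresholding, and labelling stay in the base class.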

PR checklist

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. (Title: [ENH] Implement outlier detection based on probabilistic regressors)

For new estimators

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation (N/A - no new soft dependencies)

Implements three types of outlier detectors as requested in issue sktime#390:

1. QuantileOutlierDetector - Detects outliers based on predictive quantile extremity. Samples falling outside the expected quantile range are flagged as outliers.
2. DensityOutlierDetector - Detects outliers based on probability density. Samples with low density (high negative log-likelihood) are flagged as outliers.
3. LossOutlierDetector - Detects outliers based on predictive loss. Supports multiple loss functions: log_loss, CRPS, interval_score, and custom losses.

Key features:

- PyOD-compatible interface with fit(), predict(), and decision_function() methods
- Works with any skpro probabilistic regressor
- Configurable contamination parameter for threshold determination
- Comprehensive test suite
- Example demonstrating usage with various regressors

Resolves sktime#390

arnavk23 · 2026-03-10T13:02:25Z

@fkiraly @marrov Could you please review this PR? I think it is complete.

marrov · 2026-03-10T13:49:07Z

@arnavk23 - Pretty large PR, nothing jumps out at me after a quick scan. I'll trigger the LLM review and come back to it when I have some bandwidth (maybe next week).

Copilot

Pull request overview

Adds a new skpro.outlier module that implements outlier/anomaly detection by “reducing” the task to probabilistic regression, exposing a PyOD-like interface (fit, decision_function, predict).

Changes:

Introduces BaseOutlierDetector plus three detector implementations: quantile-, density-, and loss-based.
Adds documentation/API reference entries and a runnable example script.
Adds a new test suite covering detectors’ core interface and key options.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `skpro/outlier/base.py` | Adds shared PyOD-like base class and thresholding logic. |
| `skpro/outlier/_quantile.py` | Implements quantile-interval based scoring logic. |
| `skpro/outlier/_density.py` | Implements likelihood / negative-log-likelihood based scoring. |
| `skpro/outlier/_loss.py` | Implements loss-based scoring incl. log-loss, CRPS, interval score, custom loss. |
| `skpro/outlier/__init__.py` | Exposes new detectors via package exports. |
| `skpro/outlier/tests/test_outliers.py` | Adds unit/integration tests for the new detectors. |
| `skpro/outlier/tests/__init__.py` | Initializes tests package. |
| `docs/source/api_reference/outlier.rst` | Documents new outlier module and classes in API reference. |
| `docs/source/api_reference.rst` | Links outlier API reference into master index. |
| `examples/outlier_detection_example.py` | Adds end-to-end usage example + optional visualization. |
| `.all-contributorsrc` | Adds contributor entry for the PR author. |

- keep fallback CRPS sample/output structure intact and reduce per sample
- normalize callable loss outputs to one score per sample, with clear shape errors for invalid returns
- make BaseOutlierDetector y normalization consistent with y_inner_mtype by using DataFrame internally in fit and decision_function
- clarify base outlier detector docs: supervised by default, y=None only for pre-fitted regressors with X-only scoring implementations
- add regression tests for CRPS fallback shapes, callable loss reduction and validation, and canonical DataFrame y normalization
- fix malformed JSON in .all-contributorsrc and remove an unused import from the outlier detection example
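
As a rough illustration of "one score per sample, with clear shape errors", a reduction step for callable loss outputs could look like the sketch below. The helper name `reduce_to_per_sample` and the choice to average over output columns are assumptions; the commit message does not say which reduction the PR uses.

```python
import numpy as np


def reduce_to_per_sample(loss_values, n_samples):
    """Collapse a custom loss output to shape (n_samples,) or raise ValueError."""
    arr = np.asarray(loss_values, dtype=float)
    if arr.ndim == 0 or arr.shape[0] != n_samples:
        raise ValueError(
            f"custom loss must return one row per sample ({n_samples}), "
            f"got shape {arr.shape}"
        )
    if arr.ndim == 1:
        return arr  # already one score per sample
    # Assumption: average over any trailing output dimensions.
    return arr.reshape(n_samples, -1).mean(axis=1)


per_sample = reduce_to_per_sample([[1.0, 3.0], [2.0, 4.0]], n_samples=2)
```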

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

docs/source/api_reference.rst:24

The toctree lists api_reference/tags twice. This can lead to duplicate entries / Sphinx warnings and is likely unintended; remove the duplicate (keep a single api_reference/tags entry).

.. toctree::
    :maxdepth: 1

    api_reference/tags
    api_reference/regression
    api_reference/survival
    api_reference/outlier
    api_reference/distributions
    api_reference/metrics
    api_reference/tags
    api_reference/base

Replaced brittle substring-based lower/upper extraction in _compute_interval_score with a dedicated helper _extract_interval_bounds.

_extract_interval_bounds now:

- Prefers MultiIndex selection via .xs("lower", level=-1, axis=1) and .xs("upper", level=-1, axis=1).
- Falls back to old formats only when needed (substring columns, then split-half fallback).
- Supports array-like interval outputs for 2D/3D layouts.

Added explicit shape validation in _compute_interval_score to ensure lower/upper align with y_true sample/output dimensions; raises a clear ValueError on mismatch.
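
The MultiIndex selection and shape check described in this commit can be sketched as follows. This is not the PR's `_extract_interval_bounds`: the helper names `extract_bounds` and `interval_score` are hypothetical, the fallbacks are omitted, and the column layout (variable, coverage, lower/upper) is assumed to match what skpro's `predict_interval` returns.

```python
import numpy as np
import pandas as pd

# Assumed layout: column levels are (variable, coverage, "lower"/"upper").
cols = pd.MultiIndex.from_product([["y"], [0.9], ["lower", "upper"]])
pred_int = pd.DataFrame([[-1.0, 1.0], [0.0, 2.0], [1.0, 3.0]], columns=cols)


def extract_bounds(pred_int, y_true):
    """Select lower/upper by the last column level and validate shapes."""
    lower = pred_int.xs("lower", level=-1, axis=1).to_numpy()
    upper = pred_int.xs("upper", level=-1, axis=1).to_numpy()
    y_true = np.asarray(y_true, dtype=float).reshape(len(pred_int), -1)
    if lower.shape != y_true.shape or upper.shape != y_true.shape:
        raise ValueError(
            f"interval bounds {lower.shape} do not match y_true {y_true.shape}"
        )
    return lower, upper, y_true


def interval_score(lower, upper, y_true, alpha):
    """Interval width plus a 2/alpha penalty for observations outside it."""
    width = upper - lower
    under = np.maximum(lower - y_true, 0.0)
    over = np.maximum(y_true - upper, 0.0)
    return width + (2.0 / alpha) * (under + over)


lower, upper, y_arr = extract_bounds(pred_int, [0.0, 3.0, 2.0])
scores = interval_score(lower, upper, y_arr, alpha=0.1).ravel()
```

The second observation lies one unit above its interval, so its score is dominated by the 2/alpha penalty while the other two score only the interval width.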

arnavk23 requested review from felipeangelimvieira and fkiraly as code owners February 27, 2026 19:51

arnavk23 added 2 commits February 28, 2026 01:25

example comment redone

9cf795d

arnavk23 force-pushed the fix/issue-390-outlier-detection branch from 9870c21 to 9cf795d Compare February 27, 2026 20:07

arnavk23 added 3 commits February 28, 2026 01:41

Merge branch 'main' into fix/issue-390-outlier-detection

25f8c42

Add get_test_params methods to outlier detectors and fix doctests

d344f20

Fix import ordering in get_test_params methods

bee2df3

arnavk23 mentioned this pull request Mar 6, 2026

Mathematical Notes for Bayesian Updates and Posterior Diagnostics in skpro #808

Closed

marrov requested a review from Copilot March 10, 2026 13:50

Copilot AI reviewed Mar 10, 2026

Copilot started reviewing on behalf of marrov March 10, 2026 13:59

Apply suggestions from code review

007c8c4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

arnavk23 marked this pull request as draft March 10, 2026 17:20

arnavk23 added 9 commits March 14, 2026 00:29

Merge branch 'main' into fix/issue-390-outlier-detection

2111429

STYLE: apply black formatting to quantile outlier detector

e5c234b

Formats skpro/outlier/_quantile.py to match the repository black hook output so pre-commit and CI pass cleanly.

Merge branch 'fix/issue-390-outlier-detection' of https://www.github.…

de5a852

…com/arnavk23/skpro into fix/issue-390-outlier-detection

float computation of CRPS for Gaussian predictive distributions

b75f41a

dtype handling for quantile level detection in _quantile.py

6e3bbc2

format

d36cd12

quantile refactoring

b52b1c0

format - 2

c7eab6c

arnavk23 marked this pull request as ready for review March 13, 2026 21:01

arnavk23 requested a review from Copilot April 11, 2026 14:13

Copilot started reviewing on behalf of arnavk23 April 11, 2026 14:13

Copilot AI reviewed Apr 11, 2026

arnavk23 and others added 4 commits April 12, 2026 03:19

Apply suggestions from code review

4af7b8d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

copilot suggestions

174e1d9

Merge branch 'fix/issue-390-outlier-detection' of https://www.github.…

ba907fd

…com/arnavk23/skpro into fix/issue-390-outlier-detection

black

775a80e

arnavk23 requested a review from Copilot April 11, 2026 22:17

Copilot started reviewing on behalf of arnavk23 April 11, 2026 22:20

Copilot AI reviewed Apr 11, 2026

Comment thread skpro/outlier/base.py Outdated

Comment thread skpro/outlier/_quantile.py Outdated

Comment thread skpro/outlier/_quantile.py Outdated

Comment thread skpro/outlier/_loss.py Outdated

Comment thread examples/outlier_detection_example.py Outdated

Comment thread .all-contributorsrc

arnavk23 and others added 3 commits April 12, 2026 04:31

Apply suggestions from code review

8977cec

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Merge branch 'fix/issue-390-outlier-detection' of https://www.github.…

81d1d7f

…com/arnavk23/skpro into fix/issue-390-outlier-detection

3 participants
