Fix/memstreamriver by NicolasNoya · Pull Request #1748 · online-ml/river

NicolasNoya · 2026-01-28T11:46:01Z

Add MemStream: Memory-Based Streaming Anomaly Detection

This PR introduces MemStream, a state-of-the-art online anomaly detection framework designed for high-dimensional data streams with concept drift, based on the paper "MemStream: Memory-Based Streaming Anomaly Detection" by Bhatia et al.

What's New

Core Implementation

MemStream (Base Class): Abstract base class providing the core framework for memory-based anomaly detection
MemStreamPCA: Concrete implementation using PCA-based feature encoding

Architecture

The implementation consists of two main components:

Feature Encoder: Transforms high-dimensional inputs into lower-dimensional representations
- Currently implements PCA-based projection
- Extensible design allows for future encoders (denoising autoencoders and information bottleneck are not implemented yet due to compatibility issues)
Memory Module: Maintains a dynamic collection of encoded "normal" data representations
- Adapts to concept drift without explicit labels
- Configurable replacement strategies: FIFO, LRU, and Random
- Prevents memory poisoning from anomalous samples
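
To make the three replacement policies concrete, here is a minimal sketch of a fixed-size memory. It is not the code from this PR: the class name `SketchMemory`, the `touch` method, and the `seed` argument are hypothetical and only illustrate how FIFO, LRU, and Random eviction could differ.

```python
import random
from collections import OrderedDict


class SketchMemory:
    """Toy fixed-size memory of encoded samples (illustrative only)."""

    def __init__(self, size, replace_strategy="FIFO", seed=None):
        if replace_strategy not in ("FIFO", "LRU", "RANDOM"):
            raise ValueError(f"Unknown replace_strategy: {replace_strategy!r}")
        self.size = size
        self.replace_strategy = replace_strategy
        # Keys are insertion ids; order runs from oldest / least recently used to newest.
        self._items = OrderedDict()
        self._next_key = 0
        self._rng = random.Random(seed)

    def touch(self, key):
        """Mark an entry as recently used. Only changes the order under LRU."""
        if self.replace_strategy == "LRU" and key in self._items:
            self._items.move_to_end(key)

    def add(self, z):
        """Store z, evicting one entry first if the memory is full."""
        if len(self._items) >= self.size:
            if self.replace_strategy == "RANDOM":
                victim = self._rng.choice(list(self._items))
                del self._items[victim]
            else:
                # FIFO: the first key is the oldest insert.
                # LRU: the first key is the least recently touched entry.
                self._items.popitem(last=False)
        key = self._next_key
        self._items[key] = z
        self._next_key += 1
        return key

    def values(self):
        return list(self._items.values())
```

Under this sketch, FIFO and LRU share the same eviction line; they only differ in whether `touch` reorders entries when a stored sample is used as a neighbor.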

Key Features

Online Learning: Processes data points one at a time
Unsupervised Detection: No labels required during inference (optional during training)
Concept Drift Adaptation: Memory evolves over time to handle distribution changes
Flexible Scoring: Uses k-nearest neighbors with exponential weighting to compute anomaly scores
Grace Period: Collects initial samples to bootstrap the encoder before scoring begins
Memory Management: Configurable size and replacement policies
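
The "k-nearest neighbors with exponential weighting" scoring can be sketched as follows. This is one plausible reading, not necessarily what the PR implements: it assumes L1 distance and weights of `gamma ** i` for the i-th nearest neighbor, normalized by the sum of the weights. The function name `knn_discounted_score` is hypothetical.

```python
def knn_discounted_score(z, memory, k=5, gamma=0.1):
    """Weighted average of the L1 distances from z to its k nearest memory entries.

    Assumption: the i-th nearest neighbor (i = 0 is the closest) gets weight
    gamma ** i, so closer neighbors dominate the score.
    """
    if not memory:
        raise ValueError("memory must contain at least one encoded sample")
    # L1 distance from z to every stored vector, smallest first, truncated to k.
    dists = sorted(sum(abs(a - b) for a, b in zip(z, m)) for m in memory)[:k]
    weights = [gamma ** i for i in range(len(dists))]
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)


memory = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
near = knn_discounted_score([0.1, 0.0], memory, k=2, gamma=0.5)
far = knn_discounted_score([30.0, 30.0], memory, k=2, gamma=0.5)
```

With this weighting, a small `gamma` makes the score behave almost like the distance to the single nearest neighbor, while `gamma` close to 1 approaches a plain average over the k neighbors.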

Parameters

memory_size: Maximum number of encoded normal samples to store (default: 1,000 for PCA variant)
max_threshold: Threshold for accepting samples into memory (default: 0.1)
grace_period: Number of initial samples before scoring begins (default: 5,000)
n_components: Number of PCA components (default: 20); if the requested value is not feasible for PCA, the code falls back to a value that is
k: Number of nearest neighbors for scoring (default: 5)
gamma: Exponential weighting factor (default: 0.1)
replace_strategy: Memory replacement policy (FIFO, LRU, or RANDOM)
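
As a rough illustration of how `grace_period`, `max_threshold`, and `memory_size` might interact, the sketch below runs a toy stream of scalars. Everything here is an assumption for illustration: the helper name `run_stream`, the nearest-neighbor distance used as the score, the score of 0.0 returned during warm-up, and the FIFO trimming are not taken from the PR.

```python
def run_stream(stream, grace_period=3, max_threshold=1.0, memory_size=4):
    """Score a stream of floats and only memorize samples that look normal."""
    memory = []  # FIFO list of accepted samples
    scores = []
    for i, x in enumerate(stream):
        if i < grace_period:
            # Warm-up: collect samples without scoring them (assumed score 0.0).
            memory.append(x)
            memory = memory[-memory_size:]
            scores.append(0.0)
            continue
        # Toy score: distance to the closest memorized sample.
        score = min(abs(x - m) for m in memory)
        scores.append(score)
        # Gate the update: samples above the threshold never enter memory,
        # which is what protects the memory from being poisoned by anomalies.
        if score <= max_threshold:
            memory.append(x)
            memory = memory[-memory_size:]
    return scores, memory


scores, memory = run_stream([1.0, 1.1, 0.9, 1.05, 50.0, 1.2])
```

In this run the value 50.0 gets a large score and is never stored, while 1.2 is accepted and pushes the oldest entry out of the memory.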

kulbachcedric · 2026-02-05T14:41:28Z

Hi @NicolasNoya,

that's a nice contribution!
However, I think the setup.py creates some issues here.
Can we remove it?

Best
Cedric

NicolasNoya · 2026-02-05T15:40:54Z

Hello @kulbachcedric,

Sorry for the mess. I saw that you've been changing the code, so I rebased to check that everything works on the new version.

I do have one question: would you prefer to keep the abstract class MemStream with the MemStreamPCA subclass, or should we keep only MemStreamPCA? Since the autoencoder class is no longer present, it might be cleaner to keep a single class.

Thanks a lot, and I’ll wait for your feedback.

Best,
Nicolás Noya

kulbachcedric · 2026-02-05T15:58:30Z

Hi @NicolasNoya
no worries!
I left you some comments :-)
I think we could remove the MemStream class, as the MemStreamPCA is currently the only class that implements MemStream.

Just another question: would you have an idea for how to benchmark the anomaly detection algorithms? This is currently missing from the benchmarks in river and deep-river.

Thanks again for your contribution!

Best
Cedric

NicolasNoya · 2026-02-05T16:25:10Z

Hi @kulbachcedric

This weekend, I’ll try something and see how we could benchmark the anomaly detection algorithms within the framework.

I will also update the code based on your comments!

Best,
Nicolás

NicolasNoya · 2026-02-08T17:51:47Z

Hello @kulbachcedric,
I've been working on the code, and I hope you like this version. I made some improvements to function naming and refined a few methods that were a little sloppy.

I also did some research on anomaly detection benchmarking. It seems that the most common metrics are ROC AUC and PR AUC, and I personally like recall on the anomalous class (I think it is called sensitivity in anomaly detection). If you’d like, I can start implementing benchmarking for these metrics this week. I can also include memory and runtime measurements, similar to previous River benchmarks.
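
For reference, two of these metrics can be sketched in plain Python. This is only an illustration of what the numbers mean, not a proposal for the benchmark code: the function names are hypothetical, labels are assumed to be 1 for anomalous and 0 for normal, and PR AUC is left out for brevity.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random anomaly outscores a random normal sample.

    Ties count as half a win. Quadratic in the number of samples, so only
    suitable for small illustrative inputs.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def anomaly_recall(y_true, y_pred):
    """Recall on the anomalous class: detected anomalies / all true anomalies."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 is an arbitrary cut-off
auc = roc_auc(y_true, scores)
rec = anomaly_recall(y_true, y_pred)
```

ROC AUC is threshold-free, whereas recall depends on the cut-off used to turn scores into predictions, so a benchmark would need to fix or tune that cut-off.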

Please let me know if you spot anything that could be improved in the code, or if anything seems off.

Thanks a lot, and I look forward to your feedback.

Best regards,
Nicolás Noya

kulbachcedric · 2026-02-09T07:27:17Z

Hi @NicolasNoya
nice one!
Actually, the changes to setup.py still appear in the diff.
I would suggest adding docstrings to every function you are adding.

For the Benchmarks, we could create a new issue?

Best
Cedric

kulbachcedric · 2026-02-05T15:50:07Z

river/anomaly/memstream.py

+from river import anomaly, utils
+
+
+class EncoderType:


I think this can be removed

kulbachcedric · 2026-02-05T15:52:19Z

river/anomaly/memstream.py

+    IB = "information_bottleneck"  # TODO: implement
+
+
+class ReplaceStrategy:


@NicolasNoya I would pass strings and comment these string options within the docstrings. This would nicely align with the sklearn principles (see other classes)
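
A minimal sketch of what this suggestion could look like: plain string options documented in the docstring and validated in the constructor. The class name `MemStreamSketch` and its single parameter are hypothetical and much smaller than the class under review.

```python
class MemStreamSketch:
    """Illustrative stand-in for the detector (not the PR's class).

    Parameters
    ----------
    replace_strategy
        Memory replacement policy. One of:

        - `"FIFO"`: evict the oldest stored sample.
        - `"LRU"`: evict the least recently used sample.
        - `"RANDOM"`: evict a sample chosen at random.
    """

    _REPLACE_STRATEGIES = ("FIFO", "LRU", "RANDOM")

    def __init__(self, replace_strategy: str = "FIFO"):
        if replace_strategy not in self._REPLACE_STRATEGIES:
            raise ValueError(
                f"replace_strategy must be one of {self._REPLACE_STRATEGIES}, "
                f"got {replace_strategy!r}"
            )
        # Store the argument unchanged so it can be read back as a plain string.
        self.replace_strategy = replace_strategy


detector = MemStreamSketch(replace_strategy="LRU")
```

Compared with a separate `ReplaceStrategy` class, this keeps the public interface to built-in types, and a typo in the option fails early with a message that lists the valid choices.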

kulbachcedric · 2026-02-05T15:53:58Z

river/anomaly/memstream.py

+    RANDOM = "RANDOM"  # Random replacement
+
+
+class MemStream(anomaly.base.AnomalyDetector):


I would remove this class and integrate it into the MemStreamPCA class, as we currently have only one MemStream variant implemented.

@NicolasNoya

kulbachcedric · 2026-02-05T15:54:41Z

setup.py

We can revert this change, I think.

@NicolasNoya

NicolasNoya requested review from MaxHalford and smastelini as code owners January 28, 2026 11:46

NicolasNoya added 13 commits February 5, 2026 16:26

Memstream first stable implementation

49a033b

setup not to merge to the original repo

142c5f3

improve code compliance

299a988

code QA

fb2f15d

Some doc corrections

34be6df

codeqa

f7b0927

Bug fixes

46eac13

formater

7bd7198

add std management

e35abe3

small changes

8a1be8b

small changes

faae91d

Change the architecture of memStream to fit with river's requirements…

9d3321b

… and make it cleaner

code qa

934a28e

NicolasNoya force-pushed the fix/memstreamriver branch from 56e2fdc to 934a28e February 5, 2026 15:30

NicolasNoya added 5 commits February 7, 2026 22:57

Remove abstract class

234adfc

Fix normalization issues

e775fc6

code qa

ef92bc0

Fix LRU

5db57c3

code qa

0cb37c8

kulbachcedric marked this pull request as draft February 9, 2026 07:27

NicolasNoya added 2 commits February 15, 2026 16:04

Adding docstrings

714cf98

code qa

6364d66

NicolasNoya added 2 commits February 22, 2026 12:08

Test

024875f

code qa

943a328

kulbachcedric marked this pull request as ready for review March 17, 2026 12:44

kulbachcedric reviewed Mar 17, 2026

NicolasNoya wants to merge 22 commits into online-ml:main from NicolasNoya:fix/memstreamriver

2 participants
