Hi @NicolasNoya, that's a nice contribution! Best |
… and make it cleaner
56e2fdc to 934a28e
|
Hello @kulbachcedric, Sorry for the mess. I checked that you've been changing the code, and I rebased it to check if everything worked on the new version. I do have one question: would you prefer to keep the abstract class MemStream with the MemStreamPCA subclass, or should we keep only MemStreamPCA? Since the autoencoder class is no longer present, it might be cleaner to keep a single class. Thanks a lot, and I’ll wait for your feedback. Best, |
|
Hi @NicolasNoya, just another question: would you have an idea for how to benchmark the anomaly detection algorithms? This is currently missing from the benchmarks in river and deep-river. Thanks again for your contribution! Best |
|
This weekend, I’ll try something and see how we could benchmark the anomaly detection algorithms within the framework. I will also update the code based on your comments! Best, |
|
Hello @kulbachcedric, I also did some research on anomaly detection benchmarking. It seems that the most common metrics are ROC AUC and PR AUC, and I personally like recall on the anomalous class (I think it is called sensitivity in anomaly detection). If you’d like, I can start implementing benchmarking for these metrics this week. I can also include memory and runtime measurements, similar to previous River benchmarks. Please let me know if you spot anything that could be improved in the code, or if anything seems off. Thanks a lot, and I look forward to your feedback. Best regards, |
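As a rough illustration of the metrics mentioned above, the sketch below computes ROC AUC (in its pairwise, rank-based form) and recall on the anomalous class in plain Python. The function names `roc_auc` and `recall_anomalous`, the toy labels and scores, and the 0.5 threshold are assumptions made for this example only; an actual benchmark would presumably rely on the metrics River already ships.

```python
def roc_auc(y_true, scores):
    """Fraction of (anomaly, normal) pairs where the anomaly scores higher.

    Ties count as half a win. Labels: 1 = anomaly, 0 = normal.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_anomalous(y_true, scores, threshold):
    """Share of true anomalies whose score reaches the threshold."""
    flagged = [s >= threshold for y, s in zip(y_true, scores) if y == 1]
    return sum(flagged) / len(flagged) if flagged else 0.0


# Toy data (hypothetical): two anomalies among six samples.
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.2, 0.9, 0.3]

auc = roc_auc(y_true, scores)
rec = recall_anomalous(y_true, scores, threshold=0.5)
print(auc, rec)
```

ROC AUC does not depend on a threshold, while recall does, so a benchmark reporting both would also need to state how the threshold was chosen.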
|
Hi @NicolasNoya, for the benchmarks, could we create a new issue? Best
river/anomaly/memstream.py
Outdated
from river import anomaly, utils

class EncoderType:
I think this can be removed
IB = "information_bottleneck" # TODO: implement

class ReplaceStrategy:
@NicolasNoya I would pass plain strings and document the available string options in the docstrings. This would align nicely with the sklearn conventions (see other classes).
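A minimal sketch of what this suggestion could look like, assuming a hypothetical `ToyMemory` class (not the PR's code): the option is a plain string, the allowed values are listed in the docstring, and the constructor validates the value.

```python
class ToyMemory:
    """Toy container used only to illustrate string-valued options.

    Parameters
    ----------
    memory_size
        Maximum number of items to keep.
    replace_strategy
        Policy used once the memory is full. One of:

        - "FIFO": replace the oldest entry.
        - "LRU": replace the least recently used entry.
        - "RANDOM": replace a randomly chosen entry.
    """

    VALID_STRATEGIES = ("FIFO", "LRU", "RANDOM")

    def __init__(self, memory_size=1000, replace_strategy="FIFO"):
        # Fail early on typos instead of silently using a default policy.
        if replace_strategy not in self.VALID_STRATEGIES:
            raise ValueError(
                f"replace_strategy must be one of {self.VALID_STRATEGIES}, "
                f"got {replace_strategy!r}"
            )
        self.memory_size = memory_size
        self.replace_strategy = replace_strategy


mem = ToyMemory(replace_strategy="LRU")
print(mem.replace_strategy)
```

Compared with a separate constants class, this keeps the public signature self-describing, at the cost of moving the check for invalid values to runtime.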
RANDOM = "RANDOM" # Random replacement

class MemStream(anomaly.base.AnomalyDetector):
I would remove this class and integrate it into the MemStreamPCA class, as we currently have only one MemStream variant implemented.
We can revert this change, I think.
Add MemStream: Memory-Based Streaming Anomaly Detection
This PR introduces MemStream, a state-of-the-art online anomaly detection framework designed for high-dimensional data streams with concept drift, based on the paper "MemStream: Memory-Based Streaming Anomaly Detection" by Bhatia et al.
What's New
Core Implementation
MemStream (Base Class): Abstract base class providing the core framework for memory-based anomaly detection
MemStreamPCA: Concrete implementation using PCA-based feature encoding
Architecture
The implementation consists of two main components:
Feature Encoder: Transforms high-dimensional inputs into lower-dimensional representations
Memory Module: Maintains a dynamic collection of encoded "normal" data representations
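The interplay of the two components can be sketched roughly as follows. This is a toy illustration, not the code in this PR: `encode` is a stand-in that merely selects a fixed subset of features (the PR uses PCA), and the L1 distance, the nearest-entry score, and the names `ToyMemoryDetector`, `keys`, and `threshold` are assumptions made for the example.

```python
from collections import deque


def encode(x, keys):
    """Stand-in encoder: keep a fixed subset of features (hypothetical)."""
    return [x[k] for k in keys]


class ToyMemoryDetector:
    def __init__(self, keys, memory_size=4, threshold=1.0):
        self.keys = keys
        self.threshold = threshold
        # A bounded deque drops its oldest entry when full (FIFO).
        self.memory = deque(maxlen=memory_size)

    def score_one(self, x):
        z = encode(x, self.keys)
        if not self.memory:
            return 0.0
        # Score: L1 distance to the closest stored representation.
        return min(sum(abs(a - b) for a, b in zip(z, m)) for m in self.memory)

    def learn_one(self, x):
        # Only samples that look normal are admitted into memory.
        if self.score_one(x) <= self.threshold:
            self.memory.append(encode(x, self.keys))


det = ToyMemoryDetector(keys=["a", "b"])
for x in [{"a": 0.0, "b": 0.0, "c": 9.0}, {"a": 0.5, "b": 0.1, "c": -3.0}]:
    det.learn_one(x)

normal_score = det.score_one({"a": 0.2, "b": 0.0, "c": 0.0})
anomaly_score = det.score_one({"a": 8.0, "b": 7.0, "c": 0.0})
det.learn_one({"a": 8.0, "b": 7.0, "c": 0.0})  # rejected: score above threshold
print(normal_score, anomaly_score, len(det.memory))
```

Because high-scoring samples are never admitted, the memory is not contaminated by anomalies, while samples that score as normal keep refreshing it as the stream drifts.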
Key Features
Parameters
memory_size: Maximum number of encoded normal samples to store (default: 1,000 for PCA variant)
max_threshold: Threshold for accepting samples into memory (default: 0.1)
grace_period: Number of initial samples before scoring begins (default: 5,000)
n_components: Number of PCA components (default: 20); if the given value is not feasible, the code falls back to a value that makes PCA possible
k: Number of nearest neighbors for scoring (default: 5)
gamma: Exponential weighting factor (default: 0.1)
replace_strategy: Memory replacement policy (FIFO, LRU, or RANDOM)
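To make the roles of k and gamma more concrete, here is one plausible reading of the scoring step, written as a standalone sketch. It assumes L1 distances and weights of the form gamma**i over the k nearest memory entries; the exact formula used in this PR may differ, and `knn_score` is a hypothetical helper, not part of the River API.

```python
def knn_score(z, memory, k=5, gamma=0.1):
    """Weighted average of the k smallest L1 distances from z to memory.

    The i-th nearest neighbor (i = 0 is the closest) gets weight gamma**i,
    so a small gamma makes the closest neighbor dominate the score.
    """
    if not memory:
        raise ValueError("memory must contain at least one entry")
    dists = sorted(sum(abs(a - b) for a, b in zip(z, m)) for m in memory)[:k]
    weights = [gamma ** i for i in range(len(dists))]
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)


# Toy memory of encoded "normal" points (hypothetical values).
memory = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

score = knn_score([0.0, 0.0], memory, k=2, gamma=0.5)
nearest_only = knn_score([3.0, 0.0], memory, k=3, gamma=0.0)
print(score, nearest_only)
```

With gamma set to 0 only the nearest neighbor contributes, and with gamma set to 1 the score becomes the plain mean of the k nearest distances.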