Real-time HTTP traffic anomaly detection and automatic IP blocking for cloud.ng, HNG's Nextcloud-powered cloud storage platform.
| Item | Value |
|---|---|
| Server IP | YOUR_SERVER_IP |
| Metrics Dashboard | http://YOUR_SERVER_IP:8080 |
| Nextcloud | http://YOUR_SERVER_IP |
| GitHub | https://github.com/YOUR_USERNAME/hng-anomaly-detector |
| Blog Post | https://YOUR_LINKEDIN_POST_URL |
Python was chosen for four specific reasons.
asyncio for concurrency in a single process. The detector has to do several things at once: tail a log file, serve a web dashboard, send HTTP requests to Slack, and run iptables commands. Python's asyncio lets one process interleave all of this on a single thread, without managing threads by hand. Each task yields control while it waits for I/O, so no task blocks the others.
collections.deque for the sliding window. Python's built-in deque gives O(1) append on the right and O(1) pop from the left. This is exactly the operation pattern a sliding window needs: append new timestamps on arrival, evict old timestamps from the left on each read. No external library required.
Pure math for detection. Mean, standard deviation, and z-score are five lines of arithmetic. No machine learning library, no statistical package. The logic is transparent, auditable, and has zero dependencies beyond the standard library.
aiohttp for the dashboard. A single async HTTP server runs inside the same process as the detector, serving the live metrics page without blocking the detection loop. No separate web server process needed.
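As a rough illustration of the cooperative scheduling described above, the sketch below runs two stand-in tasks on one event loop. The names `worker` and `main` are hypothetical and are not taken from the project; `asyncio.sleep(0)` stands in for a real I/O wait such as reading a log line or serving a request.

```python
import asyncio
from typing import List


async def worker(name: str, steps: int, log: List[str]) -> None:
    """Stand-in for a long-running task such as the log tailer or the dashboard."""
    for _ in range(steps):
        # sleep(0) hands control back to the event loop, as an I/O wait would.
        await asyncio.sleep(0)
        log.append(name)


async def main() -> List[str]:
    log: List[str] = []
    # gather() runs both coroutines concurrently on a single thread.
    await asyncio.gather(
        worker("tail", 3, log),
        worker("dashboard", 3, log),
    )
    return log


if __name__ == "__main__":
    # The two names appear interleaved rather than one task finishing first.
    print(asyncio.run(main()))
```

Because each task gives up control at every `await`, neither one can starve the other as long as no step does long blocking work.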
A simple per-minute counter resets every 60 seconds. If an attacker sends 500 requests in the last 2 seconds of one minute and 500 more in the first 2 seconds of the next, the counter sees 500 each time, so it misses the 1000-request burst entirely.
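The boundary problem can be reproduced in a few lines. The sketch below is illustrative only (the function names are hypothetical): it feeds the same synthetic burst to a fixed per-minute counter and to a count over the trailing 60 seconds.

```python
from collections import deque
from typing import Dict, Iterable


def fixed_window_counts(timestamps: Iterable[float], window: int = 60) -> Dict[int, int]:
    """Count requests per fixed bucket: [0, 60), [60, 120), ..."""
    counts: Dict[int, int] = {}
    for ts in timestamps:
        bucket = int(ts // window)
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


def sliding_window_peak(timestamps: Iterable[float], window: int = 60) -> int:
    """Largest number of requests seen in any trailing `window` seconds."""
    dq: deque = deque()
    peak = 0
    for ts in sorted(timestamps):
        dq.append(ts)
        # Drop timestamps that have fallen out of the trailing window.
        while dq and dq[0] < ts - window:
            dq.popleft()
        peak = max(peak, len(dq))
    return peak


# 500 requests in the last 2 s of minute 0, then 500 in the first 2 s of minute 1.
burst = [58 + i * 0.004 for i in range(500)] + [60 + i * 0.004 for i in range(500)]

print(max(fixed_window_counts(burst).values()))  # largest per-minute bucket
print(sliding_window_peak(burst))                # largest trailing-60 s count
```

The fixed counter never sees more than 500 requests in one bucket, while the trailing window sees all 1000.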
Instead of counting, the tracker stores the actual timestamp of every request in a collections.deque:
from collections import defaultdict, deque
from typing import Dict

# One deque per IP address: stores timestamps of requests
_ip_reqs: Dict[str, deque] = defaultdict(deque)
# One global deque: stores timestamps of all requests
_global_reqs: deque = deque()

Every time a new log entry arrives, its timestamp is appended to the right of the relevant deque:
async def record(self, entry: LogEntry):
    ts = entry.timestamp
    ip = entry.source_ip
    self._ip_reqs[ip].append(ts)   # append newest timestamp on the right
    self._global_reqs.append(ts)   # same for global window

On every rate check, timestamps older than 60 seconds are removed from the left of the deque:
def _evict(self, dq: deque, cutoff: float):
    while dq and dq[0] < cutoff:
        dq.popleft()   # remove expired entries from the left

This works because timestamps are always appended in chronological order: newest on the right, oldest on the left. The deque is therefore always sorted. Eviction is O(k) where k is the number of expired entries, which is typically zero or very small.
After eviction, the rate is simply the count of remaining timestamps divided by the window size:
async def get_ip_rate(self, ip: str) -> float:
    now = time.time()
    cutoff = now - 60          # 60 seconds ago
    dq = self._ip_reqs[ip]
    self._evict(dq, cutoff)    # remove expired entries first
    return len(dq) / 60        # requests per second

The tracker maintains two kinds of window, one deque per source IP plus a single global deque:
| Window | What it tracks | What it catches |
|---|---|---|
| Per-IP | Timestamps per source IP | Single aggressive IP |
| Global | Timestamps across all IPs | Distributed attack from many IPs |
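Putting the fragments above together, a self-contained version of the tracker might look like the sketch below. It is an approximation, not the project's `detector.py`: the class name `SlidingWindowTracker`, the injectable `clock` parameter, and the synchronous methods are assumptions made so the example can run on its own.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Dict


class SlidingWindowTracker:
    """Per-IP and global request rates over a trailing time window."""

    def __init__(self, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control "now"
        self._ip_reqs: Dict[str, deque] = defaultdict(deque)
        self._global_reqs: deque = deque()

    def record(self, ip: str, ts: float) -> None:
        # Timestamps must arrive in chronological order for eviction to work.
        self._ip_reqs[ip].append(ts)
        self._global_reqs.append(ts)

    def _rate(self, dq: deque) -> float:
        cutoff = self.clock() - self.window
        while dq and dq[0] < cutoff:
            dq.popleft()
        return len(dq) / self.window  # requests per second

    def ip_rate(self, ip: str) -> float:
        # .get() avoids creating an empty deque for IPs that were never seen.
        dq = self._ip_reqs.get(ip)
        return self._rate(dq) if dq is not None else 0.0

    def global_rate(self) -> float:
        return self._rate(self._global_reqs)
```

A single noisy IP shows up in `ip_rate`, while many quiet IPs that add up to a flood only show up in `global_rate`, which is why both windows are kept.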
The baseline answers one question: what does normal traffic look like right now?
Without a baseline, the detector would need a hardcoded threshold, which breaks the moment traffic patterns change. A baseline that learns from real traffic adapts automatically to time of day, day of week, and organic growth.
The baseline engine keeps a rolling 30-minute window of per-second traffic counts:
# Each entry is one second of traffic: (timestamp, request_count, error_count)
self._rolling: deque = deque()
# Entries older than 30 minutes are evicted from the left
cutoff = now - 1800
while self._rolling and self._rolling[0][0] < cutoff:
    self._rolling.popleft()

Every 60 seconds, mean and standard deviation are recomputed from all entries in the rolling window:
counts = [entry[1] for entry in self._rolling]   # per-second request counts
n = len(counts)                                  # assumes n > 0; the floor values cover the empty case
mean = sum(counts) / n
variance = sum((x - mean) ** 2 for x in counts) / n   # population variance
stddev = math.sqrt(variance)

The engine also maintains per-calendar-hour buckets. If the current hour has at least 60 seconds of data, it is preferred over the 30-minute rolling window. This handles diurnal patterns: 3am traffic looks very different from 2pm traffic.
At startup there is not enough data to compute a meaningful baseline. Floor values prevent false positives during this period:
baseline:
  floor_mean: 0.1      # minimum effective mean (req/s)
  floor_stddev: 0.05   # minimum effective stddev

| Source | Used when | Accuracy |
|---|---|---|
| `floor` | Startup, not enough data | Safe default |
| `rolling_30min` | Less than 60 seconds of current hour data | Good |
| `current_hour` | Current hour has 60+ seconds of data | Best |
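The selection order in the table can be sketched as a small function. This is a hedged reconstruction, not the code in `baseline.py`: the function names are hypothetical, and treating "not enough data" as "the rolling window is empty" is an assumption, since the exact cutoff is not stated here. The floors are applied as minimums, matching the "minimum effective" wording in the config comments.

```python
import math
from typing import Sequence, Tuple


def mean_stddev(counts: Sequence[float]) -> Tuple[float, float]:
    """Population mean and standard deviation of per-second counts."""
    n = len(counts)
    if n == 0:
        return 0.0, 0.0
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / n
    return mean, math.sqrt(variance)


def effective_baseline(
    hour_counts: Sequence[float],
    rolling_counts: Sequence[float],
    floor_mean: float = 0.1,
    floor_stddev: float = 0.05,
    min_hour_samples: int = 60,
) -> Tuple[str, float, float]:
    """Return (source, mean, stddev) following the table's preference order."""
    if len(hour_counts) >= min_hour_samples:
        source, counts = "current_hour", hour_counts
    elif rolling_counts:  # assumption: any rolling data beats the floor
        source, counts = "rolling_30min", rolling_counts
    else:
        return "floor", floor_mean, floor_stddev
    mean, stddev = mean_stddev(counts)
    # Floors act as lower bounds so a silent period cannot produce stddev = 0.
    return source, max(mean, floor_mean), max(stddev, floor_stddev)


print(effective_baseline([], []))
print(effective_baseline([1] * 10, [2, 4] * 30))
```

Keeping the standard deviation above a floor also keeps the z-score denominator away from zero.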
1. Log into AWS Console
Go to https://console.aws.amazon.com and sign in.
2. Navigate to EC2
Click Services → EC2 → Launch Instance
3. Configure the instance
Fill in the following:
| Setting | Value |
|---|---|
| Name | hng-anomaly-detector |
| AMI | Ubuntu Server 22.04 LTS (HVM), found under Quick Start |
| Architecture | 64-bit (x86) |
| Instance type | t3.small (2 vCPU, 2 GB RAM), which meets the minimum requirement |
| Key pair | Create a new key pair, name it hng-key, download the .pem file and save it somewhere safe |
4. Configure network settings
Click Edit next to Network settings and add the following inbound rules:
| Type | Protocol | Port | Source | Purpose |
|---|---|---|---|---|
| SSH | TCP | 22 | My IP | Connect to the server |
| HTTP | TCP | 80 | Anywhere | Nextcloud access |
| Custom TCP | TCP | 8080 | Anywhere | Metrics dashboard |
5. Configure storage
Set root volume to 20 GB (the default 8 GB fills up fast with Docker images).
6. Launch the instance
Click Launch Instance. Wait about 60 seconds for it to reach the running state.
7. Get your public IP
Click on your instance in the EC2 dashboard. Copy the Public IPv4 address. This is your SERVER_IP.
On Mac / Linux:
# Fix key file permissions (required by SSH)
chmod 400 ~/Downloads/hng-key.pem
# Connect
ssh -i ~/Downloads/hng-key.pem ubuntu@YOUR_SERVER_IP

On Windows (using PowerShell):
# Fix key file permissions
icacls "C:\Users\YourName\Downloads\hng-key.pem" /inheritance:r /grant:r "$($env:USERNAME):(R)"
# Connect
ssh -i "C:\Users\YourName\Downloads\hng-key.pem" ubuntu@YOUR_SERVER_IP

You should now be inside the EC2 instance.
Run these commands one by one inside the server:
# Update the package list and upgrade installed packages
sudo apt-get update && sudo apt-get upgrade -y
# Install Docker using the official script
curl -fsSL https://get.docker.com | sudo sh
# Add ubuntu user to the docker group (avoids needing sudo for every docker command)
sudo usermod -aG docker ubuntu
# Apply the group change without logging out
newgrp docker
# Install the Docker Compose plugin
sudo apt-get install -y docker-compose-plugin
# Verify both are installed correctly
docker --version
docker compose version

Expected output:
Docker version 24.x.x
Docker Compose version v2.x.x
# Install git if not already present
sudo apt-get install -y git
# Clone your project
git clone https://github.com/YOUR_USERNAME/hng-anomaly-detector.git
# Enter the project folder
cd hng-anomaly-detector

# Create your .env from the template
cp .env.example .env
# Open it for editing
nano .env

Fill in every value:
# Your EC2 public IP (copy from AWS console)
SERVER_IP=YOUR_SERVER_IP
# Nextcloud admin account
# This account is created automatically on first boot
NC_ADMIN=admin
NC_ADMIN_PASS=ChooseAStrongPasswordHere
# PostgreSQL database credentials
# These are used by both the database and Nextcloud, so they must match
POSTGRES_DB=nextcloud
POSTGRES_USER=nextcloud
POSTGRES_PASSWORD=ChooseAStrongDBPasswordHere
# Slack webhook URL
# How to get this:
# 1. Go to https://api.slack.com/apps
# 2. Create New App → From scratch
# 3. Incoming Webhooks → Activate → Add New Webhook
# 4. Pick your channel → Copy the webhook URL
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

Save with Ctrl+O, Enter, Ctrl+X.
# Create the audit log directory on the host
# The detector writes structured log entries here
sudo mkdir -p /var/log/detector
sudo chown ubuntu:ubuntu /var/log/detector

docker compose up -d --build

Docker will now:

- Pull `postgres:15-alpine` from Docker Hub
- Pull `kefaslungu/hng-nextcloud:latest` from Docker Hub
- Pull `nginx:1.25-alpine` from Docker Hub
- Build the detector image from `detector/Dockerfile`
- Start all four containers in the correct order
- Create the `HNG-nginx-logs` shared Docker volume

This takes 2-4 minutes on first run depending on your connection speed.
# All 4 containers should show "running" or "healthy"
docker compose ps

Expected output:
NAME STATUS
hng-db running (healthy)
hng-nextcloud running (healthy)
hng-nginx running
hng-detector running (healthy)
If any container shows exited, check its logs:
docker compose logs db
docker compose logs nextcloud
docker compose logs nginx
docker compose logs detector

Confirm the detector is reading logs:

docker compose logs -f detector

You should see:
[INFO] main: HNG Anomaly Detection Engine starting up
[INFO] monitor: LogMonitor starting - watching /var/log/nginx/hng-access.log
[INFO] dashboard: Dashboard running at http://0.0.0.0:8080/
[INFO] baseline: Baseline recalculated - source=floor mean=0.100
Confirm Nginx is writing JSON logs:
docker compose exec nginx tail -f /var/log/nginx/hng-access.log

Confirm the dashboard API responds:

curl http://localhost:8080/api/metrics

Open your browser:
| Service | URL |
|---|---|
| Live metrics dashboard | http://YOUR_SERVER_IP:8080 |
| Nextcloud | http://YOUR_SERVER_IP |
Nextcloud takes 60-90 seconds on first boot to set up its database. If you see a loading screen, wait and refresh.
Log in with the NC_ADMIN and NC_ADMIN_PASS values from your .env file.
Send normal traffic for 3 minutes so the detector learns what normal looks like before you test detection:
for i in $(seq 1 180); do
  curl -s http://localhost/ > /dev/null
  sleep 1
done

Watch for this in the detector logs:

Baseline recalculated - source=rolling_30min mean=0.850 stddev=0.210
Once you see source=rolling_30min, the baseline is built and the detector is ready.
# Stop everything
docker compose down
# Restart just the detector after a config change
docker compose restart detector
# Rebuild after code changes
docker compose up -d --build
# Live detector logs
docker compose logs -f detector
# View the audit log
cat /var/log/detector/audit.log
# Check active iptables bans
sudo iptables -L INPUT -n --line-numbers
# Check disk space (important on EC2)
df -h

When the 12-hour grading window is over, stop the instance to end compute charges (the attached EBS volume is still billed while the instance is stopped):
- Go to EC2 → Instances
- Select your instance
- Click Instance State → Stop
Important: Stopping preserves your data. Terminating deletes everything permanently.
hng-anomaly-detector/
├── docker-compose.yml        ← wires all 4 services together
├── .env                      ← your secrets (never commit this)
├── .env.example              ← template with placeholder values
├── .gitignore
├── README.md
│
├── detector/                 ← the daemon (only service with a Dockerfile)
│   ├── Dockerfile
│   ├── requirements.txt      ← pyyaml, aiohttp, psutil
│   ├── config.yaml           ← all thresholds in one place
│   ├── main.py               ← entry point, wires everything together
│   ├── monitor.py            ← tails and parses the Nginx log
│   ├── baseline.py           ← rolling 30-min baseline engine
│   ├── detector.py           ← sliding windows + anomaly detection
│   ├── blocker.py            ← iptables ban management
│   ├── unbanner.py           ← auto-releases expired bans
│   ├── notifier.py           ← Slack alerts
│   └── dashboard.py          ← live metrics web UI on port 8080
│
├── nginx/
│   └── nginx.conf            ← JSON logs + X-Forwarded-For
│
├── docs/
│   └── architecture.png
│
└── screenshots/
    ├── Tool-running.png
    ├── Ban-slack.png
    ├── Unban-slack.png
    ├── Global-alert-slack.png
    ├── Iptables-banned.png
    ├── Audit-log.png
    └── Baseline-graph.png
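For orientation, the ban and unban steps that `blocker.py` and `unbanner.py` are responsible for could be built roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the rules target the `INPUT` chain because that is the chain this README inspects, and the actual project may construct its rules differently. The functions only build argument lists; nothing is executed.

```python
import ipaddress
from typing import List


def _validated(ip: str) -> str:
    # ip_address() raises ValueError for anything that is not a literal IP,
    # which keeps log-derived strings out of the command line.
    return str(ipaddress.ip_address(ip))


def ban_command(ip: str) -> List[str]:
    """argv that inserts a DROP rule for `ip` at the top of INPUT."""
    return ["iptables", "-I", "INPUT", "-s", _validated(ip), "-j", "DROP"]


def unban_command(ip: str) -> List[str]:
    """argv that deletes the matching DROP rule again."""
    return ["iptables", "-D", "INPUT", "-s", _validated(ip), "-j", "DROP"]


print(" ".join(ban_command("203.0.113.99")))
print(" ".join(unban_command("203.0.113.99")))
```

Passing the result as a list to `asyncio.create_subprocess_exec(*cmd)` or `subprocess.run(cmd)` avoids shell interpolation of the address.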
All thresholds live in detector/config.yaml. Nothing is hardcoded in Python.
| Key | Default | What it controls |
|---|---|---|
| `sliding_window.seconds` | 60 | How far back the deque window looks |
| `baseline.rolling_window_minutes` | 30 | Rolling window for baseline computation |
| `baseline.recalc_interval_seconds` | 60 | How often baseline is recalculated |
| `baseline.floor_mean` | 0.1 | Minimum baseline mean |
| `baseline.floor_stddev` | 0.05 | Minimum baseline stddev |
| `anomaly.z_score_threshold` | 3.0 | Z-score to trigger an alert |
| `anomaly.rate_multiplier` | 5.0 | Multiple of the baseline mean that triggers an alert |
| `anomaly.error_rate_multiplier` | 3.0 | Error surge detection multiplier |
| `anomaly.error_tightening` | 0.5 | Z threshold multiplier during error surge |
| `blocking.ban_schedule_minutes` | [10,30,120,-1] | Progressive ban durations |
| `dashboard.port` | 8080 | Dashboard port |
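To show how these keys might combine, here is a minimal sketch of an anomaly check and a ban-duration lookup. It is one plausible reading of the table, not the logic in `detector.py` or `blocker.py`: the function names are hypothetical, the check order (z-score first, then rate multiple) is an assumption, and so is the idea that each repeat offence moves one step along the schedule and then stays on the last entry. The meaning of `-1` is not stated here, so the sketch returns it unchanged.

```python
from typing import Sequence, Tuple


def is_anomalous(
    rate: float,
    mean: float,
    stddev: float,
    z_threshold: float = 3.0,
    rate_multiplier: float = 5.0,
    error_surge: bool = False,
    error_tightening: float = 0.5,
) -> Tuple[bool, str]:
    """Return (flag, reason). Expects stddev > 0, which the baseline floors ensure."""
    # During an error surge the z threshold is tightened (3.0 * 0.5 = 1.5 by default).
    threshold = z_threshold * error_tightening if error_surge else z_threshold
    z = (rate - mean) / stddev
    if z > threshold:
        return True, f"z-score={z:.2f} > {threshold}"
    if rate > rate_multiplier * mean:
        return True, f"rate={rate:.3f} > {rate_multiplier} x mean"
    return False, ""


def ban_minutes(level: int, schedule: Sequence[int] = (10, 30, 120, -1)) -> int:
    """Duration for the given offence level, clamped to the last schedule entry."""
    return schedule[min(level, len(schedule) - 1)]


print(is_anomalous(2.0, mean=0.85, stddev=0.21))
print(is_anomalous(1.2, mean=0.85, stddev=0.21, error_surge=True))
print(ban_minutes(0), ban_minutes(5))
```

Because every threshold is a parameter with the table's default, changing `config.yaml` would change behaviour without touching the arithmetic.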
Every ban, unban, and baseline recalculation is written to /var/log/detector/audit.log:
[2026-04-29T10:00:01Z] BASELINE_RECALC - | source=floor | mean=0.1000 | stddev=0.0500 | samples=0
[2026-04-29T10:01:01Z] BASELINE_RECALC - | source=rolling_30min | mean=0.8500 | stddev=0.2100 | samples=60
[2026-04-29T10:05:00Z] BAN 203.0.113.99 | z-score=8.42 > 3.0 | rate=47.320 | baseline=0.850 | duration=10min
[2026-04-29T10:15:05Z] UNBAN 203.0.113.99 | was_level=0 | elapsed=10.1min | original_condition=z-score=8.42 > 3.0
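The line format shown above is regular enough to parse with a short helper, which can be handy when checking the audit log by script. The sketch below is written against the four sample lines only; `parse_audit_line` is a hypothetical name and the project does not necessarily ship such a function.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<action>\S+)\s+(?P<subject>\S+)\s*\|\s*(?P<rest>.*)$"
)


def parse_audit_line(line: str) -> Optional[Dict[str, object]]:
    """Split one audit line into timestamp, action, subject and key=value fields."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields: Dict[str, str] = {}
    for part in m.group("rest").split("|"):
        # partition() splits on the first "=", so values may contain "=" themselves.
        key, _, value = part.strip().partition("=")
        fields[key] = value
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {
        "timestamp": ts,
        "action": m.group("action"),
        "subject": m.group("subject"),  # an IP, or "-" for baseline entries
        "fields": fields,
    }


sample = ("[2026-04-29T10:05:00Z] BAN 203.0.113.99 | z-score=8.42 > 3.0 | "
          "rate=47.320 | baseline=0.850 | duration=10min")
print(parse_audit_line(sample))
```

Field values are kept as strings, so a caller that needs numbers would convert `rate` or `mean` explicitly.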