
GopherProxy & Sentinel

A production-grade reverse proxy with dynamic service discovery, per-IP rate limiting, and full observability - built in Go.



Overview

GopherProxy is a reverse proxy (Data Plane) that load-balances traffic across a dynamic pool of backends using round-robin. Its companion, Sentinel (Control Plane), continuously TCP-probes each backend and updates a Redis Set as the live source of truth. The proxy polls Redis every 5 seconds and updates its pool — no restarts required when backends come or go.
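
A minimal sketch of that discovery loop, assuming the go-redis v9 client listed under Dependencies (the update callback stands in for the repo's actual pool-swap logic):

```go
package main

import (
	"context"
	"log/slog"
	"time"

	"github.com/redis/go-redis/v9"
)

// pollBackends refreshes the backend pool from Redis every 5 seconds until
// the root context is cancelled. update is a stand-in for the pool swap.
func pollBackends(ctx context.Context, rdb *redis.Client, update func([]string)) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown cancels the root context
		case <-ticker.C:
			// SMEMBERS gopher_backends: the live set Sentinel maintains.
			members, err := rdb.SMembers(ctx, "gopher_backends").Result()
			if err != nil {
				slog.Error("redis poll failed", "err", err)
				continue
			}
			update(members) // register new backends, drop vanished ones
		}
	}
}
```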

The entire stack comes up with a single docker compose up, with Prometheus scraping metrics every 15 s and a pre-provisioned Grafana dashboard that auto-loads on first boot.


Architecture

flowchart LR
    Clients[HTTP Clients] --> Proxy[GopherProxy :8080\nRate limit + round-robin]
    Proxy --> B1[Backend :8081]
    Proxy --> B2[Backend :8082]
    Proxy --> B3[Backend :8083]

    Sentinel[Sentinel\nTCP health checks] --> Redis[(Redis\ngopher_backends)]
    Redis --> Proxy

    B1 --> Sentinel
    B2 --> Sentinel
    B3 --> Sentinel

    Proxy --> Metrics[Prometheus metrics :2112/metrics]
    Metrics --> Prometheus[Prometheus]
    Prometheus --> Grafana[Grafana Dashboard]

    classDef data fill:#e8f4ff,stroke:#1f6feb,color:#0b1f33;
    classDef control fill:#fff4e5,stroke:#d97706,color:#4a2c0a;
    classDef storage fill:#eefbf0,stroke:#16a34a,color:#11331a;
    classDef obs fill:#f3e8ff,stroke:#9333ea,color:#2d113f;

    class Proxy,B1,B2,B3,Metrics data;
    class Sentinel control;
    class Redis storage;
    class Prometheus,Grafana obs;

The Mermaid diagram above shows the data plane, control plane, registry, and observability stack in one view.


Components

| Service | Role | Ports (host) |
| ------- | ---- | ------------ |
| GopherProxy | Reverse proxy · rate limiting · metrics | 8080 (proxy), 2112 (metrics) |
| Sentinel | TCP health prober · Redis registry writer | — (internal) |
| Redis | Live backend set (gopher_backends) | 16379 → 6379 |
| Prometheus | Scrapes /metrics · stores 15 days of data | 9090 |
| Grafana | Pre-provisioned dashboard · no manual setup | 3000 |

Features

Proxy (Data Plane)

  • Round-robin load balancing with dead-backend skip — sync/atomic counter + sync.RWMutex pool (a minimal sketch follows this list)
  • Per-IP rate limiting (Token Bucket) — each client IP gets its own isolated rate.Limiter via sync.Map; burst of 5, replenishes at 2 req/s
  • Dynamic backend pool — polls Redis SMembers every 5 s, registers new backends without restart
  • Local TCP health check — probes every 10 s with context-aware DialContext; logs state transitions (recovered / went down)
  • Structured JSON logging (log/slog) — every request logged with method, path, remote_addr, status, duration_ms
  • Graceful shutdown — 30 s drain on SIGTERM/SIGINT, cancels root context, drains both servers, closes Redis client
  • Dedicated metrics server — /metrics and /healthz on a separate port (:2112), never reachable through the proxy path
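
A minimal sketch of the round-robin selection above. Backend and ServerPool are illustrative names, not the repo's exact types; the zero-length guard mirrors the modulo-by-zero fix recorded in the Debugging Log:

```go
package main

import (
	"net/url"
	"sync"
	"sync/atomic"
)

type Backend struct {
	URL   *url.URL
	alive atomic.Bool
}

func (b *Backend) IsAlive() bool    { return b.alive.Load() }
func (b *Backend) SetAlive(up bool) { b.alive.Store(up) }

type ServerPool struct {
	mu       sync.RWMutex  // guards the slice against the 5 s pool updates
	backends []*Backend
	current  atomic.Uint64 // lock-free round-robin cursor
}

// GetNextPeer returns the next alive backend, or nil if none qualify.
func (p *ServerPool) GetNextPeer() *Backend {
	p.mu.RLock()
	defer p.mu.RUnlock()
	l := uint64(len(p.backends))
	if l == 0 {
		return nil // avoid modulo by zero on an empty registry
	}
	next := p.current.Add(1)
	for i := uint64(0); i < l; i++ {
		idx := (next + i) % l // walk forward, wrapping around the pool
		if p.backends[idx].IsAlive() {
			return p.backends[idx]
		}
	}
	return nil // every backend is currently marked dead
}
```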

Sentinel (Control Plane)

  • TCP probe loop — checks each backend every 2 s with a 1 s dial timeout (sketched after this list)
  • Idempotent registry — SAdd on UP, SRem on DOWN; only logs on actual state transitions
  • Runtime target override — SENTINEL_TARGETS env var (comma-separated host:port) replaces hard-coded defaults without rebuilding
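
A hedged sketch of that probe loop. Only the 2 s cadence, the 1 s dial timeout, the gopher_backends key, and the transition-only logging come from this README; the function shape and names are assumptions:

```go
package main

import (
	"context"
	"log/slog"
	"net"
	"time"

	"github.com/redis/go-redis/v9"
)

func probeLoop(ctx context.Context, rdb *redis.Client, targets []string) {
	lastUp := make(map[string]bool) // last known state per target
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, t := range targets {
				conn, err := net.DialTimeout("tcp", t, time.Second)
				alive := err == nil
				if alive {
					conn.Close()
					rdb.SAdd(ctx, "gopher_backends", t) // idempotent: set add
				} else {
					rdb.SRem(ctx, "gopher_backends", t) // idempotent: set remove
				}
				if alive != lastUp[t] { // log state transitions only
					slog.Info("backend transition", "target", t, "alive", alive)
					lastUp[t] = alive
				}
			}
		}
	}
}
```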

Operations

  • 4 Prometheus metrics

    • gopherproxy_processed_requests_total, gopherproxy_dropped_requests_total, gopherproxy_active_backends, gopherproxy_request_duration_seconds histogram (per-backend label)
  • 7-panel Grafana dashboard — auto-provisioned; request rate, p50/p95/p99 latency, active backends gauge + timeseries, totals, success rate %

  • Non-root containers — fixed UID/GID 1001 (gopheruser:gophergroup); HEALTHCHECK on every service

  • Multi-stage Docker build — CGO_ENABLED=0, -trimpath, -s -w; final image ≈ 13 MB

  • Version stamped at build time — Version and Commit injected via -ldflags (sketched after this list)

  • Resource limits on every service — CPU and memory capped in docker-compose.yml

  • Redis persistence — appendonly yes, appendfsync everysec, maxmemory 128 MB (LRU eviction)
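
The stamping works by letting the linker overwrite package-level variables, e.g. go build -trimpath -ldflags "-s -w -X main.Version=1.0.0 -X main.Commit=$(git rev-parse --short HEAD)". A sketch with assumed declarations:

```go
package main

// Defaults used for local `go run`; the release build overwrites both.
var (
	Version = "dev"  // replaced via -ldflags "-X main.Version=..."
	Commit  = "none" // replaced via -ldflags "-X main.Commit=..."
)
```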


Quick Start

Prerequisites

| Tool | Version |
| ---- | ------- |
| Docker + Docker Compose | v2+ |
| Python 3 | any (for mock backends) |
| Go | 1.24+ (tests only) |
| make | optional |

1 — Configure environment

cp .env.example .env

Edit .env — at minimum change GRAFANA_PASSWORD:

VERSION=1.0.0
GRAFANA_USER=admin
GRAFANA_PASSWORD=changeme          # ← change this
SENTINEL_TARGETS=host.docker.internal:8081,host.docker.internal:8082,host.docker.internal:8083

2 — Build & start the stack

make up
# expands to: VERSION=... COMMIT=$(git rev-parse --short HEAD) docker compose up --build -d

Wait ~15 s for all health checks to pass:

gopher-redis       (healthy) ✔
gopher-proxy       (healthy) ✔
gopher-sentinel    (healthy) ✔
gopher-prometheus  up        ✔
gopher-grafana     up        ✔

3 — Start mock backends (separate terminal)

make mock-backends
# Starts Python HTTP servers on :8081  :8082  :8083

Sentinel detects them within 2 s and registers them in Redis. The proxy picks them up within the next 5 s poll.

4 — Send traffic

# 30 requests, 1/s — stays under rate limit, round-robins across all three backends
for i in $(seq 1 30); do curl -s http://localhost:8080/; sleep 1; done

You will see each response coming from a different backend:

<h1>GopherProxy Demo: Response from SERVER 1</h1>
<h1>GopherProxy Demo: Response from SERVER 2</h1>
<h1>GopherProxy Demo: Response from SERVER 3</h1>
...

Endpoints

| URL | Description |
| --- | ----------- |
| http://localhost:8080 | Proxy — load-balanced entry point |
| http://localhost:2112/healthz | Liveness probe — returns ok |
| http://localhost:2112/metrics | Raw Prometheus metrics (text format) |
| http://localhost:9090 | Prometheus expression browser |
| http://localhost:9090/targets | Scrape target health |
| http://localhost:3000 | Grafana — GopherProxy Dashboard |

Grafana login: the GRAFANA_USER / GRAFANA_PASSWORD values set in .env.


Rate Limiting

The proxy uses a per-IP Token Bucket limiter:

| Parameter | Value |
| --------- | ----- |
| Burst | 5 requests |
| Replenish rate | 1 token per 500 ms (2 req/s) |
| Response when exceeded | 429 Too Many Requests |
| Counter metric | gopherproxy_dropped_requests_total |

Each client IP gets its own isolated bucket stored in a sync.Map — a busy IP cannot affect others.
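
A minimal sketch of this pattern using the golang.org/x/time/rate and sync.Map primitives named above (type and method names are illustrative, not the repo's exact ones):

```go
package main

import (
	"net"
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

type IPRateLimiter struct {
	limiters sync.Map // client IP -> *rate.Limiter
}

// limiterFor returns the bucket for ip, creating it on first sight:
// burst 5, refilling 2 tokens/s (one token every 500 ms).
func (l *IPRateLimiter) limiterFor(ip string) *rate.Limiter {
	v, _ := l.limiters.LoadOrStore(ip, rate.NewLimiter(rate.Limit(2), 5))
	return v.(*rate.Limiter)
}

// Limit rejects requests with 429 once a client's bucket is empty.
func (l *IPRateLimiter) Limit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr // no port in the address; use it verbatim
		}
		if !l.limiterFor(ip).Allow() {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return // a busy IP only drains its own bucket
		}
		next.ServeHTTP(w, r)
	})
}
```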


Prometheus Metrics

All metrics are prefixed gopherproxy_ and scraped from :2112/metrics:

| Metric | Type | Description |
| ------ | ---- | ----------- |
| gopherproxy_processed_requests_total | Counter | Requests successfully forwarded to a backend |
| gopherproxy_dropped_requests_total | Counter | Requests dropped (rate-limited or no healthy backend) |
| gopherproxy_active_backends | Gauge | Number of backends currently marked alive |
| gopherproxy_request_duration_seconds | Histogram | Latency per backend — labelled backend="host:port" |
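
For reference, this is roughly how the four metrics could be declared with client_golang's promauto helpers. The metric names match the table; everything else is an assumption:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	processed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "gopherproxy_processed_requests_total",
		Help: "Requests successfully forwarded to a backend.",
	})
	dropped = promauto.NewCounter(prometheus.CounterOpts{
		Name: "gopherproxy_dropped_requests_total",
		Help: "Requests dropped (rate-limited or no healthy backend).",
	})
	activeBackends = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "gopherproxy_active_backends",
		Help: "Number of backends currently marked alive.",
	})
	requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "gopherproxy_request_duration_seconds",
		Help:    "Latency per backend.",
		Buckets: prometheus.DefBuckets,
	}, []string{"backend"}) // one series set per backend host:port
)
```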

Useful PromQL queries

# Request throughput (req/s)
rate(gopherproxy_processed_requests_total[1m])

# p99 latency per backend (swap 0.99 for 0.50 or 0.95 to get p50 / p95)
histogram_quantile(0.99, rate(gopherproxy_request_duration_seconds_bucket[1m]))

# Drop rate (rate-limited req/s)
rate(gopherproxy_dropped_requests_total[1m])

# Success rate %
rate(gopherproxy_processed_requests_total[1m])
  /
(rate(gopherproxy_processed_requests_total[1m]) + rate(gopherproxy_dropped_requests_total[1m]))
* 100

# How many backends are alive
gopherproxy_active_backends

Grafana Dashboard

The dashboard (grafana/provisioning/dashboards/gopherproxy.json) is auto-loaded on first boot — no manual import needed.

| Panel | Query |
| ----- | ----- |
| Request Rate | rate(gopherproxy_processed_requests_total[1m]) + dropped overlay |
| Latency Percentiles | histogram_quantile(0.50/0.95/0.99, ...) per backend |
| Active Backends (gauge) | gopherproxy_active_backends |
| Backend Health Over Time | Same gauge as timeseries |
| Total Processed | gopherproxy_processed_requests_total |
| Total Dropped | gopherproxy_dropped_requests_total |
| Success Rate % | processed / (processed + dropped) × 100 |

Set the time range to Last 5 minutes and auto-refresh to 10 s while traffic flows to see all panels update live.


Environment Variables

GopherProxy

| Variable | Default | Description |
| -------- | ------- | ----------- |
| REDIS_URL | localhost:16379 | Redis address (host:port) |
| PROXY_PORT | 8080 | Port the proxy listens on |
| METRICS_PORT | 2112 | Port for /metrics and /healthz |

Sentinel

| Variable | Default | Description |
| -------- | ------- | ----------- |
| REDIS_URL | localhost:16379 | Redis address (host:port) |
| SENTINEL_TARGETS | host.docker.internal:808{1,2,3} | Comma-separated host:port list to monitor |
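
Both binaries resolve these settings with an env-or-default lookup. The TestGetEnv_* tests listed under Development suggest a helper of roughly this shape (the exact code is an assumption):

```go
package main

import "os"

// getEnv returns the variable's value, or fallback when unset or empty.
func getEnv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// Example: redisURL := getEnv("REDIS_URL", "localhost:16379")
```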

Development

Run tests

make test
# go test -race -count=1 ./...

22 unit tests across both packages — all run with the race detector:

| Package | Tests |
| ------- | ----- |
| gopherproxy | TestGetNextPeer_EmptyPool, TestGetNextPeer_AllDead, TestGetNextPeer_RoundRobin, TestGetNextPeer_SkipsDeadBackend, TestBackend_SetAndIsAlive, TestUpdateServerPool_AddNew, TestUpdateServerPool_NoDuplicates, TestUpdateServerPool_InvalidURL, TestIPRateLimiter_AllowsBurst, TestIPRateLimiter_BlocksAfterBurst, TestIPRateLimiter_IsolatesIPs, TestLoggingMiddleware_PassesThrough, TestLimitMiddleware_Allows, TestLimitMiddleware_Blocks, TestGetEnv_UsesDefault, TestGetEnv_UsesEnvVar, TestServerPool_AtomicConcurrency |
| sentinel | TestGetEnv_Default, TestGetEnv_FromEnvironment, TestGetTargets_Defaults, TestGetTargets_FromEnv, TestGetTargets_SingleEntry |

Lint

make lint   # requires golangci-lint

Other Makefile targets

make help           # list all targets
make build          # build images only (no start)
make logs           # tail all container logs
make down           # stop stack, keep volumes
make clean          # prune dangling images + build cache
make tidy           # go mod tidy && go mod verify

Hot-reload Prometheus config

curl -s -X POST http://localhost:9090/-/reload

Project Structure

.
├── main.go                        # GopherProxy — Data Plane
├── main_test.go                   # 17 proxy unit tests
├── sentinel/
│   ├── main.go                    # Sentinel — Control Plane
│   └── main_test.go               # 5 sentinel unit tests
├── Dockerfile                     # Multi-stage proxy image
├── Dockerfile.sentinel            # Multi-stage sentinel image
├── docker-compose.yml             # Full stack (5 services)
├── prometheus.yml                 # Scrape config + relabelling
├── Makefile                       # Developer targets
├── .env.example                   # Environment variable template
├── .dockerignore                  # Keeps build context lean
├── go.mod / go.sum                # Module: github.com/YogeshT22/gopherproxy.git
├── grafana/
│   └── provisioning/
│       ├── datasources/
│       │   └── datasource.yml     # Prometheus datasource (uid: prometheus)
│       └── dashboards/
│           ├── dashboard.yml      # Dashboard provider config
│           └── gopherproxy.json   # 7-panel pre-built dashboard
└── mock_backends/
    ├── server1/index.html         # Mock on :8081
    ├── server2/index.html         # Mock on :8082
    └── server3/index.html         # Mock on :8083

Dependencies

| Package | Version | Purpose |
| ------- | ------- | ------- |
| github.com/prometheus/client_golang | v1.23.2 | Metrics instrumentation |
| github.com/redis/go-redis/v9 | v9.17.2 | Redis client |
| golang.org/x/time | v0.14.0 | Token bucket rate limiter |

Load Testing with k6

This project includes k6 load test scenarios to validate performance claims. k6 is a modern, scriptable load testing tool that measures throughput, latency percentiles, and error rates.

Prerequisites

Install k6:

# macOS
brew install k6

# Windows (using chocolatey)
choco install k6

# Linux (Debian/Ubuntu) — k6 is not in the default repos;
# add Grafana's k6 apt repository first, then:
sudo apt-get install k6

# or download from https://k6.io/docs/getting-started/installation

Running Load Tests

Note: Keep the full stack running (make up + make mock-backends) in one terminal before running k6 tests.

1. Smoke Test (quick check — 30s)

make k6-smoke
# 10 concurrent users for 30 seconds
# ✔ Verifies proxying is working
# ✔ Baseline latency measurement

2. Full Load Test (5–6 minutes)

make k6-load
# Ramps up: 0 → 100 → 300 → 500 users
# Sustains 500 concurrent users for 3 minutes
# ✔ Measures sustained throughput (target: ~4,200 req/s)
# ✔ Captures p95/p99 latency (target: <35ms)
# ✔ Monitors rate-limiter rejection rate (~12% under spike)

3. Spike Test (sudden load jump — 3+ minutes)

make k6-spike
# 10 users → sudden jump to 500 in 5 seconds
# ✔ Stress tests rapid scale-up
# ✔ Verifies no crashes or goroutine leaks
# ✔ Measures latency under immediate load

4. Sustained Load Test (5 minutes)

make k6-sustained
# Constant 200 concurrent users for 5 minutes
# ✔ Real-world scenario (not ramped, constant)
# ✔ Detects slow memory leaks
# ✔ Validates graceful shutdown after sustained load

Expected Results

On typical hardware (4-core i5, 8 GB RAM):

| Metric | Smoke Test | Load Test | Spike Test | Sustained (200 CU) |
| ------ | ---------- | --------- | ---------- | ------------------ |
| Throughput (req/s) | 500–800 | 4,000–4,500 | 3,500–4,200 | 2,500–3,000 |
| p50 latency | 2–5 ms | 10–15 ms | 8–12 ms | 6–10 ms |
| p95 latency | 8–15 ms | 25–35 ms | 20–30 ms | 15–25 ms |
| p99 latency | 15–25 ms | 35–50 ms | 30–45 ms | 25–40 ms |
| Error rate | 0% | ~1% | ~2–5% | <1% |
| Rate-limited (429) | 0% | ~10–12% | ~15–20% | ~8–10% |

Understanding the Metrics

  • Throughput: Requests per second that completed (200 OK or 429)
  • p95 latency: 95% of requests completed within this time
  • p99 latency: 99% of requests completed within this time
  • Rate-limited: Percentage of requests that hit the 429 rate-limit response
  • Errors: Requests that returned 5xx or connection failures

Custom k6 Test

Edit k6-load-test.js to adjust:

// Change proxy URL
const PROXY_URL = __ENV.PROXY_URL || "http://localhost:8080";

// Modify the load profile — e.g., a more aggressive ramp-up
// (this excerpt lives inside `export const options = { ... }`)
stages: [
  { duration: "30s", target: 200 }, // ← faster ramp
  { duration: "2m", target: 200 },
  { duration: "1m", target: 0 },
],

Or run with environment variables:

PROXY_URL=http://192.168.1.100:8080 k6 run k6-load-test.js --scenario load

k6 Output Files

After each test run, results are saved to k6-results-{scenario}.json:

ls -lh k6-results-*.json
# -rw-r--r--  k6-results-load.json (3.2 MB)

View the JSON summary:

cat k6-results-load.json | jq '.metrics | keys' | head -20

Or use k6's output options:

# Stream raw metrics to a JSON file
k6 run k6-load-test.js --scenario load --out json=report.json

# Push results to Prometheus via remote write (advanced; Prometheus must be
# started with --web.enable-remote-write-receiver)
K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write \
  k6 run k6-load-test.js --out experimental-prometheus-rw

Production Notes

  • Redis port 16379 is exposed to the host for local debugging only — remove the ports: mapping before any real deployment
  • Grafana sign-up is disabled (GF_USERS_ALLOW_SIGN_UP=false) and analytics reporting is off
  • Prometheus retains 15 days of data in the prometheus_data named volume
  • All containers run as non-root (UID/GID 1001) and have CPU + memory limits set
  • Sentinel's health check is kill -0 1 — more reliable than pgrep on BusyBox/Alpine, where the full path (/app/sentinel) does not match an exact-name pgrep -x search

Debugging Log

Real issues hit during development, recorded here as reference:

| Symptom | Root Cause | Fix |
| ------- | ---------- | --- |
| Metrics showed 0 traffic | pool variable shadowed inside handler — handler read an empty instance | Removed := re-declaration; handler closes over the outer pool pointer |
| bind: forbidden on port 6379 | Hyper-V reserved the port range on Windows | Mapped Redis to high port 16379 on the host |
| Proxy crashed on empty registry | Modulo by zero when len(backends) == 0 | Defensive if l == 0 { return nil } before the % l operation |
| gopher-sentinel always unhealthy | pgrep -x sentinel on BusyBox matches full path, not basename | Replaced with kill -0 1 |
| Grafana panels showed "No data" | Datasource UID auto-generated by Grafana didn't match "uid": "prometheus" in dashboard JSON | Added uid: prometheus to datasource.yml |

License

MIT — see LICENSE

Author: Yogesh T · GitHub @YogeshT22
