| 1 | +# redis-proxy Deployment Guide |
| 2 | + |
| 3 | +redis-proxy is a Redis-protocol reverse proxy that enables gradual migration from Redis to ElasticKV through dual-write, shadow-read comparison, and phased primary cutover. |
| 4 | + |
| 5 | +## Docker Image |
| 6 | + |
| 7 | +Pre-built images are published to GitHub Container Registry when relevant files change on `main` (see path filters in the workflow): |
| 8 | + |
| 9 | +``` |
| 10 | +ghcr.io/bootjp/elastickv/redis-proxy:latest |
| 11 | +ghcr.io/bootjp/elastickv/redis-proxy:sha-<commit> |
| 12 | +``` |
| 13 | + |
| 14 | +The CI workflow (`.github/workflows/redis-proxy-docker.yml`) builds the image automatically when files under `cmd/redis-proxy/`, `proxy/`, `Dockerfile.redis-proxy`, `go.mod`, `go.sum`, or the workflow file itself change. |
| 15 | + |
| 16 | +### Building locally |
| 17 | + |
| 18 | +```bash |
| 19 | +# Docker |
| 20 | +docker build -f Dockerfile.redis-proxy -t redis-proxy . |
| 21 | + |
| 22 | +# Binary |
| 23 | +go build -o redis-proxy ./cmd/redis-proxy/ |
| 24 | +``` |
| 25 | + |
| 26 | +## Command-Line Options |
| 27 | + |
| 28 | +| Flag | Default | Description | |
| 29 | +|------|---------|-------------| |
| 30 | +| `-listen` | `:6479` | Proxy listen address | |
| 31 | +| `-primary` | `localhost:6379` | Primary (Redis) address | |
| 32 | +| `-primary-db` | `0` | Primary Redis DB number | |
| 33 | +| `-primary-password` | (empty) | Primary Redis password | |
| 34 | +| `-secondary` | `localhost:6380` | Secondary (ElasticKV) address | |
| 35 | +| `-secondary-db` | `0` | Secondary (ElasticKV) DB number | |
| 36 | +| `-secondary-password` | (empty) | Secondary (ElasticKV) password | |
| 37 | +| `-mode` | `dual-write` | Proxy mode (see below) | |
| 38 | +| `-secondary-timeout` | `5s` | Secondary write timeout | |
| 39 | +| `-shadow-timeout` | `3s` | Shadow read timeout | |
| 40 | +| `-sentry-dsn` | (empty) | Sentry DSN (empty = disabled) | |
| 41 | +| `-sentry-env` | (empty) | Sentry environment name | |
| 42 | +| `-sentry-sample` | `1.0` | Sentry sample rate | |
| 43 | +| `-metrics` | `:9191` | Prometheus metrics endpoint | |
| 44 | + |
| 45 | +## Proxy Modes |
| 46 | + |
| 47 | +Five modes support a phased migration strategy. |
| 48 | + |
| 49 | +| Mode | Reads from | Writes to | Use case | |
| 50 | +|------|-----------|-----------|----------| |
| 51 | +| `redis-only` | Redis | Redis only | Transparent proxy. Route traffic through the proxy first | |
| 52 | +| `dual-write` | Redis | Redis + ElasticKV | Begin data sync. Populate ElasticKV | |
| 53 | +| `dual-write-shadow` | Redis (+ shadow compare from ElasticKV) | Redis + ElasticKV | Verify read consistency between backends | |
| 54 | +| `elastickv-primary` | ElasticKV (+ shadow compare from Redis) | ElasticKV + Redis | Promote ElasticKV to primary. Redis as fallback | |
| 55 | +| `elastickv-only` | ElasticKV | ElasticKV only | Migration complete. Decommission Redis | |
| 56 | + |
| 57 | +### Recommended Migration Path |
| 58 | + |
| 59 | +``` |
| 60 | +redis-only -> dual-write -> dual-write-shadow -> elastickv-primary -> elastickv-only |
| 61 | +``` |
| 62 | + |
| 63 | +Monitor metrics at each stage and roll back to the previous mode if issues arise. Mode changes require a proxy restart. |
| 64 | + |
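The mode table and the recommended path can be expressed as data, which makes the rollback rule ("return to the previous mode") easy to check in deployment tooling. This is an illustrative sketch only: the `Route` type, `migrationPath`, and `rollbackTarget` are hypothetical names, and the proxy itself may represent modes differently.

```go
package main

import "fmt"

// Route describes where a mode sends reads and writes, per the table above.
type Route struct {
	Mode       string
	ReadFrom   string
	ShadowFrom string   // backend used for shadow comparison, "" if none
	WriteTo    []string // backends that receive writes
}

// migrationPath lists the modes in the recommended order.
var migrationPath = []Route{
	{"redis-only", "redis", "", []string{"redis"}},
	{"dual-write", "redis", "", []string{"redis", "elastickv"}},
	{"dual-write-shadow", "redis", "elastickv", []string{"redis", "elastickv"}},
	{"elastickv-primary", "elastickv", "redis", []string{"elastickv", "redis"}},
	{"elastickv-only", "elastickv", "", []string{"elastickv"}},
}

// rollbackTarget returns the mode that precedes the given one on the path.
// ok is false for the first mode and for unknown modes.
func rollbackTarget(mode string) (prev string, ok bool) {
	for i, r := range migrationPath {
		if r.Mode == mode && i > 0 {
			return migrationPath[i-1].Mode, true
		}
	}
	return "", false
}

func main() {
	for _, r := range migrationPath {
		prev, ok := rollbackTarget(r.Mode)
		if !ok {
			prev = "(none)"
		}
		fmt.Printf("%-18s reads=%-9s writes=%v rollback=%s\n", r.Mode, r.ReadFrom, r.WriteTo, prev)
	}
}
```

Because mode changes require a restart, a rollback in practice means redeploying the proxy with the `-mode` value that `rollbackTarget` returns.
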
| 65 | +## Deployment Examples |
| 66 | + |
| 67 | +### Minimal (redis-only) |
| 68 | + |
| 69 | +```bash |
| 70 | +docker run --rm \ |
| 71 | + -p 6379:6379 \ |
| 72 | + ghcr.io/bootjp/elastickv/redis-proxy:latest \ |
| 73 | + -listen :6379 \ |
| 74 | + -primary redis.internal:6379 \ |
| 75 | + -mode redis-only |
| 76 | +``` |
| 77 | + |
| 78 | +Point your application at the proxy. Behavior is identical to connecting directly to Redis. |
| 79 | + |
| 80 | +### Dual-Write with Shadow Comparison |
| 81 | + |
| 82 | +```bash |
| 83 | +docker run --rm \ |
| 84 | + -p 6379:6479 \ |
| 85 | + -p 9191:9191 \ |
| 86 | + ghcr.io/bootjp/elastickv/redis-proxy:latest \ |
| 87 | + -listen :6479 \ |
| 88 | + -primary redis.internal:6379 \ |
| 89 | + -primary-password "${REDIS_PASSWORD}" \ |
| 90 | + -secondary elastickv.internal:6380 \ |
| 91 | + -mode dual-write-shadow \ |
| 92 | + -secondary-timeout 5s \ |
| 93 | + -shadow-timeout 3s \ |
| 94 | + -sentry-dsn "${SENTRY_DSN}" \ |
| 95 | + -sentry-env production \ |
| 96 | + -metrics :9191 |
| 97 | +``` |
| 98 | + |
| 99 | +### Docker Compose |
| 100 | + |
| 101 | +```yaml |
| 102 | +services: |
| 103 | + redis-proxy: |
| 104 | + image: ghcr.io/bootjp/elastickv/redis-proxy:latest |
| 105 | + ports: |
| 106 | + - "6379:6479" |
| 107 | + - "9191:9191" |
| 108 | + command: |
| 109 | + - -listen=:6479 |
| 110 | + - -primary=redis:6379 |
| 111 | + - -secondary=elastickv:6380 |
| 112 | + - -mode=dual-write-shadow |
| 113 | + - -metrics=:9191 |
| 114 | + depends_on: |
| 115 | + - redis |
| 116 | + - elastickv |
| 117 | + |
| 118 | + redis: |
| 119 | + image: redis:7 |
| 120 | + ports: |
| 121 | + - "6379" |
| 122 | + |
| 123 | + elastickv: |
| 124 | + image: ghcr.io/bootjp/elastickv:latest |
| 125 | + ports: |
| 126 | + - "6380" |
| 127 | +``` |
| 128 | + |
| 129 | +### Kubernetes |
| 130 | + |
| 131 | +```yaml |
| 132 | +apiVersion: apps/v1 |
| 133 | +kind: Deployment |
| 134 | +metadata: |
| 135 | + name: redis-proxy |
| 136 | +spec: |
| 137 | + replicas: 1 |
| 138 | + selector: |
| 139 | + matchLabels: |
| 140 | + app: redis-proxy |
| 141 | + template: |
| 142 | + metadata: |
| 143 | + labels: |
| 144 | + app: redis-proxy |
| 145 | + annotations: |
| 146 | + prometheus.io/scrape: "true" |
| 147 | + prometheus.io/port: "9191" |
| 148 | + spec: |
| 149 | + containers: |
| 150 | + - name: redis-proxy |
| 151 | + image: ghcr.io/bootjp/elastickv/redis-proxy:latest |
| 152 | + args: |
| 153 | + - -listen=:6479 |
| 154 | + - -primary=redis:6379 |
| 155 | + - -secondary=elastickv:6380 |
| 156 | + - -mode=dual-write-shadow |
| 157 | + - -metrics=:9191 |
| 158 | + ports: |
| 159 | + - containerPort: 6479 |
| 160 | + name: redis |
| 161 | + - containerPort: 9191 |
| 162 | + name: metrics |
| 163 | + livenessProbe: |
| 164 | + tcpSocket: |
| 165 | + port: 6479 |
| 166 | + initialDelaySeconds: 5 |
| 167 | + periodSeconds: 10 |
| 168 | + readinessProbe: |
| 169 | + tcpSocket: |
| 170 | + port: 6479 |
| 171 | + initialDelaySeconds: 3 |
| 172 | + periodSeconds: 5 |
| 173 | + resources: |
| 174 | + requests: |
| 175 | + cpu: 100m |
| 176 | + memory: 128Mi |
| 177 | + limits: |
| 178 | + cpu: "1" |
| 179 | + memory: 512Mi |
| 180 | +``` |
| 181 | +
|
| 182 | +> **Note:** The distroless base image includes neither a shell nor `redis-cli`. To use the `exec`-based probe below, build a redis-proxy image that includes both (or another ping tool) in the same container. Otherwise, prefer the `tcpSocket` probes shown in the Deployment spec above; the proxy does not expose an HTTP health endpoint. |
| 183 | + |
| 184 | +```yaml |
| 185 | +# Alternative: exec-based probe (requires redis-cli in the image) |
| 186 | +livenessProbe: |
| 187 | + exec: |
| 188 | + command: |
| 189 | + - /bin/sh |
| 190 | + - -c |
| 191 | + - 'redis-cli -p 6479 PING || exit 1' |
| 192 | + initialDelaySeconds: 5 |
| 193 | + periodSeconds: 10 |
| 194 | +``` |
| 195 | + |
| 196 | +## Health Checks |
| 197 | + |
| 198 | +The proxy does not expose an HTTP health endpoint. Use the Redis `PING` command to verify availability: |
| 199 | + |
| 200 | +```bash |
| 201 | +redis-cli -p 6479 PING |
| 202 | +# PONG |
| 203 | +``` |
| 204 | + |
| 205 | +## Prometheus Metrics |
| 206 | + |
| 207 | +Metrics are served at `/metrics` on the address specified by `-metrics` (default `:9191`). |
| 208 | + |
| 209 | +### Key Metrics |
| 210 | + |
| 211 | +| Metric | Type | Description | |
| 212 | +|--------|------|-------------| |
| 213 | +| `proxy_command_total` | Counter | Commands processed (labels: command, backend, status) | |
| 214 | +| `proxy_command_duration_seconds` | Histogram | Backend command latency | |
| 215 | +| `proxy_primary_write_errors_total` | Counter | Primary write errors | |
| 216 | +| `proxy_secondary_write_errors_total` | Counter | Secondary write errors | |
| 217 | +| `proxy_primary_read_errors_total` | Counter | Primary read errors | |
| 218 | +| `proxy_shadow_read_errors_total` | Counter | Shadow read errors | |
| 219 | +| `proxy_divergences_total` | Counter | Shadow read mismatches (labels: command, kind) | |
| 220 | +| `proxy_migration_gap_total` | Counter | Expected mismatches from incomplete migration (labels: command) | |
| 221 | +| `proxy_async_drops_total` | Counter | Async operations dropped due to backpressure | |
| 222 | +| `proxy_active_connections` | Gauge | Current active client connections | |
| 223 | +| `proxy_pubsub_shadow_divergences_total` | Counter | Pub/Sub shadow message mismatches (labels: kind) | |
| 224 | +| `proxy_pubsub_shadow_errors_total` | Counter | Pub/Sub shadow operation errors | |
| 225 | + |
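Several of these counters are split by label, so a quick manual check often means summing one metric across all of its label sets. The sketch below does that over a Prometheus text-format payload. The `sumSeries` helper is a hypothetical name, the parser handles only simple sample lines, and the label values and numbers in the sample are invented for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumSeries adds up every sample of the named metric across all label sets
// in a Prometheus text-format payload. Comment lines are skipped.
func sumSeries(payload, name string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue // a different metric that merely shares the prefix
		}
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:] // drop the label set
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("no value in %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	// Invented sample; real output comes from the -metrics endpoint.
	sample := `# TYPE proxy_divergences_total counter
proxy_divergences_total{command="GET",kind="example"} 3
proxy_divergences_total{command="HGET",kind="example"} 1
proxy_active_connections 12
`
	for _, name := range []string{"proxy_divergences_total", "proxy_active_connections"} {
		v, err := sumSeries(sample, name)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(name, v)
	}
}
```
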
| 226 | +### Recommended Alerts |
| 227 | + |
| 228 | +```yaml |
| 229 | +groups: |
| 230 | + - name: redis-proxy |
| 231 | + rules: |
| 232 | + - alert: ProxyDivergenceHigh |
| 233 | + expr: rate(proxy_divergences_total[5m]) > 0 |
| 234 | + for: 10m |
| 235 | + annotations: |
| 236 | + summary: "Data mismatch detected between primary and secondary" |
| 237 | + |
| 238 | + - alert: ProxySecondaryWriteErrors |
| 239 | + expr: rate(proxy_secondary_write_errors_total[5m]) > 1 |
| 240 | + for: 5m |
| 241 | + annotations: |
| 242 | + summary: "Secondary backend write errors are elevated" |
| 243 | + |
| 244 | + - alert: ProxyAsyncDrops |
| 245 | + expr: rate(proxy_async_drops_total[5m]) > 0 |
| 246 | + for: 5m |
| 247 | + annotations: |
| 248 | + summary: "Async goroutine limit reached; secondary may be slow" |
| 249 | +``` |
| 250 | + |
| 251 | +## Internal Parameters |
| 252 | + |
| 253 | +| Parameter | Value | Description | |
| 254 | +|-----------|-------|-------------| |
| 255 | +| Connection pool size | 128 | go-redis pool size per backend | |
| 256 | +| Dial timeout | 5s | Backend connection timeout | |
| 257 | +| Read timeout | 3s | Backend read timeout | |
| 258 | +| Write timeout | 3s | Backend write timeout | |
| 259 | +| Async write goroutine limit | 4096 | Max concurrent secondary writes | |
| 260 | +| Shadow read goroutine limit | 1024 | Max concurrent shadow comparisons | |
| 261 | +| PubSub compare window | 2s | Message matching window | |
| 262 | +| PubSub sweep interval | 500ms | Expired message scan interval | |
| 263 | + |
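The goroutine limits above, together with `proxy_async_drops_total`, describe a bounded pool that drops work instead of blocking when it is full. A common way to build this in Go is a buffered channel used as a semaphore. The sketch below shows that pattern under the assumption that the proxy behaves similarly; `asyncLimiter` and its methods are hypothetical names, not the proxy's actual types.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// asyncLimiter caps concurrent background tasks and counts the ones it drops.
type asyncLimiter struct {
	slots chan struct{} // buffered channel used as a counting semaphore
	wg    sync.WaitGroup
	drops atomic.Int64
}

func newAsyncLimiter(limit int) *asyncLimiter {
	return &asyncLimiter{slots: make(chan struct{}, limit)}
}

// Go runs task in a new goroutine if a slot is free. Otherwise it drops the
// task, increments the drop counter, and returns false without blocking.
func (l *asyncLimiter) Go(task func()) bool {
	select {
	case l.slots <- struct{}{}:
	default:
		l.drops.Add(1)
		return false
	}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		defer func() { <-l.slots }() // free the slot before signalling done
		task()
	}()
	return true
}

// Wait blocks until every accepted task has finished.
func (l *asyncLimiter) Wait() { l.wg.Wait() }

func main() {
	l := newAsyncLimiter(2) // the table lists 4096 for async writes
	release := make(chan struct{})
	for i := 0; i < 3; i++ {
		ok := l.Go(func() { <-release })
		fmt.Println("task", i, "accepted:", ok)
	}
	close(release)
	l.Wait()
	fmt.Println("drops:", l.drops.Load())
}
```

Dropping rather than queueing keeps client latency independent of a slow secondary, at the cost of missed secondary writes that then show up in the drop counter.
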
| 264 | +## Graceful Shutdown |
| 265 | + |
| 266 | +The proxy handles `SIGINT` / `SIGTERM` for graceful shutdown: |
| 267 | + |
| 268 | +1. Stops accepting new connections |
| 269 | +2. Waits for in-flight async goroutines to complete |
| 270 | +3. Releases backend connection pools |
| 271 | +4. Flushes Sentry buffers (up to 2 seconds) |
| 272 | + |
| 273 | +Recommended shutdown order: `redis-proxy -> application -> Redis / ElasticKV`. |
| 274 | + |
| 275 | +## Troubleshooting |
| 276 | + |
| 277 | +### Secondary writes are falling behind |
| 278 | +- Check `proxy_async_drops_total`. If it is increasing, an async goroutine limit is being hit and operations are being dropped. |
| 279 | +- Reduce `-secondary-timeout` to fail fast on slow secondaries. |
| 280 | +- Investigate secondary (ElasticKV) performance. |
| 281 | + |
| 282 | +### High divergence count |
| 283 | +- Also check `proxy_migration_gap_total`. Pre-migration missing keys are counted as gaps, not divergences. |
| 284 | +- In `dual-write-shadow` mode, inspect `proxy_divergences_total` labels to identify which commands are mismatched. |
| 285 | + |
| 286 | +### Pub/Sub messages missing |
| 287 | +- Check `proxy_pubsub_shadow_divergences_total`. |
| 288 | +- `kind=data_mismatch`: message received by primary but not secondary. |
| 289 | +- `kind=extra_data`: message received by secondary only. |