| title | Deployment Diagram |
|---|---|
| description | Deployment architecture and operational diagrams for the OSM-Notes-Analytics system |
| version | 1.0.0 |
| last_updated | 2026-01-25 |
| author | AngocA |
| tags | |
| audience | |
| project | OSM-Notes-Analytics |
| status | active |
This document provides deployment architecture and operational diagrams for the OSM-Notes-Analytics system, including infrastructure, scheduling, dependencies, and operational workflows.
- Overview
- Deployment Architecture
- Operational Schedule
- Process Dependencies
- Infrastructure Requirements
- Deployment Steps
- Operational Workflows
- Monitoring and Alerting
- Backup and Recovery
- Security Considerations
- Performance Tuning
- Disaster Recovery
- Related Documentation
- References
The deployment diagram shows:
- Infrastructure: Servers, databases, and storage
- Scheduling: Automated processes and cron jobs
- Dependencies: Operational dependencies between processes
- Workflows: How processes interact in production
```mermaid
graph TB
subgraph Server["Application Server"]
subgraph File_System["File System"]
REPO[Repository<br/>OSM-Notes-Analytics/]
SCRIPTS[Scripts<br/>bin/, sql/]
CONFIG[Configuration<br/>etc/properties.sh<br/>etc/etl.properties]
OUTPUT[JSON Output<br/>output/json/]
LOGS[Log Files<br/>/tmp/ETL_*/]
end
subgraph Processes["Running Processes"]
CRON_SERVICE[Cron Service<br/>Scheduled Jobs]
ETL_PROC[ETL Process<br/>Running/Idle]
DATAMART_PROC[Datamart Process<br/>Running/Idle]
EXPORT_PROC[Export Process<br/>Running/Idle]
end
subgraph Scheduler["Scheduler (Cron)"]
CRON_ETL[ETL Job<br/>Every 15 min]
CRON_DATAMART[Datamart Job<br/>Daily 2 AM]
CRON_EXPORT[Export Job<br/>Every 15 min]
CRON_CLEANUP[Cleanup Job<br/>Weekly]
end
end
subgraph Database_Server["Database Server"]
POSTGRES[PostgreSQL<br/>+ PostGIS]
BASE_SCHEMA[public Schema<br/>Base Tables]
DWH_SCHEMA[dwh Schema<br/>Data Warehouse]
end
subgraph External["External Systems (OSM Notes Ecosystem)"]
INGESTION[OSM-Notes-Ingestion<br/>Updates Base Tables<br/>Base Project]
DATA[OSM-Notes-Data<br/>JSON Files<br/>GitHub Pages]
VIEWER[OSM-Notes-Viewer<br/>Reads JSON Files<br/>from Data]
API[OSM-Notes-API<br/>Reads from DWH<br/>Optional Consumer]
WMS[OSM-Notes-WMS<br/>Uses Ingestion DB]
MONITORING[OSM-Notes-Monitoring<br/>Monitors All]
end
CRON_SERVICE -->|Triggers| CRON_ETL
CRON_SERVICE -->|Triggers| CRON_DATAMART
CRON_SERVICE -->|Triggers| CRON_EXPORT
CRON_SERVICE -->|Triggers| CRON_CLEANUP
CRON_ETL -->|Executes| ETL_PROC
CRON_DATAMART -->|Executes| DATAMART_PROC
CRON_EXPORT -->|Executes| EXPORT_PROC
ETL_PROC -->|Reads| SCRIPTS
ETL_PROC -->|Reads| CONFIG
ETL_PROC -->|Writes| POSTGRES
ETL_PROC -->|Writes| LOGS
DATAMART_PROC -->|Reads| POSTGRES
DATAMART_PROC -->|Writes| POSTGRES
EXPORT_PROC -->|Reads| POSTGRES
EXPORT_PROC -->|Writes| OUTPUT
INGESTION -->|Updates| BASE_SCHEMA
EXPORT_PROC -->|Publishes| DATA
DATA -->|Serves JSON| VIEWER
API -->|Reads| POSTGRES
INGESTION -->|Same DB| WMS
MONITORING -.->|Monitors| Server
MONITORING -.->|Monitors| INGESTION
MONITORING -.->|Monitors| API
MONITORING -.->|Monitors| DATA
MONITORING -.->|Monitors| VIEWER
MONITORING -.->|Monitors| WMS
POSTGRES -->|Stores| BASE_SCHEMA
POSTGRES -->|Stores| DWH_SCHEMA
```
```mermaid
graph TB
subgraph ETL_Server["ETL Server"]
ETL_SCRIPTS[ETL Scripts]
ETL_CRON[Cron Jobs]
ETL_LOGS[ETL Logs]
end
subgraph DB_Server["Database Server"]
POSTGRES[PostgreSQL<br/>+ PostGIS]
REPLICA[Read Replica<br/>Optional]
end
subgraph Export_Server["Export Server"]
EXPORT_SCRIPTS[Export Scripts]
EXPORT_CRON[Cron Jobs]
JSON_STORAGE[JSON Storage<br/>NFS/Shared]
end
subgraph Web_Server["Web Server"]
VIEWER_APP[OSM-Notes-Viewer<br/>Web Application]
end
ETL_CRON -->|Runs ETL| ETL_SCRIPTS
ETL_SCRIPTS -->|Writes| POSTGRES
EXPORT_CRON -->|Runs Export| EXPORT_SCRIPTS
EXPORT_SCRIPTS -->|Reads| REPLICA
EXPORT_SCRIPTS -->|Writes| JSON_STORAGE
VIEWER_APP -->|Reads| JSON_STORAGE
VIEWER_APP -->|Queries| REPLICA
```
```mermaid
gantt
title Production ETL Schedule
dateFormat HH:mm
axisFormat %H:%M
section ETL (every 15 min)
ETL Incremental :active, etl1, 00:00, 00:15
ETL Incremental :etl2, 00:15, 00:30
ETL Incremental :etl3, 00:30, 00:45
ETL Incremental :etl4, 00:45, 01:00
section Daily
Datamart Users :crit, dm1, 02:00, 02:20
Datamart Countries :crit, dm2, 02:30, 02:35
Datamart Global :dm3, 02:35, 02:37
section Export (every 15 min)
JSON Export :active, exp1, 00:00, 00:10
JSON Export :exp2, 00:15, 00:25
JSON Export :exp3, 00:30, 00:40
JSON Export :exp4, 00:45, 00:55
section Weekly
VACUUM ANALYZE :vacuum, 03:00, 04:00
Log Cleanup :cleanup, 03:30, 04:00
```
Standard Production Schedule:
```bash
# ETL + export (recommended): same layout as sibling projects (osm-notes-ingestion, osm-notes-monitoring).
# One-time: create /var/log/osm-notes-analytics/ and install etc/logrotate.osm-analytics.conf (see etc/cron.example).
*/15 * * * * cd /home/notes/OSM-Notes-Analytics && ./bin/dwh/ETL.sh && ./bin/dwh/exportAndPushJSONToGitHub.sh >> /var/log/osm-notes-analytics/analytics.log 2>&1

# Or use /tmp if you prefer no /var/log setup:
# */15 * * * * export CLEAN=false ; export LOG_LEVEL=INFO ; export DBNAME=notes ; export DB_USER=notes ; /home/notes/OSM-Notes-Analytics/bin/dwh/ETL.sh >> /tmp/osm-notes-etl.log 2>&1

# Datamarts: ETL.sh updates them; optional separate jobs:
# 0 2 * * * /home/notes/OSM-Notes-Analytics/bin/dwh/datamartUsers/datamartUsers.sh >> /var/log/osm-notes-analytics/analytics.log 2>&1
# 30 2 * * * /home/notes/OSM-Notes-Analytics/bin/dwh/datamartCountries/datamartCountries.sh >> /var/log/osm-notes-analytics/analytics.log 2>&1

# Maintenance: weekly cleanup and optimization
0 3 * * 0 psql -U notes -d notes_dwh -c "VACUUM ANALYZE dwh.facts;" >> /var/log/osm-notes-analytics/analytics.log 2>&1
30 3 * * 0 find /tmp/ETL_* -type d -mtime +7 -exec rm -rf {} \; 2>/dev/null || true
```

Schedule Breakdown:
| Process | Frequency | Time | Duration | Purpose |
|---|---|---|---|---|
| ETL Incremental | Every 15 min | All day | 5-15 min | Process new notes |
| User Datamart | Daily | 02:00 | ~20 min | Update user analytics |
| Country Datamart | Daily | 02:30 | ~5 min | Update country analytics |
| Global Datamart | Daily | 02:35 | ~2 min | Update global stats |
| JSON Export | Every 15 min | All day | ~10 min | Export for viewer |
| VACUUM ANALYZE | Weekly | Sunday 03:00 | ~1 hour | Database optimization |
| Log Cleanup | Weekly | Sunday 03:30 | ~5 min | Remove old logs |
graph TD
INGESTION[OSM-Notes-Ingestion<br/>Updates Base Tables<br/>Base Project]
ETL[ETL Process<br/>Auto-detect]
DATAMART_USERS[User Datamart]
DATAMART_COUNTRIES[Country Datamart]
DATAMART_GLOBAL[Global Datamart]
EXPORT[JSON Export]
DATA[OSM-Notes-Data<br/>JSON Files<br/>GitHub Pages]
VIEWER[OSM-Notes-Viewer<br/>Consumes JSON<br/>from Data]
API[OSM-Notes-API<br/>Reads from DWH<br/>Optional]
WMS[OSM-Notes-WMS<br/>Uses Ingestion DB]
MONITORING[OSM-Notes-Monitoring<br/>Monitors All]
INGESTION -->|Must Complete| ETL
INGESTION -->|Same Database| WMS
ETL -->|Creates/Updates DWH| DATAMART_USERS
ETL -->|Creates/Updates DWH| DATAMART_COUNTRIES
ETL -->|Creates/Updates DWH| DATAMART_GLOBAL
DATAMART_USERS -->|Required| EXPORT
DATAMART_COUNTRIES -->|Required| EXPORT
DATAMART_GLOBAL -->|Optional| EXPORT
EXPORT -->|Publishes| DATA
DATA -->|Provides| VIEWER
ETL -->|DWH| API
MONITORING -.->|Monitors| INGESTION
MONITORING -.->|Monitors| ETL
MONITORING -.->|Monitors| DATA
MONITORING -.->|Monitors| VIEWER
MONITORING -.->|Monitors| API
MONITORING -.->|Monitors| WMS
Initial Deployment:
- Deploy OSM-Notes-Ingestion (populate base tables) - Base project
- Wait for base tables to have data
- Deploy OSM-Notes-WMS (optional, uses same database as Ingestion)
- Run ETL (auto-detects first execution, creates DWH)
- Run datamart scripts (initial population)
- Run JSON export (creates OSM-Notes-Data)
- Deploy OSM-Notes-Viewer (consumes Data)
- Deploy OSM-Notes-API (optional, reads from Analytics DWH)
- Deploy OSM-Notes-Monitoring (monitors all components)
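The analytics-side portion of these steps (4-6) can be chained in a small wrapper. This is a sketch, not the project's own tooling: the script paths are the ones named in this document, and the `DRY_RUN` guard (on by default here) is an addition so the sequence can be rehearsed safely.

```bash
#!/bin/bash
# deploy_analytics.sh - run the analytics-side deployment steps in order.
# DRY_RUN=1 (the default here) prints each step instead of executing it.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "DRY RUN: $*"
  else
    "$@"
  fi
}

run ./bin/dwh/ETL.sh                                    # step 4: initial ETL (creates DWH)
run ./bin/dwh/datamartUsers/datamartUsers.sh            # step 5: initial datamart population
run ./bin/dwh/datamartCountries/datamartCountries.sh
run ./bin/dwh/exportDatamartsToJSON.sh                  # step 6: first JSON export
```

With `DRY_RUN=0`, the same script executes the real steps from the repository root.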
Regular Operations:
- OSM-Notes-Ingestion runs (updates base tables)
- ETL runs (every 15 minutes, auto-detects incremental)
- Datamarts update (daily at 2 AM)
- JSON export runs (every 15 minutes, updates OSM-Notes-Data)
- OSM-Notes-Viewer reads JSON files from Data
- OSM-Notes-API serves queries from Analytics DWH
- OSM-Notes-Monitoring monitors all components
Minimum Requirements:
| Component | Specification | Purpose |
|---|---|---|
| CPU | 4 cores | Parallel ETL processing |
| RAM | 8 GB | Database operations |
| Disk | 100 GB | Database + logs |
| Network | 100 Mbps | Database connectivity |
Recommended:
| Component | Specification | Purpose |
|---|---|---|
| CPU | 8+ cores | Faster parallel processing |
| RAM | 16+ GB | Larger datasets |
| Disk | 500+ GB SSD | Fast I/O for database |
| Network | 1 Gbps | Database connectivity |
PostgreSQL Configuration:
- Version: 12 or higher
- PostGIS: 3.0 or higher
- Shared Memory: 4 GB minimum
- Work Memory: 256 MB per connection
- Maintenance Work Memory: 1 GB
- Max Connections: 100
Storage:
- Base Tables: ~50 GB (depends on ingestion)
- Data Warehouse: ~100 GB (facts + dimensions)
- Datamarts: ~5 GB
- Indexes: ~20 GB
- Total Estimated: ~175 GB
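Actual usage can be compared against these estimates with a couple of catalog queries. The sketch below only writes them to a file, since running them assumes a live `notes_dwh` database:

```bash
# Write size-check queries to a file; run them with: psql -d notes_dwh -f /tmp/size_check.sql
cat > /tmp/size_check.sql <<'SQL'
-- Total database size
SELECT pg_size_pretty(pg_database_size(current_database()));
-- Size per schema (tables + indexes), to compare against the estimates above
SELECT schemaname,
       pg_size_pretty(SUM(pg_total_relation_size(
         quote_ident(schemaname) || '.' || quote_ident(tablename)))) AS total
FROM pg_tables
WHERE schemaname IN ('public', 'dwh')
GROUP BY schemaname;
SQL
echo "Queries written to /tmp/size_check.sql"
```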
```bash
# Install PostgreSQL and PostGIS
sudo apt-get install postgresql-12 postgresql-12-postgis-3

# Install Bash 4.0+
bash --version  # Should be 4.0 or higher

# Install development tools (optional)
sudo apt-get install shellcheck shfmt bats
```

```bash
# Clone with submodules
git clone --recurse-submodules https://github.com/OSM-Notes/OSM-Notes-Analytics.git
cd OSM-Notes-Analytics

# Or if already cloned, initialize submodules
git submodule update --init --recursive
```

```bash
# Create database (if not exists)
createdb notes_dwh

# Enable PostGIS
psql -d notes_dwh -c "CREATE EXTENSION postgis;"
psql -d notes_dwh -c "CREATE EXTENSION btree_gist;"

# Configure connection
cp etc/properties.sh.example etc/properties.sh
nano etc/properties.sh  # Edit with your credentials
```

```bash
# Check base tables exist (from Ingestion)
psql -d notes_dwh -c "SELECT COUNT(*) FROM notes;"
psql -d notes_dwh -c "SELECT COUNT(*) FROM note_comments;"
# If empty, run OSM-Notes-Ingestion first
```

```bash
# Run initial ETL (takes ~30 hours)
cd bin/dwh
./ETL.sh

# Monitor progress
tail -f /tmp/ETL_*/ETL.log
```

```bash
# Copy cron template
cp etc/cron.example /tmp/osm-notes-cron

# Edit with your paths (and optional /var/log/osm-notes-analytics setup; see comments in cron.example)
nano /tmp/osm-notes-cron

# Optional: create log directory and install logrotate (same layout as osm-notes-ingestion, osm-notes-monitoring)
# sudo mkdir -p /var/log/osm-notes-analytics && sudo chown notes:maptimebogota /var/log/osm-notes-analytics
# sudo touch /var/log/osm-notes-analytics/analytics.log && sudo chown notes:maptimebogota /var/log/osm-notes-analytics/analytics.log
# sudo cp etc/logrotate.osm-analytics.conf /etc/logrotate.d/osm-analytics

# Install cron
crontab /tmp/osm-notes-cron

# Verify
crontab -l
```

See the Cron Setup Guide for detailed cron configuration and log setup.

```bash
# Check ETL ran successfully
psql -d notes_dwh -c "SELECT COUNT(*) FROM dwh.facts;"

# Check datamarts populated
psql -d notes_dwh -c "SELECT COUNT(*) FROM dwh.datamartusers;"
psql -d notes_dwh -c "SELECT COUNT(*) FROM dwh.datamartcountries;"

# Check JSON files created
ls -lh output/json/users/ | head -10
ls -lh output/json/countries/ | head -10
```

```mermaid
sequenceDiagram
participant Cron as Cron Scheduler
participant ETL as ETL.sh
participant DB as PostgreSQL
participant Logs as Log Files
Cron->>ETL: Trigger (every 15 min)
ETL->>ETL: Check lock file
alt Lock exists and fresh
ETL->>Logs: Log "Skipping, ETL already running"
ETL->>Cron: Exit (skip)
else No lock or stale
ETL->>ETL: Create lock file
ETL->>DB: Check base tables
ETL->>DB: Process new data
ETL->>DB: Update dimensions
ETL->>DB: Load new facts
ETL->>DB: Update datamarts
ETL->>Logs: Log completion
ETL->>ETL: Remove lock file
ETL->>Cron: Exit (success)
end
```
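The lock-check branch in this diagram is commonly implemented with `flock(1)`; a minimal sketch, where the lock path and the demo commands are illustrative (not the path or commands `ETL.sh` actually uses):

```bash
#!/bin/bash
# Single-instance guard using flock: a second invocation is skipped
# while the first one still holds the lock.
LOCK_FILE="${LOCK_FILE:-/tmp/etl_demo.lock}"

run_guarded() {
  (
    exec 9> "$LOCK_FILE"
    if ! flock -n 9; then
      echo "Skipping, ETL already running"
      exit 0
    fi
    echo "Lock acquired: $1"
    eval "$1"
  )  # the lock is released automatically when the subshell exits
}

# Demo: hold the lock from a background job, then try a second run
run_guarded "sleep 2" &
sleep 0.5
run_guarded "true"     # prints "Skipping, ETL already running"
wait
```

Because the lock is tied to the process, a crashed job releases it automatically, so no stale-lock cleanup is needed with this approach.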
```mermaid
sequenceDiagram
participant Cron as Cron Scheduler
participant DM_Users as datamartUsers.sh
participant DM_Countries as datamartCountries.sh
participant DB as PostgreSQL
Cron->>DM_Users: Trigger (02:00)
DM_Users->>DB: Check facts table
DM_Users->>DB: Process 500 users
DM_Users->>DB: Update datamartusers
DM_Users->>Cron: Complete
Cron->>DM_Countries: Trigger (02:30)
DM_Countries->>DB: Check facts table
DM_Countries->>DB: Process all countries
DM_Countries->>DB: Update datamartcountries
DM_Countries->>Cron: Complete
```
```mermaid
sequenceDiagram
participant Cron as Cron Scheduler
participant Export as exportDatamartsToJSON.sh
participant DB as PostgreSQL
participant Validator as Schema Validator
participant FS as File System
Cron->>Export: Trigger (every 15 min)
Export->>DB: Query datamartusers
Export->>DB: Query datamartcountries
DB-->>Export: Return data
Export->>Export: Convert to JSON
Export->>Validator: Validate JSON
alt Valid
Export->>FS: Write to temp directory
Export->>FS: Move atomically to output/
Export->>Cron: Success
else Invalid
Export->>FS: Keep existing files
Export->>Cron: Error (exit code != 0)
end
```
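The validate-then-move branch can be sketched in shell: write to a temp file on the same filesystem, validate, then `mv` onto the final path, which is an atomic rename. The paths, the sample payload, and the `python3`-based validation are illustrative, not the exporter's actual implementation:

```bash
#!/bin/bash
# Atomic JSON publish: readers never observe a partially written file.
OUT_DIR="${OUT_DIR:-/tmp/json_demo}"
mkdir -p "$OUT_DIR"

# mktemp on the target filesystem, so the final mv is a rename(2)
TMP_FILE=$(mktemp "$OUT_DIR/.user_123.json.XXXXXX")
echo '{"user_id": 123, "notes_opened": 42}' > "$TMP_FILE"

# Validate before publishing; keep the existing file if validation fails
if python3 -m json.tool "$TMP_FILE" > /dev/null 2>&1; then
  mv "$TMP_FILE" "$OUT_DIR/user_123.json"   # atomic replacement
else
  rm -f "$TMP_FILE"
  echo "Validation failed; existing file kept" >&2
fi
```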
Automated Health Check Script:
```bash
#!/bin/bash
# bin/dwh/health_check.sh

# Check database connectivity
psql -d notes_dwh -c "SELECT 1;" || echo "ERROR: Database connection failed"

# Check ETL ran recently (within the last hour)
LAST_ETL=$(find /tmp/ETL_* -name "ETL.log" -mmin -60 | head -1)
if [ -z "$LAST_ETL" ]; then
  echo "WARNING: ETL hasn't run in the last hour"
fi

# Check datamarts have data (-t -A: tuples only, unaligned, so we get a bare number)
USER_COUNT=$(psql -d notes_dwh -t -A -c "SELECT COUNT(*) FROM dwh.datamartusers;")
if [ "$USER_COUNT" -eq 0 ]; then
  echo "ERROR: User datamart is empty"
fi

# Check disk space
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
  echo "WARNING: Disk usage at ${DISK_USAGE}%"
fi
```

Key Metrics to Monitor:
- ETL Execution
  - Last execution time
  - Execution duration
  - Success/failure rate
  - Rows processed
- Database Health
  - Connection status
  - Table sizes
  - Index usage
  - Query performance
- Datamart Status
  - Last update time
  - Row counts
  - Data freshness
- Export Status
  - Last export time
  - File counts
  - Validation status
  - File sizes
- System Resources
  - CPU usage
  - Memory usage
  - Disk space
  - Disk I/O
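Data freshness, one of the metrics above, reduces to plain date arithmetic once you have a last-update timestamp. A sketch follows; the `dwh.datamartusers.last_update` column in the comment is an assumption about where the timestamp would come from, and the demo calls use hard-coded ages:

```bash
#!/bin/bash
# Warn when the last datamart update is older than a threshold.
MAX_AGE_MINUTES=30

check_freshness() {
  local last_update_epoch=$1 now_epoch=$2
  local age_minutes=$(( (now_epoch - last_update_epoch) / 60 ))
  if [ "$age_minutes" -gt "$MAX_AGE_MINUTES" ]; then
    echo "STALE (${age_minutes} min old)"
  else
    echo "FRESH (${age_minutes} min old)"
  fi
}

# In production the timestamp would come from something like:
#   psql -d notes_dwh -t -A -c \
#     "SELECT EXTRACT(EPOCH FROM MAX(last_update)) FROM dwh.datamartusers;"
NOW=$(date +%s)
check_freshness "$(( NOW - 600 ))" "$NOW"    # 10 minutes old -> FRESH
check_freshness "$(( NOW - 7200 ))" "$NOW"   # 2 hours old -> STALE
```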
Database Backups:
```bash
# Daily full backup of the DWH schema
0 1 * * * pg_dump -U postgres -d notes_dwh -n dwh > /backups/dwh_$(date +\%Y\%m\%d).sql

# Weekly full database backup
0 2 * * 0 pg_dump -U postgres -d notes_dwh > /backups/full_$(date +\%Y\%m\%d).sql

# Keep last 30 days
0 3 * * * find /backups -name "*.sql" -mtime +30 -delete
```

JSON Export Backups:

```bash
# Daily backup of JSON exports
0 4 * * * tar -czf /backups/json_$(date +\%Y\%m\%d).tar.gz output/json/

# Keep last 7 days
0 5 * * * find /backups -name "json_*.tar.gz" -mtime +7 -delete
```

Scenario 1: ETL Failure
```bash
# Check logs
tail -100 /tmp/ETL_*/ETL.log

# Check for lock file
ls -lh /tmp/ETL_*/ETL.lock

# Remove stale lock if needed
find /tmp/ETL_* -name "ETL.lock" -mmin +240 -delete

# Resume ETL
./bin/dwh/ETL.sh
```

Scenario 2: Database Corruption

```bash
# Restore from backup
psql -U notes -d notes_dwh < /backups/dwh_20250115.sql

# Re-run ETL if needed
./bin/dwh/ETL.sh
```

Scenario 3: JSON Export Failure

```bash
# Check validation errors
grep -i "validation" /var/log/osm-notes-analytics/analytics.log

# Re-export manually
./bin/dwh/exportDatamartsToJSON.sh
```

Database Users:
```sql
-- ETL user (read public, write dwh)
CREATE USER etl_user WITH PASSWORD 'secure_password';
GRANT USAGE ON SCHEMA public TO etl_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO etl_user;
GRANT ALL ON SCHEMA dwh TO etl_user;
GRANT ALL ON ALL TABLES IN SCHEMA dwh TO etl_user;

-- Read-only user (for analysts)
CREATE USER analyst_user WITH PASSWORD 'secure_password';
GRANT USAGE ON SCHEMA dwh TO analyst_user;
GRANT SELECT ON ALL TABLES IN SCHEMA dwh TO analyst_user;
```

File System Permissions:

```bash
# Scripts: executable by owner, readable by group
chmod 750 bin/dwh/*.sh

# Configuration: readable by owner only
chmod 600 etc/properties.sh
chmod 600 etc/etl.properties

# Output: readable by web server, writable by ETL user
chmod 755 output/json/
chmod 644 output/json/**/*.json
```

- Database: Listen only on localhost or a private network
- Firewall: Restrict database port (5432) to application servers only
- SSL: Use SSL for database connections in production
- Backups: Encrypt backup files
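The database-access rules translate into `pg_hba.conf` entries along these lines; the addresses are placeholders for your environment, and the sketch assumes the `etl_user`/`analyst_user` roles defined above:

```
# pg_hba.conf - restrict access to local and application hosts, require TLS
# TYPE    DATABASE   USER          ADDRESS          METHOD
local     notes_dwh  etl_user                       scram-sha-256
hostssl   notes_dwh  etl_user      10.0.0.0/24      scram-sha-256
hostssl   notes_dwh  analyst_user  10.0.0.0/24      scram-sha-256
# Reject everything else for this database
host      notes_dwh  all           0.0.0.0/0        reject
```

`hostssl` entries only match TLS connections, which enforces the SSL requirement at the server rather than relying on client configuration.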
PostgreSQL Configuration (postgresql.conf):
```ini
# Memory settings
shared_buffers = 4GB
work_mem = 256MB
maintenance_work_mem = 1GB

# Query planner
random_page_cost = 1.1        # For SSD
effective_cache_size = 12GB

# Parallel processing
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
```

Configuration (etc/etl.properties):

```bash
# Increase batch size for faster processing
ETL_BATCH_SIZE=5000

# Increase parallel jobs for multi-core systems
ETL_MAX_PARALLEL_JOBS=8

# Adjust memory threshold
MAX_MEMORY_USAGE=85
```

| Component | RTO | Recovery Method |
|---|---|---|
| ETL Process | 1 hour | Re-run ETL from last checkpoint |
| Datamarts | 2 hours | Re-run datamart scripts |
| JSON Export | 15 minutes | Re-run export script |
| Database | 4 hours | Restore from backup + re-run ETL |
| Component | RPO | Backup Frequency |
|---|---|---|
| Data Warehouse | 24 hours | Daily backups |
| JSON Exports | 24 hours | Daily backups |
| Configuration | 7 days | Weekly backups |
- Cron Setup Guide: Detailed cron configuration
- Architecture Diagram: System architecture
- Troubleshooting Guide: Operational issues
- DWH Maintenance Guide: Database maintenance