| title | Data Warehouse Maintenance and Cleanup Guide |
|---|---|
| description | This guide covers maintenance operations for the OSM-Notes-Analytics data warehouse, including when and how to use the cleanup script safely. |
| version | 1.0.0 |
| last_updated | 2026-01-25 |
| author | AngocA |
| project | OSM-Notes-Analytics |
| status | active |

This guide covers maintenance operations for the OSM-Notes-Analytics data warehouse, including when and how to use the cleanup script safely.
The cleanup script removes data warehouse objects and temporary files. It's designed for:
- Development environment resets
- Troubleshooting corrupted objects
- Regular maintenance of temporary files
- Complete environment cleanup
The script includes several safety mechanisms:
- Confirmation prompts for destructive operations
- Dry-run mode to preview operations
- Granular options to control what gets removed
- Clear warnings about data loss
# Clean temporary files (safe, no confirmation)
./bin/dwh/cleanupDWH.sh --remove-temp-files

When to use:
- After ETL runs to free disk space
- Before running tests to ensure clean environment
- Regular maintenance (weekly/monthly)
- When the /tmp directory is getting full
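
One way to tell when the /tmp directory is "getting full" is to check the capacity of the filesystem that holds it. The sketch below is only an illustration: the function name `tmp_usage_percent` and the 80% threshold are assumptions and not part of the project. Only the `--remove-temp-files` option comes from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how full the filesystem holding a directory is.
# Prints an integer percentage (0-100) parsed from POSIX `df -P` output.
tmp_usage_percent() {
  local dir="${1:-/tmp}"
  # Row 2 of `df -P` is the data row; column 5 is "Capacity" (for example "42%").
  df -P "$dir" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Assumed threshold; tune it to your environment.
THRESHOLD=80
usage="$(tmp_usage_percent /tmp)"
if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "/tmp is ${usage}% full; consider: ./bin/dwh/cleanupDWH.sh --remove-temp-files"
else
  echo "/tmp is ${usage}% full; no cleanup needed yet"
fi
```

A check like this could run from cron before the weekly or monthly maintenance mentioned above, so that cleanup happens only when space is actually tight.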
# See what would be removed (safe)
./bin/dwh/cleanupDWH.sh --dry-run

When to use:
- Before any destructive operation
- Understanding what cleanup will do
- Planning maintenance windows
- Troubleshooting cleanup issues
# Remove everything (requires confirmation)
./bin/dwh/cleanupDWH.sh

When to use:
- Starting fresh development environment
- After major schema changes
- Resolving complex corruption issues
- Before initial ETL setup
# Remove only database objects (requires confirmation)
./bin/dwh/cleanupDWH.sh --remove-all-data

When to use:
- Schema corruption issues
- Before schema migrations
- Testing schema changes
- Resolving constraint violations
# 1. Preview what will be removed
./bin/dwh/cleanupDWH.sh --dry-run
# 2. Remove everything (with confirmation)
./bin/dwh/cleanupDWH.sh
# 3. Recreate data warehouse (auto-detects first execution)
./bin/dwh/ETL.sh

# Clean temporary files only
./bin/dwh/cleanupDWH.sh --remove-temp-files

# 1. Preview DWH cleanup
./bin/dwh/cleanupDWH.sh --dry-run
# 2. Remove only DWH objects
./bin/dwh/cleanupDWH.sh --remove-all-data
# 3. Recreate schema (auto-detects first execution)
./bin/dwh/ETL.sh

# Clean between test runs
./bin/dwh/cleanupDWH.sh --remove-temp-files
# Or complete reset for integration tests
./bin/dwh/cleanupDWH.sh --dry-run # Preview first
./bin/dwh/cleanupDWH.sh # Full cleanup

Schemas:
- staging - Staging area objects
- dwh - Data warehouse schema

Tables:
- dwh.facts - Main fact table (partitioned)
- dwh.dimension_* - All dimension tables
- dwh.datamartCountries - Country analytics
- dwh.datamartUsers - User analytics
- dwh.iso_country_codes - ISO codes reference

Functions:
- dwh.get_* - Helper functions
- dwh.update_* - Update functions
- dwh.refresh_* - Refresh functions

Triggers:
- update_days_to_resolution - Fact table trigger

Directories removed:
- /tmp/ETL_* - ETL temporary files
- /tmp/datamartCountries_* - Country datamart temp files
- /tmp/datamartUsers_* - User datamart temp files
- /tmp/profile_* - Profile analysis temp files
- /tmp/cleanupDWH_* - Cleanup script temp files
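
To see which temporary directories currently match the prefixes listed above, and how much space they occupy, a read-only listing can help. This is a minimal sketch, not the cleanup script's own logic: `list_dwh_temp_dirs` is a hypothetical helper, and it assumes the prefixes name directories directly under /tmp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list temporary directories matching the prefixes
# named in this guide. It only reads; nothing is deleted.
list_dwh_temp_dirs() {
  local base="${1:-/tmp}"
  find "$base" -mindepth 1 -maxdepth 1 -type d \( \
    -name 'ETL_*' -o \
    -name 'datamartCountries_*' -o \
    -name 'datamartUsers_*' -o \
    -name 'profile_*' -o \
    -name 'cleanupDWH_*' \
  \) 2>/dev/null | sort
}

# Show each matching directory together with its disk usage.
list_dwh_temp_dirs /tmp | while IFS= read -r dir; do
  du -sh "$dir" 2>/dev/null || true
done
```

The official way to preview removals remains `./bin/dwh/cleanupDWH.sh --dry-run`; this listing is only a quick look at disk usage.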
The script uses database configuration from etc/properties.sh:
# Database configuration (recommended: use DBNAME_INGESTION and DBNAME_DWH)
# Option 1: Separate databases
DBNAME_INGESTION="notes_dwh"
DBNAME_DWH="notes_dwh"
# Option 2: Same database (legacy/compatibility)
DBNAME="notes_dwh" # Used when both databases are the same
# Database user
DB_USER="notes"

- Database must exist and be accessible
- User must have DROP privileges on target schemas
- PostgreSQL client tools (psql) must be installed
- Script must be run from project root directory
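
The prerequisites above can be checked before running any cleanup. The sketch below is illustrative only: `resolve_dwh_dbname`, `is_project_root`, and `preflight` are hypothetical names, and the fallback from DBNAME_DWH to DBNAME is an assumption drawn from the configuration comments, not a confirmed description of how the script resolves the database name.

```shell
#!/usr/bin/env bash
# Hypothetical preflight checks; none of these functions exist in the project.

# Resolve the DWH database name: prefer DBNAME_DWH, fall back to legacy DBNAME.
# (Assumed precedence, based on the comments in the configuration above.)
resolve_dwh_dbname() {
  printf '%s\n' "${DBNAME_DWH:-${DBNAME:-}}"
}

# Succeeds only when the given directory looks like the project root.
is_project_root() {
  local dir="${1:-.}"
  [ -f "$dir/bin/dwh/cleanupDWH.sh" ] && [ -f "$dir/etc/properties.sh" ]
}

# Runs all checks, prints one line per failure, returns non-zero on any failure.
preflight() {
  local failures=0 db
  if ! command -v psql >/dev/null 2>&1; then
    echo "psql not found in PATH"
    failures=$((failures + 1))
  fi
  if ! is_project_root .; then
    echo "not in project root (bin/dwh/cleanupDWH.sh or etc/properties.sh missing)"
    failures=$((failures + 1))
  fi
  db="$(resolve_dwh_dbname)"
  if [ -z "$db" ]; then
    echo "no database name set (DBNAME_DWH or DBNAME)"
    failures=$((failures + 1))
  fi
  [ "$failures" -eq 0 ]
}
```

A possible use is `. etc/properties.sh && preflight && ./bin/dwh/cleanupDWH.sh --dry-run`. Connectivity itself could then be tested with something like `psql -d "$(resolve_dwh_dbname)" -U "$DB_USER" -c 'SELECT 1'`.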
- Always run dry-run first: ./bin/dwh/cleanupDWH.sh --dry-run
- Back up important data if needed
- Verify database configuration in etc/properties.sh
- Ensure you have proper privileges
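
The checklist above can be enforced with a small wrapper that always previews first. This is a sketch under stated assumptions: `guarded_cleanup` and the `CLEANUP_SCRIPT` variable are hypothetical, and the wrapper assumes the cleanup script exits with a non-zero status when the dry run fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: preview first, then run the real cleanup only after
# an explicit "yes". CLEANUP_SCRIPT can be overridden, for example in tests.
CLEANUP_SCRIPT="${CLEANUP_SCRIPT:-./bin/dwh/cleanupDWH.sh}"

guarded_cleanup() {
  local answer
  # Step 1: the preview must succeed before anything destructive runs.
  if ! "$CLEANUP_SCRIPT" --dry-run; then
    echo "Dry run failed; aborting." >&2
    return 1
  fi
  # Step 2: ask the operator to confirm after reading the preview.
  printf 'Proceed with cleanup? Type "yes" to continue: '
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted; nothing was removed."
    return 2
  fi
  # Step 3: run the cleanup with whatever options were passed in.
  "$CLEANUP_SCRIPT" "$@"
}
```

For example, `guarded_cleanup --remove-all-data` would show the preview, wait for "yes", and then run the real command. The cleanup script's own confirmation prompt still applies on top of this.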
- Use --remove-temp-files for regular maintenance
- Use --dry-run before any destructive operation
- Keep backups of important data
- Test cleanup procedures in development first
- Document any custom cleanup procedures
If cleanup fails or causes issues:
- Check logs in /tmp/cleanupDWH_* directories
- Verify database connectivity
- Check user privileges
- Review SQL script files for syntax errors
- Contact database administrator if needed
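
When several /tmp/cleanupDWH_* directories exist, the most recent one is usually the place to start. The helper below is a hypothetical sketch (`latest_cleanup_dir` is not part of the project) and makes no assumption about the names of the log files inside, so it only lists the directory contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified cleanupDWH_* directory
# under a base directory (default /tmp). Returns non-zero if none exists.
latest_cleanup_dir() {
  local base="${1:-/tmp}" newest="" dir
  for dir in "$base"/cleanupDWH_*/; do
    [ -d "$dir" ] || continue   # An unmatched glob stays literal; skip it.
    dir="${dir%/}"
    if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
      newest="$dir"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# List the files in the newest directory, if there is one.
if dir="$(latest_cleanup_dir /tmp)"; then
  ls -l "$dir"
else
  echo "No cleanupDWH_* directories found under /tmp"
fi
```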
ERROR: Permission denied for schema dwh
Solution: Ensure user has DROP privileges on schemas

ERROR: Database 'notes_dwh' does not exist
Solution: Check etc/properties.sh configuration

ERROR: SQL file validation failed
Solution: Check SQL script syntax and file permissions
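
For the permission error above, it can help to check who owns the schemas: in PostgreSQL, a schema can only be dropped by its owner or a superuser. The sketch below builds a catalog query for that check. `schema_owner_sql` is a hypothetical helper, and the schema names passed to it are the ones listed in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a query that lists the owner of each given schema.
# Schema names are expected to be trusted identifiers (no quoting is applied).
schema_owner_sql() {
  local list="" name
  for name in "$@"; do
    list="${list:+$list, }'$name'"
  done
  printf "SELECT nspname, pg_get_userbyid(nspowner) AS owner FROM pg_namespace WHERE nspname IN (%s);\n" "$list"
}

# Example (not run here): compare the reported owners with the configured DB_USER.
#   psql -d "$DBNAME_DWH" -U "$DB_USER" -At -c "$(schema_owner_sql dwh staging)"
```

If the owner differs from the configured user, a database administrator would need to change the ownership or run the cleanup with a suitable role.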
# Show detailed help
./bin/dwh/cleanupDWH.sh --help

- ETL Enhanced Features - ETL capabilities and configuration
- DWH Star Schema Data Dictionary - Table definitions
- bin/README.md - Script documentation
- Main README - Project overview
- 2025-10-22: Initial documentation
- 2025-10-22: Added safety guidelines and troubleshooting
- 2025-10-22: Updated with new script options and workflows