| title | DWH Star Schema Data Dictionary | |
|---|---|---|
| description | This document provides a tabular data dictionary for the star schema used in the data warehouse. It | |
| version | 1.0.0 | |
| last_updated | 2026-01-25 | |
| author | AngocA | |
| tags |
|
|
| audience |
|
|
| project | OSM-Notes-Analytics | |
| status | active |
This document provides a tabular data dictionary for the star schema used in the data warehouse. It covers the fact table and all dimensions, including data types, nullability, defaults, keys, and functional descriptions.
Conventions:
- Types are PostgreSQL types.
- PK: Primary Key, FK: Foreign Key.
- N: NOT NULL, Y: NULL allowed.
Central fact table with one row per note action (open, comment, reopen, close, hide).
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| fact_id | SERIAL | N | auto | PK | Surrogate key |
| id_note | INTEGER | N | OSM note id | ||
| sequence_action | INTEGER | Y | Creation sequence per action | ||
| dimension_id_country | INTEGER | N | FK | Country dimension key | |
| processing_time | TIMESTAMP | N | CURRENT_TIMESTAMP | Insert timestamp | |
| action_at | TIMESTAMP | N | Action timestamp | ||
| action_comment | note_event_enum | N | Action type: opened/closed/... | ||
| action_dimension_id_date | INTEGER | N | FK | Date dimension key of action | |
| action_dimension_id_hour_of_week | SMALLINT | N | FK | Hour-of-week dimension key of action | |
| action_dimension_id_user | INTEGER | Y | FK | User dimension key of action | |
| opened_dimension_id_date | INTEGER | N | FK | Date dimension key when note was opened | |
| opened_dimension_id_hour_of_week | SMALLINT | N | FK | Hour-of-week key when note was opened | |
| opened_dimension_id_user | INTEGER | Y | FK | User dimension key who opened | |
| closed_dimension_id_date | INTEGER | Y | FK | Date dimension key when closed | |
| closed_dimension_id_hour_of_week | SMALLINT | Y | FK | Hour-of-week key when closed | |
| closed_dimension_id_user | INTEGER | Y | FK | User dimension key who closed | |
| dimension_application_creation | INTEGER | Y | FK | Application dimension key used to open | |
| recent_opened_dimension_id_date | INTEGER | N | FK | Most recent open/reopen date key | |
| days_to_resolution | INTEGER | Y | Days from first open to most recent close | ||
| days_to_resolution_active | INTEGER | Y | Total days in open status across reopens | ||
| days_to_resolution_from_reopen | INTEGER | Y | Days from last reopen to most recent close | ||
| hashtag_number | INTEGER | Y | Total number of hashtags detected | ||
| comment_length | INTEGER | Y | Length of comment text | ||
| has_url | BOOLEAN | Y | FALSE | True if comment contains URL | |
| has_mention | BOOLEAN | Y | FALSE | True if comment mentions another user | |
| total_comments_on_note | INTEGER | Y | Total comments count UP TO this action (trigger) | ||
| total_reopenings_count | INTEGER | Y | Total reopenings count UP TO this action (trigger) | ||
| total_actions_on_note | INTEGER | Y | Total actions count UP TO this action (trigger) |
Bridge table linking facts to hashtags with action classification.
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| fact_id | INTEGER | N | FK | Reference to facts table | |
| dimension_hashtag_id | INTEGER | N | FK | Reference to hashtag dimension | |
| position | SMALLINT | Y | Position of hashtag in comment | ||
| used_in_action | note_event_enum | Y | Action type where hashtag was used | ||
| is_opening_hashtag | BOOLEAN | Y | FALSE | True if used in note opening | |
| is_resolution_hashtag | BOOLEAN | Y | FALSE | True if used in note resolution |
Notes:
- FKs: country →
dwh.dimension_countries.dimension_country_id, date/hour/user → corresponding dimensions, application →dwh.dimension_applications.dimension_application_id. - Hashtags are stored in bridge table
dwh.fact_hashtags(see below). recent_opened_dimension_id_dateis enforced NOT NULL after unify step.- Resolution day metrics are maintained by trigger on insert of closing actions.
comment_length,has_url,has_mentionare calculated during ETL from comment text.total_comments_on_note,total_reopenings_count,total_actions_on_noteare calculated by trigger BEFORE INSERT for historical accuracy (performance impact: 1 SELECT per row inserted).
Most used hashtags in note opening actions with usage statistics.
Most used hashtags in note resolution/closure actions with resolution metrics.
Most used hashtags in comment actions with engagement metrics.
Hashtag usage breakdown by action type (opened/commented/closed/etc).
Top hashtags overall with breakdown by action type and geographic/user distribution.
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_user_id | SERIAL | N | auto | PK | Surrogate key |
| user_id | INTEGER | N | OSM user id | ||
| username | VARCHAR(256) | Y | Most recent username | ||
| modified | BOOLEAN | Y | Flag for datamart updates |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_region_id | SERIAL | N | auto | PK | Surrogate key |
| region_name_es | VARCHAR(60) | Y | Name in Spanish | ||
| region_name_en | VARCHAR(60) | Y | Name in English |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_country_id | SERIAL | N | auto | PK | Surrogate key |
| country_id | INTEGER | N | OSM relation id | ||
| country_name | VARCHAR(100) | Y | Local name | ||
| country_name_es | VARCHAR(100) | Y | Spanish name | ||
| country_name_en | VARCHAR(100) | Y | English name | ||
| region_id | INTEGER | Y | FK | Region key → dimension_regions |
|
| modified | BOOLEAN | Y | Flag for datamart updates |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_day_id | SERIAL | N | auto | PK | Surrogate key |
| date_id | DATE | Y | Full date | ||
| year | SMALLINT | Y | Year component | ||
| month | SMALLINT | Y | Month component | ||
| day | SMALLINT | Y | Day component | ||
| iso_year | SMALLINT | Y | ISO year | ||
| iso_week | SMALLINT | Y | ISO week (1..53) | ||
| day_of_year | SMALLINT | Y | Day of year (1..366) | ||
| quarter | SMALLINT | Y | Quarter (1..4) | ||
| month_name | VARCHAR(16) | Y | Month name (en) | ||
| day_name | VARCHAR(16) | Y | Day name (en, ISO) | ||
| is_weekend | BOOLEAN | Y | Weekend flag (ISO) | ||
| is_month_end | BOOLEAN | Y | Month-end flag | ||
| is_quarter_end | BOOLEAN | Y | Quarter-end flag | ||
| is_year_end | BOOLEAN | Y | Year-end flag |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_tow_id | SMALLINT | Y | PK | Encodes day-of-week and hour | |
| day_of_week | SMALLINT | Y | 1..7 (ISO) | ||
| hour_of_day | SMALLINT | Y | 0..23 | ||
| hour_of_week | SMALLINT | Y | 0..167 | ||
| period_of_day | VARCHAR(16) | Y | Night/Morning/Afternoon/Evening |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_application_id | SERIAL | N | auto | PK | Surrogate key |
| application_name | VARCHAR(64) | N | Application name | ||
| pattern | VARCHAR(64) | Y | Pattern used to detect the app in text | ||
| pattern_type | VARCHAR(16) | Y | SIMILAR/LIKE/REGEXP | ||
| platform | VARCHAR(16) | Y | Optional platform | ||
| vendor | VARCHAR(32) | Y | Vendor/author | ||
| category | VARCHAR(32) | Y | Category/type | ||
| active | BOOLEAN | Y | Active flag |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_hashtag_id | SERIAL | N | auto | PK | Surrogate key |
| description | TEXT | Y | Hashtag text |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_timezone_id | SERIAL | N | auto | PK | Surrogate key |
| tz_name | VARCHAR(64) | N | IANA timezone name or UTC±N band | ||
| utc_offset_minutes | SMALLINT | Y | UTC offset in minutes |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_season_id | SMALLINT | N | PK | Season id | |
| season_name_en | VARCHAR(16) | Y | Season name (en) | ||
| season_name_es | VARCHAR(16) | Y | Season name (es) |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_automation_id | SMALLINT | N | PK | Automation level id | |
| automation_level | VARCHAR(30) | N | Level name: human/probably_human/uncertain/probably_automated/automated/bulk_import | ||
| confidence_score | DECIMAL(3,2) | N | Confidence 0.00-1.00 | ||
| description | TEXT | Y | Human-readable description | ||
| detection_criteria | JSONB | Y | Criteria that triggered classification | ||
| created_at | TIMESTAMP | Y | CURRENT_TIMESTAMP | Creation timestamp |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_experience_id | SMALLINT | N | PK | Experience level id | |
| experience_level | VARCHAR(30) | N | Level name: newcomer/beginner/intermediate/advanced/expert/master/legend | ||
| min_notes_opened | INTEGER | N | Minimum notes opened to qualify | ||
| min_notes_closed | INTEGER | N | Minimum notes closed to qualify | ||
| min_days_active | INTEGER | N | Minimum days active to qualify | ||
| level_order | SMALLINT | N | Sort order (1=newcomer, 7=legend) | ||
| description | TEXT | Y | Human-readable description |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_continent_id | SERIAL | N | auto | PK | Surrogate key |
| continent_name_es | VARCHAR(32) | Y | Continent (es) | ||
| continent_name_en | VARCHAR(32) | Y | Continent (en) |
| Column | Type | Description |
|---|---|---|
| iso_alpha2 | VARCHAR(2) | ISO 3166-1 alpha-2 |
| iso_alpha3 | VARCHAR(3) | ISO 3166-1 alpha-3 |
| Column | Type | Description |
|---|---|---|
| valid_from | TIMESTAMP | Validity start |
| valid_to | TIMESTAMP | Validity end |
| is_current | BOOLEAN | Current row flag |
| Column | Type | Description |
|---|---|---|
| experience_level_id | SMALLINT | FK to experience level |
| total_notes_opened | INTEGER | Total notes opened by user |
| total_notes_closed | INTEGER | Total notes closed by user |
| days_active | INTEGER | Days between first and last activity |
| resolution_ratio | DECIMAL(4,2) | % of opened notes that were closed |
| last_activity_date | DATE | Date of most recent activity |
| experience_calculated_at | TIMESTAMP | When experience level was last calculated |
| Column | Type | Description |
|---|---|---|
| action_timezone_id | INTEGER | FK to timezone |
| local_action_dimension_id_date | INTEGER | Local date id |
| local_action_dimension_id_hour_of_week | SMALLINT | Local time-of-week id |
| action_dimension_id_season | SMALLINT | Season id |
| dimension_application_version | INTEGER | FK to app version |
| comment_length | INTEGER | Length of comment text |
| has_url | BOOLEAN | Comment contains URL |
| has_mention | BOOLEAN | Comment mentions user |
| dimension_id_automation | SMALLINT | FK to automation level |
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| dimension_application_version_id | SERIAL | N | auto | PK | Surrogate key |
| dimension_application_id | INTEGER | N | FK | Application | |
| version | VARCHAR(32) | N | Version string |
| Column | Type | Description |
|---|---|---|
| fact_id | INTEGER | FK to facts |
| dimension_hashtag_id | INTEGER | FK to hashtags |
| position | SMALLINT | Positional order in text |
Used internally by ETL orchestration.
| Column | Type | Null | Default | Key | Description |
|---|---|---|---|---|---|
| key | VARCHAR(16) | Y | Property name | ||
| value | VARCHAR(26) | Y | Property value |
The data warehouse includes pre-computed datamarts that aggregate metrics from the star schema for fast analytics queries. These datamarts are updated incrementally and provide ready-to-use metrics for dashboards and reports.
Pre-computed analytics aggregated by country. Contains 77+ metrics covering:
- Historical counts (whole history, by year, by month)
- Resolution metrics (average/median days to resolution, resolution rates)
- Application statistics (usage patterns, mobile vs desktop)
- Content quality metrics (comment length, URLs, mentions)
- Community health metrics (backlog, health score, activity trends)
- Temporal patterns (hour-of-week, day-of-week distributions)
- Geographic patterns
- Hashtag metrics
Key Columns:
dimension_country_id(PK): Reference to country dimensioncountry_name,country_name_en,country_name_es: Country namesiso_alpha2,iso_alpha3: ISO codes- 77+ metric columns: See Metric Definitions for complete list
Update Procedure: dwh.update_datamart_country(dimension_country_id)
Full Documentation: See Metric Definitions
Pre-computed analytics aggregated by user. Contains 78+ metrics covering:
- Historical counts (whole history, by year, by month)
- Resolution metrics (average/median days to resolution, resolution rates)
- Application statistics (usage patterns, trends)
- Content quality metrics (comment length, URLs, mentions)
- User behavior patterns (response time, collaboration, activity)
- Temporal patterns (hour-of-week, day-of-week distributions)
- Geographic patterns
- Hashtag metrics
- User classification (experience level)
Key Columns:
dimension_user_id(PK): Reference to user dimensionuser_id: OSM user IDusername: Most recent username- 78+ metric columns: See Metric Definitions for complete list
Update Procedure: dwh.update_datamart_user(dimension_user_id)
Full Documentation: See Metric Definitions
Pre-computed global statistics aggregating all countries and users.
Key Metrics:
- Total notes globally
- Active countries/users
- Global resolution rates
- Global activity trends
Update Procedure: dwh.update_datamart_global()
The datamarts organize metrics into the following categories:
- Historical Count Metrics: Activity counts by time period (whole history, by year, by month)
- Resolution Metrics: Time to resolution, resolution rates, resolution trends
- Application Statistics: Tool usage patterns, version adoption, mobile vs desktop
- Content Quality Metrics: Comment analysis, URL usage, mentions
- Temporal Pattern Metrics: Hour-of-week, day-of-week distributions
- Geographic Pattern Metrics: Country-level aggregations
- Community Health Metrics: Backlog size, health score, activity trends
- User Behavior Metrics: Response time, collaboration patterns, activity levels
- Hashtag Metrics: Hashtag usage and trends
- First/Last Action Metrics: Temporal boundaries of activity
- Current Period Metrics: Recent activity (last 30 days, etc.)
- User Classification Metrics: Experience levels, activity classifications
For complete metric definitions, formulas, and use cases, see Metric Definitions.
- Complete Metric Reference: Metric Definitions - Business definitions for all 77+ (countries) and 78+ (users) metrics
- Dashboard Analysis: Dashboard Analysis - Implementation status and recommendations
- Data Flow: Data Flow Diagrams - How data flows from base tables to datamarts
- Improvements History: Evolution of the data warehouse (documentation pending)