|
19 | 19 | | `crash_free_sessions` | % of sessions without fatal error | Gauge | `platform` | |
20 | 20 | | `authz_denied_count` | Authorization failures | Counter | `resource`, `role`, `space` | |
21 | 21 | | `flag_evaluation_latency_ms` | Feature flag evaluation | Histogram | `flag_key` | |
| 22 | +| `nav_interaction_total` | Nav hub clicks (flag cohorts) | Counter | `target`, `from`, `variant`, `role` | |
22 | 23 | | `webhook_delivery_success_rate` | Webhook successes vs. attempts | Gauge | `event_type` | |
23 | 24 | | `automod_trigger_count` | Automod actions per rule | Counter | `rule_type`, `space` | |
24 | 25 |
|
25 | | -> 2025-10-31: Added `admin_publish_duration_ms` internal histogram for staff tooling responsiveness and began emitting `content_publish_latency_ms` from `/api/admin/posts`. Structured logs now include `user_id_hash`, `space_id`, and feature flag context for audit correlation. |
| 26 | +> 2025-10-31: Added `admin_publish_duration_ms` internal histogram for staff tooling responsiveness and began emitting `content_publish_latency_ms` from `/api/admin/posts`. Structured logs now include `user_id_hash`, `space_id`, and feature flag context for audit correlation. `nav_interaction_total` now captures navigation hub engagement per flag cohort. |
26 | 27 |
|
27 | 28 | ## 3. Tracing Strategy |
28 | 29 | - Instrument Next.js route handlers and server components with OpenTelemetry. |
|
38 | 39 | ## 5. Dashboards |
39 | 40 | - **Executive KPI Dashboard (`dash_exec_kpi_v1`):** Aggregates content latency (panels for `content_publish_latency_ms`, `admin_publish_duration_ms`), crash-free sessions, and donation funnel placeholders. |
40 | 41 | - **Operations Dashboard (`dash_ops_rbac_v1`):** Displays `authz_denied_count` (tagged by `resource`, `role`, `space`), moderation backlog, feature flag toggles (joining `feature_flag_audit`), and Playwright synthetic status. |
| 42 | +- **Navigation Engagement Panel (`dash_ops_nav_v1`):** Breaks down `nav_interaction_total` by target hub, role, and variant (legacy vs `nav_ia_v1`). |
41 | 43 | - **Commerce Dashboard:** Shows donation funnel, payout queue status, dispute rate. |
42 | 44 | - **Events Dashboard:** Tracks registrations, attendance, revenue, NPS survey results. |
43 | 45 | - **Reliability Dashboard:** SLO status, error budgets, incident history. |
|
68 | 70 | - Adopt OpenTelemetry SDK for Next.js + Node workers; export to vendor (e.g., Grafana Cloud, Honeycomb). |
69 | 71 | - Use Supabase Logflare integration for SQL audit, complement with custom metrics via functions. |
70 | 72 | - Configure synthetic monitoring (Pingdom/Lighthouse CI) for home feed, space page, checkout flow. |
71 | | -- Add Playwright synthetic tests for core user journeys with metrics logging. |
| 73 | +- Add Playwright synthetic tests for core user journeys with metrics logging. `tests/synthetic/observability.spec.ts` covers nav IA, admin flag guard, and publish route smoke flows (skipped when `PLAYWRIGHT_TEST_BASE_URL` undefined). |
72 | 74 |
|
73 | 75 | ## 9. Runbooks |
74 | 76 | - Create `/docs/operations/runbooks/` with scenario-specific guides (publish latency, payment failures, search outage). |
|
0 commit comments