feat: public exposure surface (oauth2-proxy + Zitadel OIDC + ext_authz) #41

Open: patrickleet wants to merge 3 commits into main from feat/public-exposure

Conversation

@patrickleet patrickleet commented May 18, 2026

Summary

Adds spec.auth + spec.exposure to ObserveStack so internal dashboards (Prometheus, Alertmanager, OpenCost) can be reached from the public internet behind Zitadel OIDC, per [[specs/platform-public-exposure]]. The bridge is composed fully:

  • ExternalSecret pulling the Zitadel iam-admin PAT from AWS Secrets Manager (matches AuthStack's consumer-providerconfig.yaml shape inline — no per-cluster authoring).
  • Namespaced Zitadel ProviderConfig sourcing that Secret.
  • Zitadel OIDC Application MR via provider-upjet-zitadel (declarative rather than out-of-band, per direction during planning). The provider writes client_id + client_secret into a K8s Secret that oauth2-proxy mounts.
  • Single-replica Redis StatefulSet for oauth2-proxy sessions (PVC, no TLS in v1 — ambient mTLS provides transport security between the two pods).
  • Waypoint Gateway in the observe namespace + per-component sister Services labeled istio.io/use-waypoint.
  • HTTPRoute per component — two-rule (/oauth2/* → oauth2-proxy, / → sister Service).
  • AuthorizationPolicy.CUSTOM per component, provider.name matches the IstioStack-registered extensionProvider (consumes [[tasks/istio-stack-extension-providers]]).
  • Optional cert-manager Certificate when the platform wildcard at the Gateway doesn't already cover spec.exposure.domain.
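
For orientation, the two per-component resources at the end of this list could look roughly like the sketch below. It is illustrative only and not the rendered output of the composition: the Gateway name and namespace, the oauth2-proxy Service name and port, and the provider name are assumptions, while the sister Service name, port, and hostname are taken from the verification output in this PR.

```yaml
# Illustrative sketch only; not the composition's actual output.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: exposure-prometheus            # hypothetical name
  namespace: monitoring
spec:
  parentRefs:
    - name: platform-gateway           # assumption: platform Gateway name
      namespace: istio-ingress         # assumption: platform Gateway namespace
  hostnames:
    - prometheus.ops.com.ai
  rules:
    # Rule 1: OIDC endpoints (sign-in, callback) go straight to oauth2-proxy.
    - matches:
        - path:
            type: PathPrefix
            value: /oauth2
      backendRefs:
        - name: monitoring-oauth2-proxy  # assumption: oauth2-proxy Service name
          port: 4180                     # assumption: oauth2-proxy default port
    # Rule 2: everything else goes to the waypointed sister Service.
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: exposure-prometheus
          port: 9090
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: exposure-prometheus-ext-authz  # hypothetical name
  namespace: monitoring
spec:
  # Attach to the sister Service so the waypoint enforces the policy.
  targetRefs:
    - group: ""
      kind: Service
      name: exposure-prometheus
  action: CUSTOM
  provider:
    name: oauth2-proxy                 # assumption: must equal the registered extensionProvider name
  rules:
    # Call ext_authz for every path except the oauth2-proxy endpoints,
    # so the OIDC callback stays reachable.
    - to:
        - operation:
            notPaths: ["/oauth2/*"]
```

Gateway API picks the most specific path match, so the order of the two rules does not affect routing.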

Grafana app-level OIDC intentionally not included — split into [[tasks/observe-stack-grafana-oidc]] for a smaller focused follow-up.

Operator pre-requisites:

  1. Cookie-signing Secret in the observe namespace (kubectl create secret generic <name> --from-literal=cookie_secret=$(openssl rand -hex 16)).
  2. Register a matching extensionProvider entry on IstioStack.
  3. Zitadel project ID provided via spec.auth.zitadelProjectId.
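
As a rough illustration of pre-requisite 2, an ext_authz extension provider in plain Istio mesh config typically has the shape below. How IstioStack surfaces this setting is not shown in this PR, so treat the surrounding structure, the provider name, and the Service address as assumptions.

```yaml
# Sketch of a raw Istio meshConfig entry; the IstioStack field that wraps it is not shown here.
meshConfig:
  extensionProviders:
    - name: oauth2-proxy                 # assumption: must match AuthorizationPolicy provider.name
      envoyExtAuthzHttp:
        service: monitoring-oauth2-proxy.monitoring.svc.cluster.local  # assumption
        port: 4180                       # assumption: oauth2-proxy default port
        # Forward the session cookie so oauth2-proxy can validate it.
        includeRequestHeadersInCheck: ["cookie", "authorization"]
        # Let the redirect response set the CSRF/session cookies in the browser.
        headersToDownstreamOnDeny: ["content-type", "set-cookie"]
```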

End-to-end verification on pat-local

Composed OIDC client got client_id=373507418958665326 in Zitadel ("platform-services" project). Curl through the live stack (from inside the cluster):

$ curl -D - http://exposure-prometheus.monitoring:9090/
HTTP/1.1 302 Found
location: https://auth.ops.com.ai/oauth/v2/authorize?client_id=373507418958665326&redirect_uri=...%2Foauth2%2Fcallback&...
set-cookie: _oauth2_proxy_csrf=...; Domain=ops.com.ai; Secure; HttpOnly; SameSite=Lax

Waypoint Envoy access log: 302 UAEX ext_authz_denied - inbound-vip|9090|http|exposure-prometheus.monitoring.svc.cluster.local — UAEX = canonical Envoy ext_authz_denied response flag, proving the waypoint really called the registered ext_authz upstream and acted on the response.

Known limitations (carry into release notes)

  • Platform Gateway → Waypoint routing: requests entering through the public ELB currently bypass the per-namespace waypoint and HBONE-tunnel direct to the destination pod, so the 302 redirect only fires for in-cluster ambient sources. Same gap surfaced in the istio sibling PR. Needs a follow-up for either ambient-aware gateway or a different routing approach.
  • Sister Service pod selectors default to stock kube-prometheus-stack v77 conventions; clusters using different chart release names must override spec.exposure.<comp>.{podSelector, serviceName} (pat-local does this — release name is observe).
  • Cookie secret still pre-created by operator. Future iteration could compose an ExternalSecret matching the AnalyticsStack pattern.

Test plan

  • make test — 30/30 pass (28 existing + 2 new)
  • make validate:all — 8/8 examples validate (12–29 resources each)
  • Live install on pat-local: Zitadel OIDC client provisioned via provider; oauth2-proxy + Redis + Waypoint + sister Service all Ready
  • Full curl-through-the-chain proves OIDC flow + ext_authz path (see verification above)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added public exposure capability for monitoring dashboards (Prometheus, Alertmanager, OpenCost, Grafana) with OAuth2 authentication powered by Zitadel OIDC integration.
    • Introduced configuration options for exposure domain, TLS certificate management, and per-component public access settings.
    • Added support for session management via Redis backend.

Review Change Stack

Adds spec.auth + spec.exposure to ObserveStack so internal dashboards
(Prometheus, Alertmanager, OpenCost) can be reached from the public
internet behind Zitadel OIDC. Composes everything the bridge needs:

- ExternalSecret pulling the Zitadel iam-admin PAT from AWS Secrets
  Manager (matches AuthStack's consumer-providerconfig.yaml shape).
- Namespaced Zitadel ProviderConfig.
- Zitadel OIDC Application via provider-upjet-zitadel — writes the
  client_id + client_secret connection details into a K8s Secret.
- Single-replica Redis StatefulSet for oauth2-proxy sessions.
- Waypoint Gateway + per-component sister Services labeled
  istio.io/use-waypoint so AuthorizationPolicy.CUSTOM fires.
- HTTPRoute per component (/oauth2/* -> oauth2-proxy, / -> sister svc).
- AuthorizationPolicy.CUSTOM per component, provider name matches
  the IstioStack-registered extensionProvider.
- Optional cert-manager Certificate when the platform wildcard at the
  Gateway doesn't cover spec.exposure.domain.

Grafana app-level OIDC intentionally not included here — split into a
sibling task for a smaller, focused follow-up.

End-to-end verified on pat-local: composed OIDC client got
client_id=373507418958665326 in Zitadel; curl through the waypointed
sister Service returned "302 -> auth.ops.com.ai/oauth/v2/authorize?...";
waypoint Envoy access log showed "302 UAEX ext_authz_denied".

Implements [[tasks/observe-stack-public-exposure]]
Pattern: [[specs/platform-public-exposure]]

coderabbitai Bot commented May 18, 2026

Walkthrough

This pull request adds public dashboard exposure capability to ObserveStack via oauth2-proxy OIDC authentication and Zitadel identity integration. It extends the CRD schema, derives and validates configuration, provisions Zitadel credentials and OIDC applications, deploys oauth2-proxy and Redis infrastructure, configures Istio ambient mesh routing with authorization policies, optionally enables Grafana app-level OIDC, tracks readiness across all components, and includes comprehensive test coverage.

Changes

Public Dashboard Exposure via OAuth2 Bridge

| Layer | File(s) | Summary |
| --- | --- | --- |
| CRD contract and exposure configuration schema | apis/observestacks/definition.yaml, examples/observestacks/exposure.yaml, Makefile | CompositeResourceDefinition extended with spec.auth (Zitadel issuer/AWS Secrets Manager path) and spec.exposure (bridge enablement, domain, per-component routing config, Grafana OIDC). New status.exposure exposes domain, readiness flags, and component URLs. Example manifest demonstrates full exposure configuration. |
| State derivation and exposure validation | functions/render/000-state-init.yaml.gotmpl | Derives Zitadel domain, gateway/waypoint names, per-component exposure settings, validates required fields when exposure is enabled, emits $state.auth and $state.exposure structures for downstream templates. |
| Zitadel credential and OIDC app provisioning | functions/render/400-exposure-zitadel-credentials.yaml.gotmpl, functions/render/410-exposure-zitadel-providerconfig.yaml.gotmpl, functions/render/420-exposure-zitadel-oidc-app.yaml.gotmpl | ExternalSecret fetches Zitadel credentials from AWS Secrets Manager; ProviderConfig wires Crossplane-Zitadel integration; OIDC apps provisioned separately for oauth2-proxy (ext_authz mode) and Grafana (app_level mode) with component-specific redirect URIs. |
| OAuth2-proxy and Redis backend infrastructure | functions/render/430-exposure-certificate.yaml.gotmpl, functions/render/440-exposure-redis.yaml.gotmpl, functions/render/450-exposure-oauth2-proxy.yaml.gotmpl | cert-manager Certificate for exposure wildcard domain; Redis StatefulSet with persistent storage for session backend; oauth2-proxy Deployment with OIDC configuration, credential injection from Zitadel, Redis backend, reverse-proxy routing, and readiness probes. |
| Istio ambient networking and authorization | functions/render/445-exposure-waypoint.yaml.gotmpl, functions/render/470-exposure-httproutes.yaml.gotmpl, functions/render/480-exposure-auth-policies.yaml.gotmpl | Istio waypoint Gateway and per-component sister Services for ambient mesh routing; Gateway API HTTPRoutes route by hostname/path with conditional /oauth2 routing for oauth2-proxy in ext_authz mode; AuthorizationPolicy enforces CUSTOM extension provider with /oauth2/* path exclusion for OIDC callback reachability. |
| Grafana app-level OIDC configuration | functions/render/200-kube-prometheus-stack.yaml.gotmpl | Grafana Helm values augmented with Zitadel auth.generic_oauth config when spec.exposure.grafana.enabled, including secret-mounted credentials, session/token settings, optional domain/role/org mapping, and grafana.ini server configuration. |
| Readiness tracking and status rendering | functions/render/010-state-status.yaml.gotmpl, functions/render/999-status.yaml.gotmpl | Readiness checks extended to Redis, oauth2-proxy, Zitadel resources, OIDC apps, Grafana OIDC client, and Waypoint; aggregated into $exposureOauth2Ready and $exposureOidcReady; XR status.exposure conditionally renders domain, extension provider, oauth2-proxy readiness, and per-component enabled/URL fields with authMode-dependent ready logic. |
| Test coverage and usage examples | tests/test-render/main.k | Three new CompositionTest scenarios: (1) full bridge composition validation with all Zitadel/Redis/Istio resources and status.exposure fields; (2) Grafana app-level OIDC with dedicated client and direct routing; (3) default behavior when exposure enabled but per-component settings absent. |

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 OAuth2-proxy springs forth with Redis in tow,
While Zitadel guards the public dashboard's glow.
Istio waypoints weave ambient mesh dreams,
And Grafana gets OIDC—the auth-stack's supreme themes!
Status surfaces readiness through the bridge so fine,
A secure exposure tapestry, by design! 🌐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding a public exposure surface for ObserveStack using oauth2-proxy, Zitadel OIDC, and ext_authz authorization. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions Bot commented May 18, 2026

Published Crossplane Package

The following Crossplane package was published as part of this PR:

Package: ghcr.io/hops-ops/aws-observe-stack:pr-41-819c13a4f9ed295768dce406d0e3bdb5de9c78cf

View Package

…stener

Two changes that close the gap between in-cluster ext_authz (which already
worked) and external browser exposure (which 503'd):

1. Add istio.io/ingress-use-waypoint=true on the sister Service.
   Without this label, Istio Gateway API gateways (kind: Gateway,
   gatewayClassName=istio) HBONE-tunnel direct to the destination pod
   and bypass the Waypoint, so AuthorizationPolicy.CUSTOM never fires
   for north-south traffic. The plain istio.io/use-waypoint label only
   redirects mesh-internal clients. See istio/istio#51214.

2. spec.exposure.gatewayRef.sectionName -> sectionNames (list, defaults
   to [http, https-apex]). OAuth flows with cookie-secure=true redirect
   to HTTPS for the callback; the HTTPRoute now attaches to both
   listeners so the full flow completes without per-cluster manual
   parentRef wiring.
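
The two changes above can be sketched as follows. This is a hedged illustration rather than the rendered manifests: the waypoint name, pod selector, and Gateway name/namespace are assumptions; the labels, listener names, Service name, port, and hostname come from this PR.

```yaml
# Illustrative sketch only; names marked "assumption" are not from the source.
apiVersion: v1
kind: Service
metadata:
  name: exposure-prometheus
  namespace: monitoring
  labels:
    istio.io/use-waypoint: waypoint          # assumption: name of the waypoint Gateway
    istio.io/ingress-use-waypoint: "true"    # makes Gateway API ingress traffic traverse the waypoint
spec:
  selector:
    app.kubernetes.io/name: prometheus       # assumption: depends on the chart release
  ports:
    - name: http
      port: 9090
      targetPort: 9090
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: exposure-prometheus                  # hypothetical name
  namespace: monitoring
spec:
  # One parentRef per entry in spec.exposure.gatewayRef.sectionNames.
  parentRefs:
    - name: platform-gateway                 # assumption
      namespace: istio-ingress               # assumption
      sectionName: http
    - name: platform-gateway
      namespace: istio-ingress
      sectionName: https-apex
  hostnames:
    - prometheus.ops.com.ai
  rules:
    - backendRefs:
        - name: exposure-prometheus
          port: 9090
```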

End-to-end verified externally on pat-local:
$ curl -D - https://prometheus.ops.com.ai/
HTTP/2 302
location: https://auth.ops.com.ai/oauth/v2/authorize?client_id=...
$ # follow the chain -> Zitadel login UI 200 OK

Implements [[tasks/observe-stack-public-exposure]]
Adds spec.exposure.grafana with full app-level OIDC integration per
feedback_app_level_oidc. Grafana speaks OIDC natively, so it sits
DIRECTLY behind the Waypoint with no oauth2-proxy / ext_authz detour —
auth happens inside Grafana itself.

What composes when grafana.enabled=true:

- A SECOND Zitadel Oidc MR (separate from the oauth2-proxy client)
  named <observe-ns>-grafana with redirectUris=[<host>/login/generic_oauth]
  and writeConnectionSecretToRef=<observe-ns>-grafana-oidc-client.

- Grafana Helm values under kube-prometheus-stack:
  grafana.ini.auth.generic_oauth.{enabled, client_id ($__file{...}),
  client_secret ($__file{...}), auth_url, token_url, api_url,
  scopes, use_refresh_token, role_attribute_path, allowed_domains,
  ...}; auth.{login_maximum_lifetime_duration, token_rotation_interval_minutes};
  server.{domain, root_url}; extraSecretMounts mounting the OIDC
  client Secret at /etc/secrets/grafana-oidc so the secret stays off
  the env-var path.

- Sister Service (with both istio.io/use-waypoint and ingress-use-waypoint
  labels per reference-ambient-ingress-use-waypoint) selecting the chart's
  Grafana pod.

- Single-rule HTTPRoute (/ only — no /oauth2/* detour) attached to the
  platform Gateway on [http, https-apex].

- NO AuthorizationPolicy.CUSTOM — Grafana enforces its own auth.
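
A minimal sketch of the Grafana Helm values described above, assuming the Secret exposes keys named client_id and client_secret, that the observe namespace is monitoring, and that Grafana is served at grafana.ops.com.ai (none of which is confirmed here). The endpoint paths are Zitadel's standard OIDC endpoints on the auth.ops.com.ai issuer seen in the verification output; the offline_access scope is an assumption added so that refresh tokens are issued.

```yaml
# Sketch of kube-prometheus-stack values; not the template's actual output.
grafana:
  grafana.ini:
    server:
      domain: grafana.ops.com.ai              # assumption: Grafana hostname
      root_url: https://grafana.ops.com.ai
    auth:
      login_maximum_lifetime_duration: 8h
      token_rotation_interval_minutes: 10
    auth.generic_oauth:
      enabled: true
      # $__file{...} makes Grafana read the value from the mounted file,
      # keeping the credentials out of environment variables.
      client_id: $__file{/etc/secrets/grafana-oidc/client_id}          # assumption: key name
      client_secret: $__file{/etc/secrets/grafana-oidc/client_secret}  # assumption: key name
      scopes: openid profile email offline_access                      # assumption
      auth_url: https://auth.ops.com.ai/oauth/v2/authorize
      token_url: https://auth.ops.com.ai/oauth/v2/token
      api_url: https://auth.ops.com.ai/oidc/v1/userinfo
      use_refresh_token: true
  extraSecretMounts:
    - name: grafana-oidc
      secretName: monitoring-grafana-oidc-client   # <observe-ns>-grafana-oidc-client with ns=monitoring (assumption)
      mountPath: /etc/secrets/grafana-oidc
      readOnly: true
```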

Renderer split: introduces authMode discriminator per exposure component
(ext_authz | app_level). Existing components keep ext_authz; grafana is
the first app_level component. 470-httproutes branches on authMode for
the /oauth2/* rule; 480-auth-policies skips app_level entries entirely.

Status surface: status.exposure.grafana.{enabled, url, ready,
oidcClientReady}. ready computed from kube-prometheus-stack readiness
+ Zitadel OIDC App MR readiness + Waypoint readiness.

Tests: new "exposure-grafana-app-level-oidc-shape" KCL test asserts the
2nd Oidc MR, sister Service labels, single-rule HTTPRoute backendRef,
and the ABSENCE of an AuthorizationPolicy for Grafana. All 31 tests
pass.

Defaults follow specs/platform-public-exposure decision #7:
use_refresh_token=true, login_maximum_lifetime_duration=8h,
token_rotation_interval_minutes=10. Zitadel-side revocation propagates
within ~10min.

Implements [[tasks/observe-stack-grafana-oidc]]
Pattern: [[specs/platform-public-exposure]]

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/test-render/main.k (3)

2341-2358: ⚡ Quick win

Grafana HTTPRoute assertion should also validate listener attachment.

This route test only checks spec.rules. Please also assert hostnames and parentRefs.sectionName (HTTP + HTTPS listeners), otherwise the routing-attachment regression fixed in this PR path can slip through for Grafana.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test-render/main.k` around lines 2341 - 2358, Update the Grafana
HTTPRoute test (the Object with metadata.name
"observe-exposure-httproute-grafana" and its spec.forProvider.manifest) to also
assert that spec.forProvider.manifest.spec.hostnames contains the expected
host(s) and that spec.forProvider.manifest.spec.parentRefs includes entries with
the correct sectionName values for both HTTP and HTTPS listeners; locate the
manifest under spec.forProvider.manifest in the test and add assertions for
spec.hostnames and for each parentRefs[].sectionName to verify listener
attachment.

2377-2414: ⚡ Quick win

Test intent and assertions diverge in the “still-composes-bridge” case.

The test name says bridge resources are still composed, but assertions only check XR status defaulting. Either add assertions for baseline bridge resources (e.g., waypoint/oauth2-proxy/redis) or rename the test to match current scope.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test-render/main.k` around lines 2377 - 2414, The test
metav1alpha1.CompositionTest with metadata.name
"exposure-no-components-still-composes-bridge" has a mismatch between intent and
assertions: update either the test name or the assertions; specifically, if you
intend to verify bridge resources are composed, add assertResources entries for
the baseline bridge components (e.g., oauth2-proxy, waypoint, redis) checking
their apiVersion/kind/metadata.name and any expected status fields, referencing
the existing assertResources block to append those assertions, otherwise rename
metadata.name to reflect that only XR status defaulting
(status.exposure.extensionProviderName) is being asserted.

2053-2258: ⚡ Quick win

“Full bridge” test is missing assertions for key bridge resources.

This case is named as full-bridge coverage, but it does not assert oauth2-proxy Deployment or Redis Service creation. A regression there would still pass this test.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test-render/main.k` around lines 2053 - 2258, The test claims
"full-bridge" coverage but lacks assertions for the oauth2-proxy Deployment and
the Redis Service; add two assertResources entries inside the CompositionTest
block: one Object asserting an apps/v1 Deployment with metadata.name
"monitoring-oauth2-proxy" (to verify oauth2-proxy pod creation) and another
Object asserting a v1 Service with metadata.name "monitoring-oauth2-proxy-redis"
(to pair with the existing StatefulSet "monitoring-oauth2-proxy-redis"); ensure
each entry uses kind "Object" and sets spec.forProvider.manifest with the proper
apiVersion/kind/metadata to match those resources.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6303d4d3-918c-4a6a-a394-5f5b485b5dfb

📥 Commits

Reviewing files that changed from the base of the PR and between 00587a7 and 94a4918.

📒 Files selected for processing (17)
  • Makefile
  • apis/observestacks/definition.yaml
  • examples/observestacks/exposure.yaml
  • functions/render/000-state-init.yaml.gotmpl
  • functions/render/010-state-status.yaml.gotmpl
  • functions/render/200-kube-prometheus-stack.yaml.gotmpl
  • functions/render/400-exposure-zitadel-credentials.yaml.gotmpl
  • functions/render/410-exposure-zitadel-providerconfig.yaml.gotmpl
  • functions/render/420-exposure-zitadel-oidc-app.yaml.gotmpl
  • functions/render/430-exposure-certificate.yaml.gotmpl
  • functions/render/440-exposure-redis.yaml.gotmpl
  • functions/render/445-exposure-waypoint.yaml.gotmpl
  • functions/render/450-exposure-oauth2-proxy.yaml.gotmpl
  • functions/render/470-exposure-httproutes.yaml.gotmpl
  • functions/render/480-exposure-auth-policies.yaml.gotmpl
  • functions/render/999-status.yaml.gotmpl
  • tests/test-render/main.k

ready: {{ $state.observed.exposure.grafanaReady }}
oidcClientReady: {{ $state.observed.exposure.grafanaOidcClientReady }}
{{- else }}
ready: {{ and $state.observed.exposure.oauth2ProxyReady $state.observed.exposure.waypointReady }}
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include OIDC client readiness in ext_authz component readiness.

Line 31 can mark a component ready: true while OIDC client provisioning is still not ready, which makes status optimistic for auth-critical paths.

Suggested fix
-      ready: {{ and $state.observed.exposure.oauth2ProxyReady $state.observed.exposure.waypointReady }}
+      ready: {{ and
+        $state.observed.exposure.oauth2ProxyReady
+        $state.observed.exposure.oidcClientReady
+        $state.observed.exposure.waypointReady
+      }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@functions/render/999-status.yaml.gotmpl` at line 31, The ext_authz readiness
expression currently uses ready: {{ and
$state.observed.exposure.oauth2ProxyReady $state.observed.exposure.waypointReady
}} but omits OIDC client provisioning; update the readiness to include the OIDC
client flag (e.g. $state.observed.exposure.oidcClientReady) so the combined
condition requires oauth2ProxyReady, waypointReady, and oidcClientReady before
marking ready.
