feat: public exposure surface (oauth2-proxy + Zitadel OIDC + ext_authz) #41
patrickleet wants to merge 3 commits into
Adds spec.auth + spec.exposure to ObserveStack so internal dashboards (Prometheus, Alertmanager, OpenCost) can be reached from the public internet behind Zitadel OIDC. Composes everything the bridge needs:
- ExternalSecret pulling the Zitadel iam-admin PAT from AWS Secrets Manager (matches AuthStack's consumer-providerconfig.yaml shape).
- Namespaced Zitadel ProviderConfig.
- Zitadel OIDC Application via provider-upjet-zitadel; writes the client_id + client_secret connection details into a K8s Secret.
- Single-replica Redis StatefulSet for oauth2-proxy sessions.
- Waypoint Gateway + per-component sister Services labeled istio.io/use-waypoint so AuthorizationPolicy.CUSTOM fires.
- HTTPRoute per component (/oauth2/* -> oauth2-proxy, / -> sister svc).
- AuthorizationPolicy.CUSTOM per component, provider name matches the IstioStack-registered extensionProvider.
- Optional cert-manager Certificate when the platform wildcard at the Gateway doesn't cover spec.exposure.domain.
Grafana app-level OIDC intentionally not included here; split into a sibling task for a smaller, focused follow-up.
End-to-end verified on pat-local: composed OIDC client got client_id=373507418958665326 in Zitadel; curl through the waypointed sister Service returned "302 -> auth.ops.com.ai/oauth/v2/authorize?..."; waypoint Envoy access log showed "302 UAEX ext_authz_denied".
Implements [[tasks/observe-stack-public-exposure]]
Pattern: [[specs/platform-public-exposure]]
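For readers unfamiliar with the pattern, the per-component HTTPRoute and AuthorizationPolicy.CUSTOM pairing described above could look roughly like the sketch below. It is not the composition's rendered output. The namespace, sister Service name, port 9090, and hostname are borrowed from the verification notes elsewhere in this PR; the Gateway reference, the oauth2-proxy Service name, the policy name, and the provider name are assumptions for illustration.

```yaml
# Sketch only. Assumed names: platform-gateway, istio-system,
# monitoring-oauth2-proxy (as a Service), exposure-prometheus-ext-authz,
# and the extensionProvider name "oauth2-proxy".
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: exposure-prometheus
  namespace: monitoring
spec:
  parentRefs:
    - name: platform-gateway      # assumed Gateway name
      namespace: istio-system     # assumed Gateway namespace
  hostnames:
    - prometheus.ops.com.ai
  rules:
    # OAuth endpoints (start, callback, sign-out) go to oauth2-proxy.
    - matches:
        - path:
            type: PathPrefix
            value: /oauth2
      backendRefs:
        - name: monitoring-oauth2-proxy   # assumed Service name
          port: 4180                      # oauth2-proxy's default listen port
    # Everything else goes to the waypointed sister Service.
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: exposure-prometheus
          port: 9090
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: exposure-prometheus-ext-authz   # assumed name
  namespace: monitoring
spec:
  # Bind to the sister Service so the waypoint enforces the policy.
  targetRefs:
    - kind: Service
      group: ""
      name: exposure-prometheus
  action: CUSTOM
  provider:
    name: oauth2-proxy   # assumed; must match a registered extensionProvider
  rules:
    - {}                 # empty rule matches every request
```

Because the policy targets only the sister Service, requests routed to /oauth2/* reach oauth2-proxy without passing through ext_authz, which is what lets the login and callback endpoints work for unauthenticated users.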
📝 Walkthrough
This pull request adds public dashboard exposure capability to ObserveStack via oauth2-proxy OIDC authentication and Zitadel identity integration. It extends the CRD schema, derives and validates configuration, provisions Zitadel credentials and OIDC applications, deploys oauth2-proxy and Redis infrastructure, configures Istio ambient mesh routing with authorization policies, optionally enables Grafana app-level OIDC, tracks readiness across all components, and includes comprehensive test coverage.
Changes
Public Dashboard Exposure via OAuth2 Bridge
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 5 passed
Published Crossplane Package
The following Crossplane package was published as part of this PR:
Package: ghcr.io/hops-ops/aws-observe-stack:pr-41-819c13a4f9ed295768dce406d0e3bdb5de9c78cf
…stener
Two changes that close the gap between in-cluster ext_authz (which already worked) and external browser exposure (which 503'd):
1. Add istio.io/ingress-use-waypoint=true on the sister Service. Without this label, Istio Gateway API gateways (kind: Gateway, gatewayClassName=istio) HBONE-tunnel direct to the destination pod and bypass the Waypoint, so AuthorizationPolicy.CUSTOM never fires for north-south traffic. The plain istio.io/use-waypoint label only redirects mesh-internal clients. See istio/istio#51214.
2. spec.exposure.gatewayRef.sectionName -> sectionNames (list, defaults to [http, https-apex]). OAuth flows with cookie-secure=true redirect to HTTPS for the callback; the HTTPRoute now attaches to both listeners so the full flow completes without per-cluster manual parentRef wiring.
End-to-end verified externally on pat-local:
  $ curl -D - https://prometheus.ops.com.ai/
  HTTP/2 302
  location: https://auth.ops.com.ai/oauth/v2/authorize?client_id=...
  $ # follow the chain -> Zitadel login UI 200 OK
Implements [[tasks/observe-stack-public-exposure]]
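The two changes in this commit can be pictured with a sketch like the one below: a sister Service carrying both waypoint labels, and an HTTPRoute attached to both listeners. The label keys, listener names (http, https-apex), hostname, Service name, and port come from this PR; the waypoint name, Gateway name and namespace, and pod selector are assumptions for illustration only.

```yaml
# Sketch only. Assumed names: observe-waypoint, platform-gateway,
# istio-system, and the pod selector label.
apiVersion: v1
kind: Service
metadata:
  name: exposure-prometheus
  namespace: monitoring
  labels:
    # Redirects mesh-internal clients through the waypoint.
    istio.io/use-waypoint: observe-waypoint   # assumed waypoint name
    # Makes Gateway API ingress traffic traverse the waypoint as well.
    istio.io/ingress-use-waypoint: "true"
spec:
  selector:
    app.kubernetes.io/name: prometheus        # assumed pod label
  ports:
    - name: http
      port: 9090
      targetPort: 9090
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: exposure-prometheus
  namespace: monitoring
spec:
  # One parentRef per listener, mirroring sectionNames: [http, https-apex].
  parentRefs:
    - name: platform-gateway     # assumed Gateway name
      namespace: istio-system    # assumed Gateway namespace
      sectionName: http
    - name: platform-gateway
      namespace: istio-system
      sectionName: https-apex
  hostnames:
    - prometheus.ops.com.ai
  rules:
    - backendRefs:
        - name: exposure-prometheus
          port: 9090
```

The /oauth2/* rule is left out here to keep the focus on the labels and the listener attachment.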
Adds spec.exposure.grafana with full app-level OIDC integration per
feedback_app_level_oidc. Grafana speaks OIDC natively, so it sits
DIRECTLY behind the Waypoint with no oauth2-proxy / ext_authz detour —
auth happens inside Grafana itself.
What composes when grafana.enabled=true:
- A SECOND Zitadel Oidc MR (separate from the oauth2-proxy client)
named <observe-ns>-grafana with redirectUris=[<host>/login/generic_oauth]
and writeConnectionSecretToRef=<observe-ns>-grafana-oidc-client.
- Grafana Helm values under kube-prometheus-stack:
grafana.ini.auth.generic_oauth.{enabled, client_id ($__file{...}),
client_secret ($__file{...}), auth_url, token_url, api_url,
scopes, use_refresh_token, role_attribute_path, allowed_domains,
...}; auth.{login_maximum_lifetime_duration, token_rotation_interval_minutes};
server.{domain, root_url}; extraSecretMounts mounting the OIDC
client Secret at /etc/secrets/grafana-oidc so the secret stays off
the env-var path.
- Sister Service (with both istio.io/use-waypoint and ingress-use-waypoint
labels per reference-ambient-ingress-use-waypoint) selecting the chart's
Grafana pod.
- Single-rule HTTPRoute (/ only — no /oauth2/* detour) attached to the
platform Gateway on [http, https-apex].
- NO AuthorizationPolicy.CUSTOM — Grafana enforces its own auth.
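As an illustration of the Helm values bullet above, a kube-prometheus-stack values fragment along these lines would wire Grafana's generic_oauth to a mounted Secret. This is a hedged sketch, not the renderer's actual output. The Grafana hostname, the Secret name (which assumes the observe namespace is monitoring), the Secret key names, the scopes, and the token and userinfo URLs are assumptions; the mount path, the auth URL host, and the lifetime defaults are taken from this PR.

```yaml
# Sketch only; hostnames, Secret name, key names, and scopes are assumptions.
grafana:
  extraSecretMounts:
    - name: grafana-oidc
      secretName: monitoring-grafana-oidc-client   # assumes observe-ns = monitoring
      mountPath: /etc/secrets/grafana-oidc
      readOnly: true
      defaultMode: 0440
  grafana.ini:
    server:
      domain: grafana.ops.com.ai                   # assumed hostname
      root_url: https://grafana.ops.com.ai
    auth:
      login_maximum_lifetime_duration: 8h
      token_rotation_interval_minutes: 10
    auth.generic_oauth:
      enabled: true
      # $__file{...} makes Grafana read the value from the mounted file,
      # so the credentials never appear in environment variables.
      client_id: "$__file{/etc/secrets/grafana-oidc/client_id}"
      client_secret: "$__file{/etc/secrets/grafana-oidc/client_secret}"
      scopes: openid profile email offline_access  # assumed scope set
      auth_url: https://auth.ops.com.ai/oauth/v2/authorize
      token_url: https://auth.ops.com.ai/oauth/v2/token      # assumed path
      api_url: https://auth.ops.com.ai/oidc/v1/userinfo      # assumed path
      use_refresh_token: true
```

The redirect URI registered on the Zitadel side would then be the root_url followed by /login/generic_oauth, matching the redirectUris entry described above.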
Renderer split: introduces authMode discriminator per exposure component
(ext_authz | app_level). Existing components keep ext_authz; grafana is
the first app_level component. 470-httproutes branches on authMode for
the /oauth2/* rule; 480-auth-policies skips app_level entries entirely.
Status surface: status.exposure.grafana.{enabled, url, ready,
oidcClientReady}. ready computed from kube-prometheus-stack readiness
+ Zitadel OIDC App MR readiness + Waypoint readiness.
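For orientation, the status block described above might surface on the XR roughly as follows. The field names are the ones listed in this commit message; the URL and the boolean values are made-up examples showing a Grafana exposure whose OIDC client is provisioned while another readiness input is still pending.

```yaml
# Illustrative values only; field names are from the commit message.
status:
  exposure:
    grafana:
      enabled: true
      url: https://grafana.ops.com.ai   # assumed hostname
      oidcClientReady: true             # Zitadel OIDC App MR is ready
      ready: false                      # stack or Waypoint not yet ready
```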
Tests: new "exposure-grafana-app-level-oidc-shape" KCL test asserts the
2nd Oidc MR, sister Service labels, single-rule HTTPRoute backendRef,
and the ABSENCE of an AuthorizationPolicy for Grafana. All 31 tests
pass.
Defaults follow specs/platform-public-exposure decision #7:
use_refresh_token=true, login_maximum_lifetime_duration=8h,
token_rotation_interval_minutes=10. Zitadel-side revocation propagates
within ~10min.
Implements [[tasks/observe-stack-grafana-oidc]]
Pattern: [[specs/platform-public-exposure]]
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/test-render/main.k (3)
2341-2358: ⚡ Quick win
Grafana HTTPRoute assertion should also validate listener attachment.
This route test only checks spec.rules. Please also assert hostnames and parentRefs.sectionName (HTTP + HTTPS listeners), otherwise the routing-attachment regression fixed in this PR path can slip through for Grafana.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test-render/main.k` around lines 2341 - 2358, Update the Grafana HTTPRoute test (the Object with metadata.name "observe-exposure-httproute-grafana" and its spec.forProvider.manifest) to also assert that spec.forProvider.manifest.spec.hostnames contains the expected host(s) and that spec.forProvider.manifest.spec.parentRefs includes entries with the correct sectionName values for both HTTP and HTTPS listeners; locate the manifest under spec.forProvider.manifest in the test and add assertions for spec.hostnames and for each parentRefs[].sectionName to verify listener attachment.
2377-2414: ⚡ Quick win
Test intent and assertions diverge in the “still-composes-bridge” case.
The test name says bridge resources are still composed, but assertions only check XR status defaulting. Either add assertions for baseline bridge resources (e.g., waypoint/oauth2-proxy/redis) or rename the test to match current scope.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test-render/main.k` around lines 2377 - 2414, The test metav1alpha1.CompositionTest with metadata.name "exposure-no-components-still-composes-bridge" has a mismatch between intent and assertions: update either the test name or the assertions; specifically, if you intend to verify bridge resources are composed, add assertResources entries for the baseline bridge components (e.g., oauth2-proxy, waypoint, redis) checking their apiVersion/kind/metadata.name and any expected status fields, referencing the existing assertResources block to append those assertions, otherwise rename metadata.name to reflect that only XR status defaulting (status.exposure.extensionProviderName) is being asserted.
2053-2258: ⚡ Quick win
“Full bridge” test is missing assertions for key bridge resources.
This case is named as full-bridge coverage, but it does not assert oauth2-proxy Deployment or Redis Service creation. A regression there would still pass this test.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test-render/main.k` around lines 2053 - 2258, The test claims "full-bridge" coverage but lacks assertions for the oauth2-proxy Deployment and the Redis Service; add two assertResources entries inside the CompositionTest block: one Object asserting an apps/v1 Deployment with metadata.name "monitoring-oauth2-proxy" (to verify oauth2-proxy pod creation) and another Object asserting a v1 Service with metadata.name "monitoring-oauth2-proxy-redis" (to pair with the existing StatefulSet "monitoring-oauth2-proxy-redis"); ensure each entry uses kind "Object" and sets spec.forProvider.manifest with the proper apiVersion/kind/metadata to match those resources.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6303d4d3-918c-4a6a-a394-5f5b485b5dfb
📒 Files selected for processing (17)
- Makefile
- apis/observestacks/definition.yaml
- examples/observestacks/exposure.yaml
- functions/render/000-state-init.yaml.gotmpl
- functions/render/010-state-status.yaml.gotmpl
- functions/render/200-kube-prometheus-stack.yaml.gotmpl
- functions/render/400-exposure-zitadel-credentials.yaml.gotmpl
- functions/render/410-exposure-zitadel-providerconfig.yaml.gotmpl
- functions/render/420-exposure-zitadel-oidc-app.yaml.gotmpl
- functions/render/430-exposure-certificate.yaml.gotmpl
- functions/render/440-exposure-redis.yaml.gotmpl
- functions/render/445-exposure-waypoint.yaml.gotmpl
- functions/render/450-exposure-oauth2-proxy.yaml.gotmpl
- functions/render/470-exposure-httproutes.yaml.gotmpl
- functions/render/480-exposure-auth-policies.yaml.gotmpl
- functions/render/999-status.yaml.gotmpl
- tests/test-render/main.k
ready: {{ $state.observed.exposure.grafanaReady }}
oidcClientReady: {{ $state.observed.exposure.grafanaOidcClientReady }}
{{- else }}
ready: {{ and $state.observed.exposure.oauth2ProxyReady $state.observed.exposure.waypointReady }}
Include OIDC client readiness in ext_authz component readiness.
Line 31 can mark a component ready: true while OIDC client provisioning is still not ready, which makes status optimistic for auth-critical paths.
Suggested fix
- ready: {{ and $state.observed.exposure.oauth2ProxyReady $state.observed.exposure.waypointReady }}
+ ready: {{ and
+ $state.observed.exposure.oauth2ProxyReady
+ $state.observed.exposure.oidcClientReady
+ $state.observed.exposure.waypointReady
+ }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@functions/render/999-status.yaml.gotmpl` at line 31, The ext_authz readiness
expression currently uses ready: {{ and
$state.observed.exposure.oauth2ProxyReady $state.observed.exposure.waypointReady
}} but omits OIDC client provisioning; update the readiness to include the OIDC
client flag (e.g. $state.observed.exposure.oidcClientReady) so the combined
condition requires oauth2ProxyReady, waypointReady, and oidcClientReady before
marking ready.
Summary
Adds spec.auth + spec.exposure to ObserveStack so internal dashboards (Prometheus, Alertmanager, OpenCost) can be reached from the public internet behind Zitadel OIDC, per [[specs/platform-public-exposure]]. The bridge is composed fully:
- […] consumer-providerconfig.yaml shape inline, no per-cluster authoring).
- […] provider-upjet-zitadel (declarative, per direction during planning rather than out-of-band). Provider writes client_id + client_secret into a K8s Secret oauth2-proxy mounts.
- […] istio.io/use-waypoint.
- […] /oauth2/* → oauth2-proxy, / → sister Service).
- […] provider.name matches the IstioStack-registered extensionProvider (consumes [[tasks/istio-stack-extension-providers]]).
- […] spec.exposure.domain.
Grafana app-level OIDC intentionally not included; split into [[tasks/observe-stack-grafana-oidc]] for a smaller focused follow-up.
Operator pre-requisites:
- […] kubectl create secret generic <name> --from-literal=cookie_secret=$(openssl rand -hex 16)).
- […] spec.auth.zitadelProjectId.
End-to-end verification on pat-local
Composed OIDC client got client_id=373507418958665326 in Zitadel ("platform-services" project). Curl through the live stack (from inside the cluster):
[…]
Waypoint Envoy access log:
302 UAEX ext_authz_denied - inbound-vip|9090|http|exposure-prometheus.monitoring.svc.cluster.local
UAEX = canonical Envoy ext_authz_denied response flag, proving the waypoint really called the registered ext_authz upstream and acted on the response.
Known limitations (carry into release notes)
- […] spec.exposure.<comp>.{podSelector, serviceName} (pat-local does this; release name is observe).
Test plan
- make test: 30/30 pass (28 existing + 2 new)
- make validate:all: 8/8 examples validate (12–29 resources each)
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes