feat: Generate per-agent ConfigMap for authbridge at injection time#297
feat: Generate per-agent ConfigMap for authbridge at injection time#297
Conversation
The operator now creates a per-agent ConfigMap (authbridge-config-<name>)
at pod injection time by reading the namespace-level authbridge-runtime-config,
merging in the authbridge mode and per-agent listener addresses, and writing a
dedicated ConfigMap for each agent.
This solves two problems:
1. Listener addresses (reverse_proxy_addr, forward_proxy_addr,
reverse_proxy_backend) are per-agent (dynamic ports from port-stealing)
but the shared ConfigMap is per-namespace
2. The shared ConfigMap used ${ENV_VAR} placeholders that broke validation
when env vars were unset in other modes
The authbridge sidecar now mounts the per-agent ConfigMap instead of the
shared one. Mode and listener fields are concrete values — no env var
expansion needed.
Changes:
- namespace_config.go: read authbridge-runtime-config ConfigMap
- resolved_config.go: propagate AuthBridgeRuntimeYAML
- pod_mutator.go: ensurePerAgentConfigMap() + call sites for both modes
- volume_builder.go: parameterize authbridge ConfigMap name,
overrideAuthBridgeConfigMapInVolumes helper
- container_builder.go: remove listener env vars and --mode flag from
proxy-sidecar container (now in per-agent ConfigMap)
Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
- Skip identity fallback when ClientAuthType is empty to avoid generating an identity block without a type field (would fail authbridge validation) - Skip per-agent ConfigMap creation when combinedSidecar gate is enabled since that container uses env vars directly and does not mount authbridge-runtime-config Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
- Use AuthBridgeRuntimeConfigMapName constant instead of string literal (goconst) - Remove embedded field LocalObjectReference from selectors (staticcheck QF1008) Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
OwnerReference: the per-agent ConfigMap now gets an OwnerReference to the owning Deployment or StatefulSet, enabling automatic garbage collection when the workload is deleted. Uses controllerutil.SetOwnerReference following the pattern from clientregistration_controller.go. Unit tests: 8 tests covering ensurePerAgentConfigMap: - Empty baseYAML fallback from NamespaceConfig - Base YAML preserves existing fields - Listener overrides merged into config - Existing CM owned by webhook gets updated - Existing CM not owned by webhook is skipped - OwnerReference set from Deployment - OwnerReference set from StatefulSet - No workload found — OwnerReference skipped gracefully Also adds Scheme field to PodMutator struct (required for controllerutil.SetOwnerReference). Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Pass mgr.GetScheme() as the new Scheme parameter added in the previous commit. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
The webhook now creates per-agent ConfigMaps, which requires create and update verbs on configmaps. Added to both config/rbac/role.yaml (kustomize) and charts/kagenti-operator (Helm). Also changed the update path to call setOwnerReferenceFromWorkload on the existing ConfigMap instead of replacing the entire OwnerReferences slice, preserving any references set by other controllers. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Separate configmaps into its own RBAC rule so create and update are not accidentally granted for namespaces and services. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
The previous manual edits to role.yaml were overwritten by controller-gen during make deploy. Add the +kubebuilder:rbac marker to pod_mutator.go so controller-gen generates the correct RBAC with create and update verbs for configmaps. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
When a Deployment creates multiple pods concurrently, the webhook processes each pod admission in parallel. The Get-then-Create pattern has a race window where another admission creates the ConfigMap between our Get (NotFound) and Create. Handle AlreadyExists on Create as success, matching the pattern used by ensureServiceAccount. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Replace the Get → Create/Update pattern with a single server-side apply (Patch with client.Apply + ForceOwnership). This is atomic — no race condition when multiple pod admissions run concurrently. Removes ~15 lines of branching logic (Get/Create/AlreadyExists/Update) in favor of one Patch call that creates or updates idempotently. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
|
I asked Claude about this PR, and it believes there are problems: |
Replace deprecated Patch(ctx, obj, client.Apply) with the typed client.Apply(ctx, applyConfig) API. Uses generated apply configurations from k8s.io/client-go/applyconfigurations instead of manually constructing objects with TypeMeta. Also replaces controllerutil.SetOwnerReference with a typed OwnerReferenceApplyConfiguration builder, removing the controllerutil dependency. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Server-side apply uses PATCH, not CREATE. Added patch to the kubebuilder RBAC marker and Helm chart role. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Collect envoy-proxy container logs, per-agent ConfigMap content, and container statuses before hitting the envoy admin interface. Output goes to GinkgoWriter so it appears in CI logs on failure, making it easier to diagnose Connection refused errors. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
The manager cached client only watches ConfigMaps in namespaces matching specific label selectors. Agent namespaces are outside the cache scope, so ReadNamespaceConfig silently returned empty results and the per-agent ConfigMap was generated without inbound/outbound/identity fields. Add an APIReader field to PodMutator (backed by mgr.GetAPIReader()) that bypasses the cache for ReadNamespaceConfig. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
- Separate namespaces and secrets into individual kubebuilder RBAC markers in authbridge_webhook.go (controller-gen still merges configmaps+secrets in role.yaml because they have identical verb sets — the Helm chart correctly separates them) - Add comment to buildOwnerReference explaining cached client usage vs uncached APIReader for ConfigMaps - Add TestEnsurePerAgentConfigMap_FederatedJWT_MapsToSpiffe covering the federated-jwt -> spiffe identity mapping code path Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
pdettori
left a comment
There was a problem hiding this comment.
Summary
Solid PR. The design is clean: lifting per-agent concerns out of the shared ConfigMap into a dedicated authbridge-config-<name> CM at injection time is the right call, and the progression from Get→Create/Update → AlreadyExists handling → server-side apply shows good iterative thinking. The uncached APIReader fix (commit 6b305bf) is the kind of subtle cache-scope bug that usually only surfaces in production — good catch.
Areas reviewed: Go (injector), Helm RBAC, kustomize RBAC, unit tests, E2E tests
Commits: 18 commits, all signed-off ✓
CI status: All checks pass (Build, E2E, Unit, Integration, Lint, CodeQL, Trivy) ✓
Approve with two suggestions and one nit below.
| WithAPIVersion("apps/v1"). | ||
| WithKind("StatefulSet"). | ||
| WithName(sts.Name). | ||
| WithUID(sts.UID) |
There was a problem hiding this comment.
Suggestion: buildOwnerReference doesn't set Controller: true or BlockOwnerDeletion: true. GC works without these, but K8s convention is to mark the primary managing controller with controller=true. Without it, a second controller could also set a controller OwnerReference on this CM (K8s only rejects a second controller=true reference, not a second controller=nil/false one).
Suggested change for both Deployment and StatefulSet paths:
WithController(ptr.To(true)).
WithBlockOwnerDeletion(ptr.To(true))| if !volumeExists(podSpec.Volumes, requiredVolumes[i].Name) { | ||
| podSpec.Volumes = append(podSpec.Volumes, requiredVolumes[i]) | ||
| // Inject volumes — use per-agent ConfigMap name for authbridge config | ||
| proxyVolumes := overrideAuthBridgeConfigMapInVolumes(requiredVolumes, perAgentCMName) |
There was a problem hiding this comment.
Suggestion: requiredVolumes is only set inside the PerWorkloadConfigResolution block (a few lines up). If PerWorkloadConfigResolution=false and proxy-sidecar mode is active, requiredVolumes is nil here — overrideAuthBridgeConfigMapInVolumes returns an empty slice and the per-agent CM name never makes it into the injected volumes.
If proxy-sidecar always requires PerWorkloadConfigResolution=true (i.e. the scenario isn't reachable in practice), a short comment here prevents future confusion:
// requiredVolumes is set in the PerWorkloadConfigResolution branch above;
// proxy-sidecar mode always requires that gate.Otherwise, the volume override should also cover the non-resolved path.
| _, _ = fmt.Fprintf(GinkgoWriter, "=== envoy-proxy logs (last 30) ===\n%s\n", envoyLogs) | ||
| } | ||
| diagCmd = exec.Command("kubectl", "get", "configmap", | ||
| "authbridge-config-authbridge-agent", |
There was a problem hiding this comment.
Nit: CM name constructed by hand. If the test agent name changes this silently diverges. Since perAgentConfigMapName is unexported, consider a helper constant:
const testAgentCMName = "authbridge-config-" + authBridgeAgentName…nt, test constant - Set Controller=true and BlockOwnerDeletion=true on OwnerReference per K8s convention (prevents a second controller from claiming ownership) - Add comment clarifying requiredVolumes is always set before the proxy-sidecar mode block - Extract authBridgeAgentCMName constant in E2E fixtures to avoid hand-constructed ConfigMap names diverging from agent name Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
…comments - Add deployments/finalizers and statefulsets/finalizers update verb to Helm chart RBAC (required for BlockOwnerDeletion=true) - Remove unused authbridgeConfigMapName parameter from BuildResolvedVolumes — the override function is the canonical way to set per-agent ConfigMap names - Add eventual-consistency comment to buildOwnerReference (cached client may miss newly-created Deployments on first admission) - Log baseYAML length on unmarshal failure for diagnostics Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Leftover from the pre-SSA implementation that used controllerutil.SetOwnerReference. The final implementation uses typed apply configurations which don't need a Scheme. Removes Scheme from struct, NewPodMutator, cmd/main.go, and tests. Signed-off-by: Hai Huang <hai@us.ibm.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Hai Huang <huang195@gmail.com>
Summary
authbridge-config-<name>) at pod injection timeauthbridge-runtime-config, merges in mode + per-agent listener addressesREVERSE_PROXY_ADDR,REVERSE_PROXY_BACKEND,FORWARD_PROXY_ADDRenv vars and--modeflag from the proxy-sidecar container (all now in the per-agent ConfigMap)Context
Listener addresses are per-agent (port-stealing assigns dynamic ports) but the shared ConfigMap is per-namespace. Previously we used
${ENV_VAR}placeholders in the shared ConfigMap, which broke validation when env vars were unset in other modes.The three-layer ConfigMap flow:
authbridge-runtime-configper namespace (defaults)authbridge-runtime-configfor runtime namespaces (same defaults)authbridge-config-<agent>Files changed
namespace_config.goauthbridge-runtime-configConfigMapresolved_config.goAuthBridgeRuntimeYAMLpod_mutator.goensurePerAgentConfigMap()+ call sites for both modesvolume_builder.gooverrideAuthBridgeConfigMapInVolumescontainer_builder.go--modefrom proxy-sidecar*_test.goTest plan
go test ./internal/webhook/injector/— all passgo vet— cleangofmt— cleanmode: envoy-sidecarkagenti.io/authbridge-mode=proxy-sidecar→ per-agent ConfigMap with listener addressesAssisted-By: Claude (Anthropic AI) noreply@anthropic.com