Skip to content

feat: Generate per-agent ConfigMap for authbridge at injection time#297

Merged
huang195 merged 21 commits intomainfrom
feat/per-agent-configmap
Apr 21, 2026
Merged

feat: Generate per-agent ConfigMap for authbridge at injection time#297
huang195 merged 21 commits intomainfrom
feat/per-agent-configmap

Conversation

@huang195
Copy link
Copy Markdown
Contributor

Summary

  • Operator creates a per-agent ConfigMap (authbridge-config-<name>) at pod injection time
  • Reads the namespace-level authbridge-runtime-config, merges in mode + per-agent listener addresses
  • Authbridge sidecar mounts the per-agent ConfigMap instead of the shared one
  • Removes REVERSE_PROXY_ADDR, REVERSE_PROXY_BACKEND, FORWARD_PROXY_ADDR env vars and --mode flag from the proxy-sidecar container (all now in the per-agent ConfigMap)

Context

Listener addresses are per-agent (port-stealing assigns dynamic ports) but the shared ConfigMap is per-namespace. Previously we used ${ENV_VAR} placeholders in the shared ConfigMap, which broke validation when env vars were unset in other modes.

The three-layer ConfigMap flow:

  1. Helm creates authbridge-runtime-config per namespace (defaults)
  2. Backend creates authbridge-runtime-config for runtime namespaces (same defaults)
  3. Operator reads it at injection, merges per-agent config, writes authbridge-config-<agent>

Files changed

File Change
namespace_config.go Read authbridge-runtime-config ConfigMap
resolved_config.go Propagate AuthBridgeRuntimeYAML
pod_mutator.go ensurePerAgentConfigMap() + call sites for both modes
volume_builder.go Parameterize authbridge ConfigMap name, overrideAuthBridgeConfigMapInVolumes
container_builder.go Remove listener env vars and --mode from proxy-sidecar
*_test.go Updated tests for new behavior

Test plan

  • go test ./internal/webhook/injector/ — all pass
  • go vet — clean
  • gofmt — clean
  • E2E: deploy agent with default mode → per-agent ConfigMap with mode: envoy-sidecar
  • E2E: deploy with kagenti.io/authbridge-mode=proxy-sidecar → per-agent ConfigMap with listener addresses

Assisted-By: Claude (Anthropic AI) noreply@anthropic.com

The operator now creates a per-agent ConfigMap (authbridge-config-<name>)
at pod injection time by reading the namespace-level authbridge-runtime-config,
merging in the authbridge mode and per-agent listener addresses, and writing a
dedicated ConfigMap for each agent.

This solves two problems:
1. Listener addresses (reverse_proxy_addr, forward_proxy_addr,
   reverse_proxy_backend) are per-agent (dynamic ports from port-stealing)
   but the shared ConfigMap is per-namespace
2. The shared ConfigMap used ${ENV_VAR} placeholders that broke validation
   when env vars were unset in other modes

The authbridge sidecar now mounts the per-agent ConfigMap instead of the
shared one. Mode and listener fields are concrete values — no env var
expansion needed.

Changes:
- namespace_config.go: read authbridge-runtime-config ConfigMap
- resolved_config.go: propagate AuthBridgeRuntimeYAML
- pod_mutator.go: ensurePerAgentConfigMap() + call sites for both modes
- volume_builder.go: parameterize authbridge ConfigMap name,
  overrideAuthBridgeConfigMapInVolumes helper
- container_builder.go: remove listener env vars and --mode flag from
  proxy-sidecar container (now in per-agent ConfigMap)

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
- Skip identity fallback when ClientAuthType is empty to avoid
  generating an identity block without a type field (would fail
  authbridge validation)
- Skip per-agent ConfigMap creation when combinedSidecar gate is
  enabled since that container uses env vars directly and does not
  mount authbridge-runtime-config

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
- Use AuthBridgeRuntimeConfigMapName constant instead of string literal
  (goconst)
- Remove embedded field LocalObjectReference from selectors (staticcheck
  QF1008)

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
OwnerReference: the per-agent ConfigMap now gets an OwnerReference to
the owning Deployment or StatefulSet, enabling automatic garbage
collection when the workload is deleted. Uses controllerutil.SetOwnerReference
following the pattern from clientregistration_controller.go.

Unit tests: 8 tests covering ensurePerAgentConfigMap:
- Empty baseYAML fallback from NamespaceConfig
- Base YAML preserves existing fields
- Listener overrides merged into config
- Existing CM owned by webhook gets updated
- Existing CM not owned by webhook is skipped
- OwnerReference set from Deployment
- OwnerReference set from StatefulSet
- No workload found — OwnerReference skipped gracefully

Also adds Scheme field to PodMutator struct (required for
controllerutil.SetOwnerReference).

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Pass mgr.GetScheme() as the new Scheme parameter added in the
previous commit.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
The webhook now creates per-agent ConfigMaps, which requires create
and update verbs on configmaps. Added to both config/rbac/role.yaml
(kustomize) and charts/kagenti-operator (Helm).

Also changed the update path to call setOwnerReferenceFromWorkload
on the existing ConfigMap instead of replacing the entire
OwnerReferences slice, preserving any references set by other
controllers.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Separate configmaps into its own RBAC rule so create and update
are not accidentally granted for namespaces and services.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
The previous manual edits to role.yaml were overwritten by
controller-gen during make deploy. Add the +kubebuilder:rbac
marker to pod_mutator.go so controller-gen generates the correct
RBAC with create and update verbs for configmaps.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
@huang195 huang195 marked this pull request as ready for review April 20, 2026 15:18
@huang195 huang195 requested a review from a team as a code owner April 20, 2026 15:18
When a Deployment creates multiple pods concurrently, the webhook
processes each pod admission in parallel. The Get-then-Create
pattern has a race window where another admission creates the
ConfigMap between our Get (NotFound) and Create. Handle
AlreadyExists on Create as success, matching the pattern used
by ensureServiceAccount.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Replace the Get → Create/Update pattern with a single server-side
apply (Patch with client.Apply + ForceOwnership). This is atomic —
no race condition when multiple pod admissions run concurrently.

Removes ~15 lines of branching logic (Get/Create/AlreadyExists/Update)
in favor of one Patch call that creates or updates idempotently.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
@esnible
Copy link
Copy Markdown
Contributor

esnible commented Apr 20, 2026

I asked Claude about this PR, and it believes there are problems:

⏺ Found a critical issue — the RBAC is missing the patch verb for configmaps, but server-side apply uses Patch. Let me verify what the kubebuilder
  marker generates and what the Helm chart has.                                                                                                                                                                                                         
                                                                                          
⏺ The RBAC is missing patch in both the kubebuilder marker, the kustomize role.yaml, and the Helm role.yaml. Server-side apply (client.Patch)      
  requires the patch verb. This is a must-fix bug.  

Replace deprecated Patch(ctx, obj, client.Apply) with the typed
client.Apply(ctx, applyConfig) API. Uses generated apply
configurations from k8s.io/client-go/applyconfigurations instead
of manually constructing objects with TypeMeta.

Also replaces controllerutil.SetOwnerReference with a typed
OwnerReferenceApplyConfiguration builder, removing the
controllerutil dependency.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Server-side apply uses PATCH, not CREATE. Added patch to the
kubebuilder RBAC marker and Helm chart role.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Collect envoy-proxy container logs, per-agent ConfigMap content,
and container statuses before hitting the envoy admin interface.
Output goes to GinkgoWriter so it appears in CI logs on failure,
making it easier to diagnose Connection refused errors.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
The manager cached client only watches ConfigMaps in namespaces
matching specific label selectors. Agent namespaces are outside
the cache scope, so ReadNamespaceConfig silently returned empty
results and the per-agent ConfigMap was generated without
inbound/outbound/identity fields.

Add an APIReader field to PodMutator (backed by mgr.GetAPIReader())
that bypasses the cache for ReadNamespaceConfig.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
- Separate namespaces and secrets into individual kubebuilder RBAC
  markers in authbridge_webhook.go (controller-gen still merges
  configmaps+secrets in role.yaml because they have identical verb
  sets — the Helm chart correctly separates them)
- Add comment to buildOwnerReference explaining cached client usage
  vs uncached APIReader for ConfigMaps
- Add TestEnsurePerAgentConfigMap_FederatedJWT_MapsToSpiffe covering
  the federated-jwt -> spiffe identity mapping code path

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Copy link
Copy Markdown
Contributor

@pdettori pdettori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid PR. The design is clean: lifting per-agent concerns out of the shared ConfigMap into a dedicated authbridge-config-<name> CM at injection time is the right call, and the progression from Get→Create/Update → AlreadyExists handling → server-side apply shows good iterative thinking. The uncached APIReader fix (commit 6b305bf) is the kind of subtle cache-scope bug that usually only surfaces in production — good catch.

Areas reviewed: Go (injector), Helm RBAC, kustomize RBAC, unit tests, E2E tests
Commits: 18 commits, all signed-off ✓
CI status: All checks pass (Build, E2E, Unit, Integration, Lint, CodeQL, Trivy) ✓

Approve with two suggestions and one nit below.

WithAPIVersion("apps/v1").
WithKind("StatefulSet").
WithName(sts.Name).
WithUID(sts.UID)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: buildOwnerReference doesn't set Controller: true or BlockOwnerDeletion: true. GC works without these, but K8s convention is to mark the primary managing controller with controller=true. Without it, a second controller could also set a controller OwnerReference on this CM (K8s only rejects a second controller=true reference, not a second controller=nil/false one).

Suggested change for both Deployment and StatefulSet paths:

WithController(ptr.To(true)).
WithBlockOwnerDeletion(ptr.To(true))

if !volumeExists(podSpec.Volumes, requiredVolumes[i].Name) {
podSpec.Volumes = append(podSpec.Volumes, requiredVolumes[i])
// Inject volumes — use per-agent ConfigMap name for authbridge config
proxyVolumes := overrideAuthBridgeConfigMapInVolumes(requiredVolumes, perAgentCMName)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: requiredVolumes is only set inside the PerWorkloadConfigResolution block (a few lines up). If PerWorkloadConfigResolution=false and proxy-sidecar mode is active, requiredVolumes is nil here — overrideAuthBridgeConfigMapInVolumes returns an empty slice and the per-agent CM name never makes it into the injected volumes.

If proxy-sidecar always requires PerWorkloadConfigResolution=true (i.e. the scenario isn't reachable in practice), a short comment here prevents future confusion:

// requiredVolumes is set in the PerWorkloadConfigResolution branch above;
// proxy-sidecar mode always requires that gate.

Otherwise, the volume override should also cover the non-resolved path.

Comment thread kagenti-operator/test/e2e/e2e_test.go Outdated
_, _ = fmt.Fprintf(GinkgoWriter, "=== envoy-proxy logs (last 30) ===\n%s\n", envoyLogs)
}
diagCmd = exec.Command("kubectl", "get", "configmap",
"authbridge-config-authbridge-agent",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: CM name constructed by hand. If the test agent name changes this silently diverges. Since perAgentConfigMapName is unexported, consider a helper constant:

const testAgentCMName = "authbridge-config-" + authBridgeAgentName

…nt, test constant

- Set Controller=true and BlockOwnerDeletion=true on OwnerReference
  per K8s convention (prevents a second controller from claiming
  ownership)
- Add comment clarifying requiredVolumes is always set before the
  proxy-sidecar mode block
- Extract authBridgeAgentCMName constant in E2E fixtures to avoid
  hand-constructed ConfigMap names diverging from agent name

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
…comments

- Add deployments/finalizers and statefulsets/finalizers update verb
  to Helm chart RBAC (required for BlockOwnerDeletion=true)
- Remove unused authbridgeConfigMapName parameter from
  BuildResolvedVolumes — the override function is the canonical way
  to set per-agent ConfigMap names
- Add eventual-consistency comment to buildOwnerReference (cached
  client may miss newly-created Deployments on first admission)
- Log baseYAML length on unmarshal failure for diagnostics

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
Leftover from the pre-SSA implementation that used
controllerutil.SetOwnerReference. The final implementation uses
typed apply configurations which don't need a Scheme.

Removes Scheme from struct, NewPodMutator, cmd/main.go, and tests.

Signed-off-by: Hai Huang <hai@us.ibm.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Hai Huang <huang195@gmail.com>
@huang195 huang195 merged commit d682560 into main Apr 21, 2026
15 checks passed
@huang195 huang195 deleted the feat/per-agent-configmap branch April 21, 2026 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants