fix(platform): bump secrets chart #572
Test & Lint Summary
Test Statistics
Blocker Details
The full apply failed after k3d cluster creation while waiting for Istio:
Cluster events repeatedly showed image pull TLS handshake timeouts, including:
Also observed for:
Proposed resolution: rerun full bootstrap verification in an environment with reliable Docker Hub/registry access or with the required images pre-pulled/cached.
noa-lucent
left a comment
Review complete. This PR makes the scoped platform default bump from Secrets chart 0.2.0 to 0.2.2, and the existing image-tag resolution correctly derives v0.2.2 when no explicit image override is set.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- GHCR has both the Secrets chart tag 0.2.2 and image tag v0.2.2 available.
No review comments from me.
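For illustration, the image-tag fallback described in this review could be expressed roughly as follows. This is a minimal sketch and not the repository's actual code: only `local.resolved_secrets_image_tag` is named in the PR description, while the variable names `secrets_chart_version` and `secrets_image_tag` are assumptions made for the example.

```hcl
# Hypothetical variable names; only local.resolved_secrets_image_tag
# appears in the PR text.
variable "secrets_chart_version" {
  type    = string
  default = "0.2.2"
}

variable "secrets_image_tag" {
  type        = string
  description = "Optional override for the secrets image tag"
  default     = ""
}

locals {
  # Use the explicit override when one is set; otherwise derive
  # "v<chart version>", e.g. "v0.2.2" for chart 0.2.2.
  resolved_secrets_image_tag = var.secrets_image_tag != "" ? var.secrets_image_tag : format("v%s", var.secrets_chart_version)
}
```

Under these assumptions, bumping only the chart version is enough to move the image tag as well, which matches the behavior the review describes.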
Follow-up Fix
Adjusted the Secrets chart values for
database = {
  url = format("postgresql://secrets:%s@secrets-db:5432/secrets?sslmode=disable", var.secrets_db_password)
  existingSecret = {
    name = ""
  }
}
This prevents Helm from seeing both
Test & Lint Summary
Commands run:
Results:
Render spot-check confirmed:
No full local apply was rerun because the prior local environment was blocked by Docker Hub TLS handshake timeouts before reaching the platform stack.
noa-lucent
left a comment
Re-review complete. I reviewed the follow-up change that clears database.existingSecret.name for the Secrets chart 0.2.2 values while preserving bootstrap's inline database.url configuration.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- helm template against the Secrets v0.2.2 chart renders with DATABASE_URL as an inline value, EGRESS_RULES_GRPC_TARGET, and ghcr.io/agynio/secrets:v0.2.2.
- Without the new existingSecret.name = "" override, the same chart values fail with the expected "set only one of database.url or database.existingSecret.name" error.
No review comments from me.
Follow-up: ziti-management version skew
Updated this same PR/branch to bump
While inspecting chart
zitiControllerUrl = format("https://ziti-mgmt.%s:%d/edge/management/v1", local.base_domain, local.ingress_port)
This is needed because chart
Test & Lint Summary
Commands run:
Results:
Render spot-check confirmed:
Note: the first
noa-lucent
left a comment
Re-review complete. I reviewed the additional follow-up that bumps ziti_management_chart_version from 0.10.10 to 0.10.12 and adds the new chart-level zitiControllerUrl value while preserving the existing controller endpoint.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- GHCR has the ziti-management chart tag 0.10.12 and image tag 0.10.12 available.
- helm dependency build for the ziti-management chart at v0.10.12 succeeds and pulls service-base:0.1.4.
- helm template renders ghcr.io/agynio/ziti-management:0.10.12, preserves DATABASE_URL, sets ZITI_CONTROLLER_URL to https://ziti-mgmt.agyn.dev:2496/edge/management/v1, and keeps the expected enrollment secret and data PVC wiring.
No review comments from me.
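As a worked example of the `zitiControllerUrl` expression, the sketch below plugs in the domain and port that appear in the rendered `ZITI_CONTROLLER_URL` from this review. The literal values for `base_domain` and `ingress_port` are inferred from that rendered URL, and the `output` block is added purely for illustration; how the stack actually defines these locals is not shown in the PR text.

```hcl
locals {
  # Values inferred from the rendered URL quoted in the review;
  # the real stack may compute them differently.
  base_domain  = "agyn.dev"
  ingress_port = 2496

  # Same format() call as in the PR: %s takes the domain, %d the port.
  ziti_controller_url = format("https://ziti-mgmt.%s:%d/edge/management/v1", local.base_domain, local.ingress_port)
}

# Illustrative output so the result can be inspected with `terraform output`.
output "ziti_controller_url" {
  value = local.ziti_controller_url
  # Evaluates to "https://ziti-mgmt.agyn.dev:2496/edge/management/v1"
}
```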
Follow-up: egress attachment failure investigation
I inspected failed run
Exact failure
The Ziti controller log confirms the same root cause:
Root cause
This is not a ziti-management chart/config regression. Bootstrap provisioning passed and
The deployed bootstrap egress service is still the released
The required egress fix already exists in
Patch in this PR
To keep bootstrap PR #572 moving while the egress release is still pending, I updated the bootstrap default egress image override:
egress_image_tag = "pr-14-471fd0cff8d7aa2e49c1a6ecbe296cc7afe3ecdd"
This keeps the chart at
Test & Lint Summary
Commands run:
Results:
Render spot-check confirmed:
noa-lucent
left a comment
Review complete. I found one blocking issue in the latest egress image override patch.
Using a PR image tag as the platform default leaves bootstrap pinned to a temporary pre-release artifact. This should be replaced with a released egress chart/image version once the egress fix is released, rather than merging a PR-scoped image into the bootstrap defaults.
[major]
Review fix: scope egress PR image override to CI
Addressed Noa's blocking review comment. Change made:
env:
  TF_VAR_egress_image_tag: pr-14-471fd0cff8d7aa2e49c1a6ecbe296cc7afe3ecdd
This keeps local/main bootstrap defaults on released egress chart/image versions, while allowing this PR's CI full-apply to use the pending egress policy fix until
Test & Lint Summary
Commands run:
Results:
[major] This does not actually resolve the blocking comment yet:
noa-lucent
left a comment
Review complete. Casey added the CI-scoped override, but the previous blocking default-variable change is still present in stacks/platform/variables.tf, so the canonical bootstrap defaults remain pinned to the PR image. Requesting changes until egress_image_tag is restored to an empty default and the PR image exists only in the workflow override.
Review fix: restore Terraform egress image default
Addressed Noa's follow-up review. Change made:
variable "egress_image_tag" {
  type        = string
  description = "Optional override for the egress image tag"
  default     = ""
}
Test & Lint Summary
Commands run:
Results:
Verification after Noa re-review
Confirmed the requested state is already pushed on PR #572 at commit
Test & Lint Summary
Commands run:
Results:
noa-lucent
left a comment
Re-review complete. The latest commit resolves my blocking feedback: egress_image_tag is restored to an empty Terraform default, and the temporary PR-scoped egress image is limited to the bootstrap workflow provision step via TF_VAR_egress_image_tag.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- stacks/platform/variables.tf keeps egress_image_tag default as "".
- .github/workflows/bootstrap.yml keeps the CI-only TF_VAR_egress_image_tag override.
No remaining review comments from me.
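The CI-only override works because Terraform reads any environment variable named `TF_VAR_<name>` as the value of the input variable `<name>`. The sketch below shows one way an empty default could fall back to the released chart version. It assumes a local named `resolved_egress_image_tag`, which is hypothetical; `egress_image_tag` and `egress_chart_version` are the variable names used in this thread, but their real definitions beyond the default shown above are not part of the PR text.

```hcl
variable "egress_chart_version" {
  type        = string
  description = "Released egress chart version"
}

variable "egress_image_tag" {
  type        = string
  description = "Optional override for the egress image tag"
  default     = ""
}

locals {
  # Hypothetical local name. coalesce() returns the first argument that is
  # neither null nor an empty string, so an unset override falls back to
  # the chart version.
  resolved_egress_image_tag = coalesce(var.egress_image_tag, var.egress_chart_version)
}
```

With this shape, a workflow step that exports `TF_VAR_egress_image_tag` changes the resolved tag only for that run, while local and main applies keep the released version.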
Follow-up: ziti-management API skew after egress PR image
I inspected failed run
Exact failure
Root cause
This is another API/version skew caused by pairing:
The required ziti-management implementation exists in
Patch in this bootstrap PR
Because no released ziti-management chart/image currently contains those RPCs, I kept canonical bootstrap defaults unchanged and added a CI-only ziti-management image override next to the existing CI-only egress override:
env:
  TF_VAR_egress_image_tag: pr-14-471fd0cff8d7aa2e49c1a6ecbe296cc7afe3ecdd
  TF_VAR_ziti_management_image_tag: pr-61-4f5cd681b9887dc397d24dd7cd796ab062cbe6c2
This is intentionally scoped to PR CI. The durable path is to merge/release
Test & Lint Summary
Commands run:
Results:
Render spot-check confirmed:
[major] The
noa-lucent
left a comment
Review complete. I found one blocking issue in the egress-gateway stack bump: bootstrap still supplies the old egress-gateway enrollment-JWT values while the released 0.1.3 chart/code moved to self-enrolled identity file configuration. Requesting changes until the bootstrap values are aligned with the new chart contract.
Updated for Noa's egress-gateway 0.1.3 review blocker. Root cause:
Patch:
Validation:
Tests: 0 failed. Lint/format: no errors.
noa-lucent
left a comment
Re-review complete. The latest commit aligns egress_gateway_values with the released egress-gateway 0.1.3 identity-file/self-enrollment contract.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- terraform -chdir=stacks/apps init -backend=false -input=false
- terraform -chdir=stacks/apps validate -no-color
- The old egress-gateway-enrollment Kubernetes secret/resource references are removed.
- The egress-gateway Argo CD app no longer depends on the removed enrollment secret.
- The rendered egress-gateway 0.1.3 chart includes ZITI_IDENTITY_FILE, ZITI_LEASE_INTERVAL, ZITI_SERVICE_NAME, and a writable ziti-identity /var/lib/ziti emptyDir mount.
- The rendered chart no longer includes ZITI_ENROLLMENT_JWT_FILE, ziti-enrollment, or egress-gateway-enrollment.
No remaining review comments from me.
Follow-up verification for egress-gateway 0.1.3 values
Confirmed the requested fix is pushed on this branch at
Changes aligned with Noa's review:
Test & Lint Summary
Commands run:
terraform fmt -check -diff -recursive
terraform -chdir=stacks/platform init -backend=false -input=false
terraform -chdir=stacks/platform validate -no-color
terraform -chdir=stacks/apps init -backend=false -input=false
terraform -chdir=stacks/apps validate -no-color
helm dependency build /tmp/egress-gateway-chart-0.1.3/egress-gateway
helm lint /tmp/egress-gateway-chart-0.1.3/egress-gateway --values /tmp/egress-gateway-values-572.yaml
helm template egress-gateway /tmp/egress-gateway-chart-0.1.3/egress-gateway --values /tmp/egress-gateway-values-572.yaml
Results:
Investigated full E2E failure from run 27575404604 / job 81522406304. Root cause:
Evidence:
Correct path:
I did not push a bootstrap commit for this investigation because the required fix belongs upstream in
Validation performed locally:
Test stats: investigation-only, 0 code tests run, 0 failed, 0 skipped. Lint: no bootstrap lint required because no files changed.
Latest E2E failure investigation: run 27575404604
Root cause is not bootstrap values/version wiring. I did not change or push to this bootstrap PR.
The failed Playwright request created an egress rule with a valid matcher:
{"domainPattern":"api-1781557531551.example.com","ports":[443]}
The failure occurs after
OpenZiti rejects that during
Ownership / required upstream fix
This belongs in
I opened the upstream fix here:
After that PR merges and a new ziti-management chart/image is released, bootstrap can consume the release by bumping
Test & Lint Summary
Bootstrap validation:
Upstream ziti-management validation for PR #73:
Updated this PR to consume the ziti-management host.v1 serialization fix released in
Change:
Commit:
Test & Lint Summary
Commands run:
terraform fmt -check -diff -recursive
terraform -chdir=stacks/platform init -backend=false -input=false
terraform -chdir=stacks/platform validate -no-color
terraform -chdir=stacks/apps init -backend=false -input=false
terraform -chdir=stacks/apps validate -no-color
Results:
CI:
noa-lucent
left a comment
Re-review complete. The latest head bumps ziti_management_chart_version to 0.10.14 to consume the upstream host.v1 serialization fix.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- terraform -chdir=stacks/apps init -backend=false -input=false
- terraform -chdir=stacks/apps validate -no-color
- GHCR has both the ziti-management chart tag 0.10.14 and image tag 0.10.14 available.
- The upstream v0.10.14 tag contains the host.v1 fix (fix(ziti): omit unset host config fields) and tests for omitting unset forwarded host address/port fields.
No remaining review comments from me.
Investigated the current full E2E blocker from run
Root cause:
Owning fix:
Patch summary in egress PR #14:
Validation run in
Next unblock step for bootstrap #572:
No bootstrap commit was pushed in this step because the owning fix was upstream in
Updated this PR to consume the egress ziti-management API compatibility fix released in
Change:
Commit:
Test & Lint Summary
Commands run:
terraform fmt -check -diff -recursive
terraform -chdir=stacks/platform init -backend=false -input=false
terraform -chdir=stacks/platform validate -no-color
terraform -chdir=stacks/apps init -backend=false -input=false
terraform -chdir=stacks/apps validate -no-color
Results:
CI:
noa-lucent
left a comment
Re-review complete. The latest commit correctly bumps egress_chart_version from 0.1.2 to the released 0.1.3 chart/image so bootstrap consumes the egress#16 Ziti Management API compatibility fix.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- terraform -chdir=stacks/apps init -backend=false -input=false
- terraform -chdir=stacks/apps validate -no-color
- GHCR has both ghcr.io/agynio/charts/egress:0.1.3 and ghcr.io/agynio/egress:0.1.3 available.
- The upstream v0.1.3 tag points at the egress API-alignment fix from egress#16.
- egress_image_tag remains an empty optional override, so bootstrap defaults are pinned to the released chart/image version rather than a PR image.
Full-apply is still running separately; no remaining code review comments from me.
Follow-up diagnosis and fix
Root cause: bootstrap PR #572 was deploying
Fix applied in this bootstrap PR: set
Bootstrap PR #572 should remain unmerged until full-apply is green after this commit.
Test & Lint Summary
noa-lucent
left a comment
Re-review complete. The latest head pins the console app image override to 0.10.10-egress-rules while leaving the chart at 0.10.10, so the deployed console image contains the egress-rules UI route needed by the Playwright test.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- terraform -chdir=stacks/apps init -backend=false -input=false
- terraform -chdir=stacks/apps validate -no-color
- GHCR has ghcr.io/agynio/console-app:0.10.10-egress-rules available.
- GHCR still has the matching console-app chart 0.10.10 available.
- local.resolved_console_app_image_tag will use the new override instead of the chart version default.
Full-apply is still running separately; no remaining code review comments from me.
Updated this PR to consume the console egress-rules route and secret selector released in
Changes:
Commit:
Test & Lint Summary
Commands run:
terraform fmt -check -diff -recursive
terraform -chdir=stacks/platform init -backend=false -input=false
terraform -chdir=stacks/platform validate -no-color
terraform -chdir=stacks/apps init -backend=false -input=false
terraform -chdir=stacks/apps validate -no-color
Results:
CI:
noa-lucent
left a comment
Re-review complete. The latest commit correctly bumps console_app_chart_version from 0.10.10 to the released 0.10.11 chart and removes the temporary console_app_image_tag = "0.10.10-egress-rules" override, so bootstrap now uses the released console-app chart/image default.
I verified:
- terraform fmt -check -diff -recursive
- terraform -chdir=stacks/platform init -backend=false -input=false
- terraform -chdir=stacks/platform validate -no-color
- terraform -chdir=stacks/apps init -backend=false -input=false
- terraform -chdir=stacks/apps validate -no-color
- GHCR has both ghcr.io/agynio/charts/console-app:0.10.11 and ghcr.io/agynio/console-app:0.10.11 available.
- The upstream v0.10.11 tag is on main and includes the console egress secret selector change after the egress route UI commit.
- No 0.10.10-egress-rules temporary override remains in the repository.
Full-apply is still running separately; no remaining code review comments from me.
PR is green and Noa-approved, but merge is blocked by the repository ruleset requiring CODEOWNER review from
Current status:
Requested CODEOWNER review from
Summary
- 0.2.0 to 0.2.2.
- v0.2.2 tag via existing local.resolved_secrets_image_tag logic.
- SecretsService.ResolveSecretExists.
Closes #571
Verification
- terraform fmt -check -diff -recursive passed with no formatting changes required.
- terraform -chdir=stacks/platform init -backend=false -input=false succeeded.
- terraform -chdir=stacks/platform validate -no-color passed: Success! The configuration is valid.
- ./apply.sh -y was attempted for full bootstrap/platform verification, but the local environment could not complete the system stack because cluster pods repeatedly failed to pull required Docker Hub images due to TLS handshake timeouts. See details below.
Local full-apply blocker
./apply.sh -y created the k3d cluster, then failed during helm_release.istiod after the 5-minute Helm timeout:
Kubernetes events showed image pulls timing out, for example:
Also observed for:
Proposed resolution: rerun full bootstrap verification in an environment with reliable Docker Hub/registry access or with required images pre-pulled/cached.