feat: pivot psql-stack to EKS Auto Mode defaults #10

Merged
patrickleet merged 44 commits into main from feat/auto-mode-pivot
May 8, 2026

@patrickleet patrickleet commented Apr 28, 2026

Summary

Pivots the stack from StackGres → CNPG → Mayastor → Longhorn → EKS Auto Mode primitives. The intermediate states are preserved on the feat/cnpg-pivot (Mayastor) and feat/longhorn-pivot (Longhorn V2) checkpoint branches.

The end state is a much slimmer stack that runs cleanly on EKS Auto Mode without dedicated NodePools, custom AMIs, or in-cluster storage clusters. It composes four things:

  • CloudNativePG operator (Helm)
  • Atlas operator (Helm) — declarative schema migrations
  • cnpg-i-scale-to-zero plugin (set of 9 Crossplane Objects)
  • A psql VolumeSnapshotClass (driver: ebs.csi.eks.amazonaws.com) that PSQLBranch uses to snapshot source volumes when forking

PSQLClusters target whatever StorageClass the cluster already provides (gp3 on Auto Mode); the stack no longer takes a position on StorageClasses.
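As a rough mental model of the slimmed composition, the sketch below lists what the stack would render for a given spec. It assumes `scaleToZeroPlugin.enabled` and `snapshotClass.enabled` both default to true and that the CNPG and Atlas releases are always composed; the resource keys are hypothetical, not the composition's real names.

```python
S2Z_OBJECT_COUNT = 9  # the cnpg-i-scale-to-zero manifest is inlined as 9 Objects


def composed_resources(spec: dict) -> list:
    """Return hypothetical resource keys the stack would compose for `spec`."""
    resources = ["cnpg-operator", "atlas-operator"]  # Helm releases, always on (assumption)
    if spec.get("scaleToZeroPlugin", {}).get("enabled", True):
        resources += [f"s2z-object-{i}" for i in range(1, S2Z_OBJECT_COUNT + 1)]
    if spec.get("snapshotClass", {}).get("enabled", True):
        resources.append("volumesnapshotclass")
    # No StorageClass: PSQLClusters use whatever SC the cluster already provides.
    return resources


print(len(composed_resources({})))  # 12
print(composed_resources({"scaleToZeroPlugin": {"enabled": False},
                          "snapshotClass": {"enabled": False}}))
```

With everything disabled, only the two operator releases remain, which matches the "platform only" shape described above.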

Why the pivot

  1. Mayastor: the chart hardcodes kubernetes.io/arch: amd64 on the io_engine DaemonSet, so it cannot run on arm64 Graviton (which we use for cost-optimal NVMe instance types). It also carries a ~4 vCPU + 6 GiB platform tax (etcd×3, NATS, minio, loki, alloy) that duplicates components other stacks already provide.
  2. Longhorn V2: even with V2 enabled, longhorn-manager unconditionally checks for iscsiadm at startup. Bottlerocket (EKS Auto Mode's OS) has an immutable rootfs, so there is no way to install iscsiadm. Verified live on pat-local: the manager goes into CrashLoopBackOff with the env-check failure.
  3. Auto Mode primitives: gp3 SC + EBS CSI driver + a composed VolumeSnapshotClass. AWS-blessed, zero node-side prep, no in-cluster storage cluster. Tradeoff: branch volumes are full-cost EBS clones rather than CoW deltas, which stays manageable when branches are ephemeral and TTL-bounded, as in the typical Vercel-like multi-tenant shape. Revisit Longhorn on self-managed nodes if/when CoW economics warrant it (a separate `aws-storage-stack` is planned).

Schema diff

Drops:

  • `spec.nodePool` (NodePool sub-pools removed — Auto Mode handles node provisioning)
  • `spec.nodePrep` (DaemonSet removed — no host-level setup needed)
  • `spec.storage.{chartVersion, namespace, replicationFactor, thin, values, ...}` (storage cluster gone)

Adds:

  • `spec.snapshotClass.{enabled, name, driver, deletionPolicy, parameters}` — minimal config for the composed VSC

The XRD shrinks from ~290 to ~170 lines.
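A minimal sketch of how the `spec.snapshotClass` fields could map onto the composed VolumeSnapshotClass, written in Python rather than the stack's Go templates. The `name` and `driver` defaults come from this PR; the `enabled` and `deletionPolicy` defaults are assumptions.

```python
DEFAULT_SNAPSHOT_CLASS = {
    "enabled": True,                        # assumption: on by default
    "name": "psql",
    "driver": "ebs.csi.eks.amazonaws.com",  # EKS Auto Mode's managed EBS CSI driver
    "deletionPolicy": "Delete",             # assumption: default not stated in this PR
    "parameters": {},
}


def render_volume_snapshot_class(snapshot_class=None):
    """Merge user config over defaults and build a VolumeSnapshotClass dict."""
    cfg = {**DEFAULT_SNAPSHOT_CLASS, **(snapshot_class or {})}
    if not cfg["enabled"]:
        return None
    if cfg["deletionPolicy"] not in ("Delete", "Retain"):
        raise ValueError(f"unsupported deletionPolicy: {cfg['deletionPolicy']}")
    vsc = {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": cfg["name"]},
        "driver": cfg["driver"],
        "deletionPolicy": cfg["deletionPolicy"],
    }
    if cfg["parameters"]:  # only emit parameters when the user set some
        vsc["parameters"] = dict(cfg["parameters"])
    return vsc
```

Self-managed EBS users would call `render_volume_snapshot_class({"driver": "ebs.csi.aws.com"})`; non-AWS users swap in their own CSI driver name.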

New dependency

  • `volume-snapshot-stack` — installs the cluster-wide snapshot-controller via the piraeus-charts Helm chart. EKS Auto Mode ships the snapshot CRDs but not the controller; without it our composed VSC is inert.
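The "inert without a controller" failure mode is easy to see with a readiness poll. In this sketch, `get_snapshot` is a hypothetical callable that returns the VolumeSnapshot as a dict (for example, a thin wrapper around whichever Kubernetes client you use). With no snapshot-controller running, `status` is never populated, so the poll simply times out.

```python
import time


def wait_ready_to_use(get_snapshot, timeout_s=300.0, interval_s=5.0,
                      sleep=time.sleep, now=time.monotonic):
    """Poll a VolumeSnapshot until status.readyToUse is true or the timeout passes."""
    deadline = now() + timeout_s
    while True:
        status = get_snapshot().get("status") or {}
        error = status.get("error")
        if error:
            raise RuntimeError(error.get("message", "snapshot failed"))
        if status.get("readyToUse") is True:
            return True
        if now() >= deadline:
            # Without a snapshot-controller, status stays empty and we end up here.
            return False
        sleep(interval_s)
```

The `sleep` and `now` parameters are injectable so the helper can be exercised with a fake clock.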

Test plan

  • `make render` — clean output for all examples
  • `make test` — 11/11 KCL render tests passing
  • `hops config install` builds + pushes the package to colima
  • Live install on pat-local (EKS Auto Mode + Bottlerocket + arm64 Graviton): CNPG + Atlas + S2Z all 1/1 Ready, composed VSC present
  • Smoke test: PVC bound on `ebs.csi.eks.amazonaws.com` → VolumeSnapshot via composed `psql` VSC reached `readyToUse=true` with backing EBS snapshot
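The smoke-test snapshot can be expressed as a plain `snapshot.storage.k8s.io/v1` VolumeSnapshot that points at the composed `psql` class. The object and PVC names below are hypothetical.

```python
import json


def volume_snapshot(name: str, namespace: str, pvc_name: str,
                    snapshot_class: str = "psql") -> dict:
    """Build a VolumeSnapshot of an existing PVC using the composed VSC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# JSON is valid YAML, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(volume_snapshot("smoke-snap", "default", "smoke-pvc"), indent=2))
```

Once applied, `status.readyToUse` flipping to true (with a bound VolumeSnapshotContent) is the signal the test plan checks for.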

Suggested merge style

Squash — the 16 commits on this branch include the StackGres→CNPG→Mayastor→Longhorn→Auto Mode journey; the experiment branches are pushed (`feat/cnpg-pivot`, `feat/longhorn-pivot`) for historical reference. Squash gives main a single clean pivot commit.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added PSQLCluster composite for managed PostgreSQL cluster deployment with HA, scale-to-zero, and monitoring support
    • Added PSQLBranch composite for creating ephemeral database forks via volume snapshots
    • Introduced CloudNativePG-based platform stack replacing prior StackGres architecture
    • Added scale-to-zero plugin for automatic idle cost reduction
    • Added comprehensive example configurations for local, minimal, and production deployments
  • Documentation

    • Updated README with new CloudNativePG architecture, design, and prerequisites

patrickleet and others added 21 commits April 22, 2026 21:44
Add opt-in Karpenter NodePool composed resource. When
spec.nodePool.enabled: true, renders a NodePool targeting arm64 spot on
r7g.large/r7g.xlarge/m7g.large/m7g.xlarge (memory-optimized Graviton),
tainted with psql=true:NoSchedule and labeled workload-type: psql.

StackGres (operator/restapi/jobs) and Atlas get nodeSelector + tolerations
injected into their Helm values when the NodePool is enabled. Crossplane
Usages ensure both releases are deleted before the NodePool is. Adds
provider-kubernetes to upbound.yaml for the NodePool Object wrapper.

Implements [[tasks/psql-stack-vela-simplyblock]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
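A sketch of the NodePool this commit describes, plus the values injection, as Python dicts. It assumes the Karpenter `karpenter.sh/v1` NodePool schema and omits the required `nodeClassRef`; the top-level `nodeSelector`/`tolerations` keys are also an assumption, since each chart (StackGres sets them per operator/restapi/jobs) exposes its own paths.

```python
PSQL_TAINT = {"key": "psql", "value": "true", "effect": "NoSchedule"}
PSQL_LABELS = {"workload-type": "psql"}
INSTANCE_TYPES = ["r7g.large", "r7g.xlarge", "m7g.large", "m7g.xlarge"]


def psql_node_pool(name: str = "psql") -> dict:
    """Karpenter NodePool for arm64 spot Graviton nodes (nodeClassRef omitted)."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "metadata": {"labels": dict(PSQL_LABELS)},
                "spec": {
                    "requirements": [
                        {"key": "kubernetes.io/arch", "operator": "In", "values": ["arm64"]},
                        {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["spot"]},
                        {"key": "node.kubernetes.io/instance-type", "operator": "In",
                         "values": list(INSTANCE_TYPES)},
                    ],
                    "taints": [dict(PSQL_TAINT)],
                },
            }
        },
    }


def with_psql_scheduling(values: dict) -> dict:
    """Return a copy of Helm values with the psql nodeSelector and toleration added."""
    out = dict(values)
    out["nodeSelector"] = dict(PSQL_LABELS)
    out["tolerations"] = [{"operator": "Equal", **PSQL_TAINT}]
    return out
```

Returning a copy keeps the user's values untouched, so the injection can be applied only when `spec.nodePool.enabled` is true.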
- storageClass (default on): creates a gp3 StorageClass backed by the
  EKS Auto Mode EBS CSI driver (ebs.csi.eks.amazonaws.com). The legacy
  gp2 in-tree provisioner does not work on EKS Auto Mode.
- externalSecrets (opt-in): for each entry in externalSecrets.secrets[],
  composes a kubernetes.m.crossplane.io/Object wrapping an ESO
  ExternalSecret that syncs an AWS Secrets Manager value (published via
  hops secrets sync aws) into a Kubernetes Secret on the target cluster.
  Requires a ClusterSecretStore (e.g. from SecretStack); defaults to
  clusterSecretStoreName: hops-aws-secrets-manager.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
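The gp3 StorageClass default from this commit, sketched as a Python dict. The provisioner and gp3 type come from the commit; the name, `volumeBindingMode`, and `allowVolumeExpansion` values are assumptions.

```python
def gp3_storage_class(name: str = "psql") -> dict:
    """A gp3 StorageClass for EKS Auto Mode's managed EBS CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},  # hypothetical name
        # The legacy in-tree kubernetes.io/aws-ebs (gp2) provisioner does not work on Auto Mode.
        "provisioner": "ebs.csi.eks.amazonaws.com",
        "parameters": {"type": "gp3"},
        "reclaimPolicy": "Delete",
        "volumeBindingMode": "WaitForFirstConsumer",  # assumption
        "allowVolumeExpansion": True,                 # assumption
    }
```

`WaitForFirstConsumer` is a common choice for EBS because volumes are zonal: delaying binding until a pod is scheduled creates the volume in that pod's availability zone.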
Mirrors the observe stack, where StorageClasses are named loki/prometheus/tempo
per component. Keeps the name specific to the stack so it doesn't collide
with cluster-provided defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the generic externalSecrets.secrets[] passthrough with a
Postgres-specific connections[] API. The user publishes just a password
via hops secrets (single JSON key, default 'password'); the stack
combines it with non-secret host/port/database/username/sslmode/namespace
and emits a K8s Secret with a ready-to-use 'url' key plus discrete fields.

Downstream consumers reference whichever key they need:
  AtlasSchema.devURLFrom       → url
  SGCluster credentials.users.superuser.password → password
  applications                 → url (or discrete fields)

Breaking change to the (locally-only) externalSecrets API; redeploy with
the new shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
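A minimal Python sketch of how the connections[] shape could combine the synced password with the non-secret fields into a `url` plus discrete keys. The defaults are hypothetical. One detail worth handling is percent-encoding: a password containing `@`, `/`, or `:` would otherwise break the URL.

```python
from urllib.parse import quote


def connection_secret(password: str, host: str, port: int = 5432,
                      database: str = "postgres", username: str = "postgres",
                      sslmode: str = "require") -> dict:
    """Build the keys of a ready-to-use Postgres connection Secret."""
    user = quote(username, safe="")
    pw = quote(password, safe="")  # escape URL-reserved characters in the password
    db = quote(database, safe="")
    url = f"postgresql://{user}:{pw}@{host}:{port}/{db}?sslmode={sslmode}"
    return {
        "url": url,
        "host": host,
        "port": str(port),  # Secret stringData values are strings
        "database": database,
        "username": username,
        "password": password,  # discrete fields stay unescaped
        "sslmode": sslmode,
    }
```

Consumers that need a DSN read `url`; anything that wants individual fields (like a superuser password reference) reads the discrete keys.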
Without managing the SGCluster, every field in externalSecrets.connections[]
(host/port/database/namespace) is passthrough — no abstraction. Users write
ExternalSecret CRs directly (or as a Crossplane Object wrapper in local/)
against the ClusterSecretStore provisioned by SecretStack.

PSQLStack now = platform only: StackGres + Atlas operators + NodePool +
StorageClass. If we later add an instances[] that composes SGClusters,
ESO wiring can come back for free since the stack will then know
host/port/database/namespace without the user restating them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites PSQLStack schema to remove stackgresOperator, add cnpg and
scaleToZeroPlugin blocks. Default namespace: cnpg-system. CNPG 1.29
(chart 0.27.1) replaces StackGres 1.18 as the operator. Atlas operator
renumbered 210 → 220 to make room for the scale-to-zero plugin
install (added in a later phase).

Storage (psql StorageClass on EBS gp3) and NodePool blocks preserved
unchanged — phases 2 and 3 will rewrite them for the three-profile
(mayastor / lvm / ebs) storage model and the branches/primary NodePool
split with hugepages + nvme-tcp pre-configured.

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the single storageClass block with a storage block exposing three
independent profiles:

- storage.mayastor (replicated NVMe-oF via OpenEBS Mayastor) — enterprise
  default for primary serving clusters; CoW + HA across N replicas. Default
  enabled=false until phase 3 lands the NodePool with hugepages + nvme-tcp.
- storage.lvm (single-node CoW via OpenEBS LVM LocalPV) — branches and dev
  clusters. Default enabled=false until phase 3 lands NodePool LVM volume
  groups.
- storage.ebs (EBS gp3 via EKS Auto Mode CSI) — durable fallback, no CoW;
  always-on default.

Render templates:
- 120-storageclass.yaml.gotmpl deleted (was the single 'psql' SC)
- 160-openebs-lvm.yaml.gotmpl: Helm release for OpenEBS LVM LocalPV
- 165-openebs-mayastor.yaml.gotmpl: Helm release for OpenEBS Mayastor
- 170-storageclass-mayastor.yaml.gotmpl: psql-mayastor SC + VolumeSnapshotClass
- 175-storageclass-lvm.yaml.gotmpl: psql-lvm SC + VolumeSnapshotClass
- 180-storageclass-ebs.yaml.gotmpl: psql-ebs SC (renamed from 'psql')

state-init defaults the three profile blocks; state-status observes the new
resource keys. Mayastor + LVM Helm releases + their StorageClasses are gated
on storage.{mayastor,lvm}.enabled — only EBS materializes by default until
phase 3 unblocks the others.

standard.yaml example patched to drop the removed storageClass and
stackgresOperator fields (full example rewrite is phase 5).

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
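The profile gating can be sketched in Python as follows. The resource keys are hypothetical, and `storage.ebs.enabled` defaulting to true is an assumption based on EBS being described as "always-on".

```python
def storage_resources(storage: dict) -> list:
    """Hypothetical resource keys composed for the three storage profiles."""
    resources = []
    if storage.get("mayastor", {}).get("enabled", False):  # off until phase 3
        resources += ["openebs-mayastor", "storageclass-mayastor"]  # SC + VSC template
    if storage.get("lvm", {}).get("enabled", False):       # off until phase 3
        resources += ["openebs-lvm", "storageclass-lvm"]            # SC + VSC template
    if storage.get("ebs", {}).get("enabled", True):        # always-on default
        resources.append("storageclass-ebs")
    return resources
```

With no storage config at all, only the EBS StorageClass materializes, which matches "only EBS materializes by default".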
Replaces the single Karpenter NodePool with two sub-pools targeting NVMe
arm64 instance-store nodes (i4g.2xlarge / i4g.4xlarge / im4gn.2xlarge):

- nodePool.branches: spot — for ephemeral PSQLBranch workloads. Spot is
  acceptable since branches are reproducible.
- nodePool.primary: on-demand — for PSQLCluster primaries and operators
  (CNPG, Atlas, scale-to-zero). Spot preemption would lose a Mayastor
  replica, so on-demand is the right default for serving workloads.

Each sub-pool has its own labels (sub-pool=branches | sub-pool=primary)
and matching taints so workloads can target one specifically. Operators
ride the primary sub-pool via nodeSelector + tolerations injected from
state-init.

Render templates:
- 150-nodepool.yaml.gotmpl deleted
- 140-nodepool-branches.yaml.gotmpl: spot sub-pool
- 145-nodepool-primary.yaml.gotmpl: on-demand sub-pool + Usage protection
  for CNPG/Atlas Releases against premature NodePool deletion.

state-init defaults the new sub-pool blocks; state-status observes both.
nodePool.enabled stays default-false — existing claims unchanged.

Out of scope for this commit: node-side prep for Mayastor + LVM
(hugepages, nvme-tcp module, LVM volume group on instance-store NVMe).
That's phase 3b (a separate concern that needs careful image / runtime
choices) — without it, mayastor.enabled and lvm.enabled won't bind PVCs.

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
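To see why per-sub-pool taints let workloads target one pool, here is a simplified Python model of Kubernetes taint/toleration matching. The `sub-pool=<name>:NoSchedule` taint shape is inferred from the node-prep commit (which tolerates `psql=true:NoSchedule` and `sub-pool=*:NoSchedule`); the toleration list is a hypothetical operator config.

```python
def toleration_matches(toleration: dict, taint: dict) -> bool:
    """Simplified Kubernetes toleration/taint matching."""
    # An empty toleration effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with an empty key tolerates every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def tolerates_node(tolerations: list, taints: list) -> bool:
    """True if every NoSchedule/NoExecute taint is tolerated (PreferNoSchedule is soft)."""
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(toleration_matches(tol, t) for tol in tolerations) for t in hard)


def sub_pool_taints(sub_pool: str) -> list:
    return [
        {"key": "psql", "value": "true", "effect": "NoSchedule"},
        {"key": "sub-pool", "value": sub_pool, "effect": "NoSchedule"},
    ]


OPERATOR_TOLERATIONS = [
    {"key": "psql", "operator": "Equal", "value": "true", "effect": "NoSchedule"},
    {"key": "sub-pool", "operator": "Equal", "value": "primary", "effect": "NoSchedule"},
]
```

Taints only repel: tolerating `sub-pool=primary` keeps operators off the branches pool, and the `sub-pool=primary` nodeSelector is what actually pulls them onto primary nodes.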
Inlines the upstream cnpg-i-scale-to-zero v0.1.7 release manifest as 9
Crossplane Kubernetes Objects:

- ServiceAccount cnpg-scale-to-zero-plugin
- ClusterRole cnpg-scale-to-zero-sidecar-role + ClusterRoleBinding
- Secret scale-to-zero-config (sidecar image reference, paired to plugin
  version via stringData — k8s base64-encodes on apply)
- Self-signed cert-manager Issuer + 2 Certificates (server + client) for
  the gRPC TLS material the CNPG operator uses to reach the plugin
- Service scale-to-zero (cnpg.io/pluginPort=9090, cnpg.io/pluginName
  annotations)
- Deployment scale-to-zero (the plugin gRPC server)

Plugin and sidecar images both pin to spec.scaleToZeroPlugin.version
(default v0.1.7). Secret is renamed scale-to-zero-config (was
scale-to-zero-config-c2c2544fbk in upstream — drops the kustomize
hash suffix since we emit the resources as separate Objects).

All resources are gated on spec.scaleToZeroPlugin.enabled (default true
— the plugin is zero-cost when no PSQLCluster opts in).
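A Python sketch of the nine Objects and the version pinning. The ServiceAccount, ClusterRole, Secret, Service, and Deployment names come from this commit; the ClusterRoleBinding, Issuer, and Certificate names and the `SIDECAR_IMAGE` Secret key are hypothetical, and image repositories are passed in rather than guessed.

```python
def s2z_objects(spec: dict, plugin_repo: str, sidecar_repo: str) -> list:
    """Skeletons of the 9 scale-to-zero Objects, gated on spec.enabled."""
    cfg = {"enabled": True, "version": "v0.1.7", **spec}
    if not cfg["enabled"]:
        return []
    version = cfg["version"]  # plugin and sidecar images pin to the same version
    return [
        {"kind": "ServiceAccount", "name": "cnpg-scale-to-zero-plugin"},
        {"kind": "ClusterRole", "name": "cnpg-scale-to-zero-sidecar-role"},
        {"kind": "ClusterRoleBinding", "name": "cnpg-scale-to-zero-sidecar-role"},  # hypothetical
        # stringData: the API server stores it base64-encoded under data.
        {"kind": "Secret", "name": "scale-to-zero-config",
         "stringData": {"SIDECAR_IMAGE": f"{sidecar_repo}:{version}"}},  # key is hypothetical
        {"kind": "Issuer", "name": "scale-to-zero-selfsigned"},         # hypothetical
        {"kind": "Certificate", "name": "scale-to-zero-server"},        # hypothetical
        {"kind": "Certificate", "name": "scale-to-zero-client"},        # hypothetical
        {"kind": "Service", "name": "scale-to-zero",
         "annotations": {"cnpg.io/pluginPort": "9090"}},
        {"kind": "Deployment", "name": "scale-to-zero",
         "image": f"{plugin_repo}:{version}"},
    ]
```

Bumping `spec.scaleToZeroPlugin.version` moves both images together, which is the point of pairing them.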

Source URL is annotated for renovate tracking:
  source: https://github.com/xataio/cnpg-i-scale-to-zero/releases/download/$VER/manifest.yaml
  renovate: datasource=github-releases depName=xataio/cnpg-i-scale-to-zero

Prereq: cert-manager must be installed (provided by the dns-stack in
hops-ops). Without it, the Issuer + Certificate resources won't reconcile
and the plugin Deployment won't have its TLS volumes available.

When PSQLClusters opt into scale-to-zero, they add:
  metadata.annotations:
    xata.io/scale-to-zero-enabled: "true"
    xata.io/scale-to-zero-inactivity-minutes: "10"
  spec.plugins:
    - name: cnpg-i-scale-to-zero.xata.io

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
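The per-cluster opt-in described above can be sketched as an idempotent patch on a CNPG Cluster manifest. The annotation keys and plugin name come from this commit; the function itself is illustrative.

```python
import copy

PLUGIN_NAME = "cnpg-i-scale-to-zero.xata.io"


def enable_scale_to_zero(cluster: dict, inactivity_minutes: int = 10) -> dict:
    """Return a copy of a CNPG Cluster manifest opted into scale-to-zero."""
    out = copy.deepcopy(cluster)
    annotations = out.setdefault("metadata", {}).setdefault("annotations", {})
    # Annotation values must be strings, hence "true" and str(...).
    annotations["xata.io/scale-to-zero-enabled"] = "true"
    annotations["xata.io/scale-to-zero-inactivity-minutes"] = str(inactivity_minutes)
    plugins = out.setdefault("spec", {}).setdefault("plugins", [])
    if not any(p.get("name") == PLUGIN_NAME for p in plugins):
        plugins.append({"name": PLUGIN_NAME})
    return out
```

Checking for an existing entry before appending means re-rendering never duplicates the plugin in `spec.plugins`.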
…(phase 5)

- README rewritten: CNPG architecture, three-stage journey (EBS-only →
  +LVM CoW → +Mayastor HA), full Spec Reference table, prereq notes
  (cert-manager via cert-manager-stack; node prep deferred to phase 3b)
- examples refreshed:
  - minimal: just clusterName, EBS-only baseline
  - standard: full production posture (NodePool sub-pools, Mayastor +
    LVM + EBS, S2Z plugin, Atlas)
  - local: dev cluster with LVM CoW only (no Mayastor since replication
    needs >1 node, no NodePool, default Helm provider config)
- tests/test-render/main.k: rewritten against the CNPG schema
  - dropped stackgresOperator-specific tests (field removed)
  - 11 tests covering: minimal renders platform operators; custom labels
    propagate; cnpg.overrideAllValues replaces defaults; atlas values
    merge; namespace propagation; per-component namespace override;
    helmProviderConfigRef defaults; scaleToZeroPlugin can be disabled;
    storage.{mayastor,lvm}.enabled compose Helm + StorageClass +
    VolumeSnapshotClass; nodePool.enabled composes both sub-pools

All 11 tests pass; render + validate green on minimal + standard.

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a privileged DaemonSet that runs on each NVMe NodePool node and
configures the host-level state Mayastor + OpenEBS LVM LocalPV expect
to find:

- Hugepages (vm.nr_hugepages, default 1024, i.e. 2 GiB of 2 MiB pages,
  required by Mayastor SPDK)
- nvme-tcp kernel module loaded (Mayastor NVMe-oF transport)
- LVM volume group on the first instance-store NVMe device
  (OpenEBS LVM LocalPV expects the VG to pre-exist)

Auto-gated: composed only when nodePool.enabled AND
(storage.mayastor.enabled OR storage.lvm.enabled). Inside the script,
each step is conditional on the relevant storage backend.

Schema additions:
- spec.nodePrep.enabled (default true; auto-gated by storage backends)
- spec.nodePrep.hugepages.count (default 1024)
- spec.nodePrep.image (default alpine:3.20; apk-installs lvm2 +
  util-linux at startup)

Render template 155-node-prep-daemonset.yaml.gotmpl:
- DaemonSet with nodeSelector workload-type=psql + tolerations for
  both psql=true:NoSchedule and sub-pool=*:NoSchedule taints
- hostPID + hostNetwork + privileged init container
- Init script falls back gracefully on Bottlerocket / Auto Mode where
  modprobe inside containers is restricted (warns and continues)
- Tiny pause container keeps the DS Ready

state-init / state-status / 010-state-status updated for the new
resource. Validate clean (22 resources on standard), 11/11 render
tests still pass.

Caveat: live verification on pat-local deferred until Mayastor or LVM
is enabled in a PSQLStack claim there. Schema/composition is sound; e2e
testing follows when storage backends are turned on.

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
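The auto-gate and the hugepages arithmetic, sketched in Python. It mirrors the conditions stated in this commit (nodePrep.enabled defaulting to true, nodePool.enabled defaulting to false at this point) and assumes the 2 MiB page size the commit cites.

```python
HUGEPAGE_SIZE_MIB = 2  # 2 MiB pages, as cited for Mayastor SPDK


def node_prep_composed(spec: dict) -> bool:
    """Compose the DaemonSet only if nodePrep AND nodePool AND (mayastor OR lvm)."""
    storage = spec.get("storage", {})
    backend_enabled = (storage.get("mayastor", {}).get("enabled", False)
                       or storage.get("lvm", {}).get("enabled", False))
    return bool(spec.get("nodePrep", {}).get("enabled", True)
                and spec.get("nodePool", {}).get("enabled", False)
                and backend_enabled)


def hugepages_mib(count: int = 1024) -> int:
    """Memory reserved by vm.nr_hugepages=count."""
    return count * HUGEPAGE_SIZE_MIB


print(hugepages_mib())  # 2048 MiB, i.e. 2 GiB
```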
Per-claim cost on Mayastor is controlled via replicationFactor (3 for
primaries, 1 for ephemeral branches). LVM was a redundant single-node
CoW backend; EBS was just wrapping the cluster's existing default SC
and contradicted the stack's CoW-by-default identity.

Schema changes:
- spec.storage flattened — no more {mayastor,lvm,ebs} profile selector.
  Top-level fields now: chartVersion, namespace, storageClassName
  (default "psql"), replicationFactor, thin, reclaimPolicy,
  volumeBindingMode, allowVolumeExpansion, values, overrideAllValues.
- spec.nodePool.enabled defaults to TRUE now (was false). The stack's
  whole point is dedicated NVMe nodes; the default should make it work.
- Mayastor + StorageClass + node-prep DaemonSet all gated on
  nodePool.enabled. Disable nodePool to opt out (gets you only
  CNPG + Atlas + S2Z plugin running on the cluster's default SC).

Render templates removed:
- 160-openebs-lvm.yaml.gotmpl
- 175-storageclass-lvm.yaml.gotmpl
- 180-storageclass-ebs.yaml.gotmpl

Render templates updated:
- 165-openebs-mayastor.yaml.gotmpl: gated on nodePool.enabled (was
  storage.mayastor.enabled), uses flat $state.storage shape
- 170-storageclass-mayastor.yaml.gotmpl: same gating + shape
- 155-node-prep-daemonset.yaml.gotmpl: dropped LVM VG step, gated on
  just nodePool.enabled
- 010-state-status.yaml.gotmpl: dropped LVM/EBS observed keys

PSQLClusters that don't want CoW can specify any other StorageClass
that exists on the target cluster (e.g., the EKS Auto Mode default
gp3 SC). The stack does NOT compose a non-CoW SC — that's outside
its identity.

Examples + README rewritten. KCL tests updated: 10/10 passing.
Validate clean: 18 resources on minimal, 18 on standard.

Live verification on pat-local pending — would need to delete the
existing claim (with ebs/lvm enabled) and reapply the simplified one.

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iskPools

Three changes prompted by review:

1. Primary sub-pool default is now spot (was on-demand). Mayastor's
   replicationFactor=3 absorbs preemption — losing one replica triggers
   a rebuild on a fresh node, not data loss. Override to on-demand via
   spec.nodePool.primary.requirements when needed.

2. Karpenter NodePool inner names lose the cluster prefix. Was
   `<clusterName>-psql-{branches,primary}`, now `<XR.name>-{branches,primary}`
   (e.g. `psql-branches`, `psql-primary`). Less repetition; uses the
   stack's XR name for disambiguation when multiple PSQLStacks share a
   cluster (which is rare). Wrapper Crossplane Object names unchanged.

3. node-prep DaemonSet now also registers the local NVMe instance-store
   device with Mayastor by creating a per-node DiskPool CR. New ServiceAccount
   + ClusterRole/Binding granting get/create/list/watch on
   diskpools.openebs.io. Init script: detects /dev/nvme[1-9]n1 by lsblk
   model match, kubectl-applies a DiskPool named psql-pool-<NODE_NAME>
   (idempotent — skip if already present). Closes the gap that left
   Mayastor pools empty and PVCs stuck Pending.

This means no separate ObservedObjectCollection or custom function for
DiskPool registration — the same DS that handles host-level prereqs
(hugepages, nvme-tcp) also handles pool registration. One declarative
artifact per node-prep concern.

Also fixes a YAML colon issue in the primary sub-pool description.

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
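The device detection and idempotent DiskPool naming can be sketched in Python. This assumes the init script parses `lsblk -dn -o NAME,MODEL` output and that AWS instance-store disks report the model `Amazon EC2 NVMe Instance Storage`; both are assumptions about the real script.

```python
import re

INSTANCE_STORE_MODEL = "Amazon EC2 NVMe Instance Storage"  # assumption
DEVICE_RE = re.compile(r"^nvme[1-9]n1$")  # same pattern as /dev/nvme[1-9]n1


def instance_store_devices(lsblk_output: str) -> list:
    """Pick instance-store NVMe devices from `lsblk -dn -o NAME,MODEL` output."""
    devices = []
    for line in lsblk_output.splitlines():
        parts = line.split(None, 1)  # NAME, then the (space-containing) MODEL
        if len(parts) != 2:
            continue
        name, model = parts[0], parts[1].strip()
        if DEVICE_RE.match(name) and model == INSTANCE_STORE_MODEL:
            devices.append(f"/dev/{name}")
    return devices


def diskpool_name(node_name: str) -> str:
    return f"psql-pool-{node_name}"


def pools_to_create(node_name: str, existing: set) -> list:
    """Idempotent: skip the DiskPool if one already exists for this node."""
    name = diskpool_name(node_name)
    return [] if name in existing else [name]
```

The model match matters because the root EBS volume also shows up as an NVMe device (typically `nvme0n1`) on Nitro instances.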
Adds spec.ha block: a single toggle that enables production-style HA
defaults across every HA-able platform component without users needing
to know each chart's specific values keys.

When spec.ha.enabled: true (default false):
- CNPG operator: replicaCount=3 + topologySpreadConstraints by zone
- Atlas operator: same
- cnpg-i-scale-to-zero plugin Deployment (directly composed): same
- OpenEBS Mayastor: agents.core / csi.controller / etcd replicaCount=3

Per-component values can still override via the existing values block —
HA values land in chartDefaults, user values mergeOverwrite them.

Schema:
- spec.ha.enabled (bool, default false)
- spec.ha.replicas (int, default 3)
- spec.ha.topologySpreadByZone (bool, default true)

Standard example now demonstrates HA. New KCL test asserts replicaCount
flows through to CNPG + Atlas Releases when ha.enabled=true. README
updated with the new fields + a Components table entry.

Render + validate clean (21 resources). 11/11 KCL tests pass.

Implements [[tasks/psql-stack-cnpg]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
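A simplified Python model of "HA values land in chartDefaults, user values mergeOverwrite them". Sprig's `mergeOverwrite` may differ in edge cases (for example around empty values), and the `topologySpreadConstraints` shape with `whenUnsatisfiable: ScheduleAnyway` is an assumption; real charts may expose spreading under different keys.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`; override wins."""
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def ha_defaults(replicas: int = 3, spread_by_zone: bool = True) -> dict:
    defaults = {"replicaCount": replicas}
    if spread_by_zone:
        defaults["topologySpreadConstraints"] = [{
            "maxSkew": 1,
            "topologyKey": "topology.kubernetes.io/zone",
            "whenUnsatisfiable": "ScheduleAnyway",  # assumption
        }]
    return defaults


def chart_values(ha: dict, user_values: dict) -> dict:
    chart_defaults = {}
    if ha.get("enabled", False):
        chart_defaults = ha_defaults(ha.get("replicas", 3), ha.get("topologySpreadByZone", True))
    return deep_merge(chart_defaults, user_values)
```

A user who sets `replicaCount: 5` in a component's values block still gets the zone spread from HA mode, but their replica count wins.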
Avoids Helm ownership conflicts when the cluster already has these CRDs
from another source (e.g., a previous OpenEBS LVM install, the cluster's
snapshot-controller, or another chart that bundles them).

The CRDs (volumesnapshotclasses.snapshot.storage.k8s.io and friends) need
to come from somewhere on the cluster — the assumption is that
snapshot-controller is installed separately as a cluster-level concern,
not bundled with each storage backend.

Encountered live during pat-local install: leftover LVM CRD annotations
blocked Mayastor's upgrade. Skip CRD install in our defaults to make this
robust across re-installs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
csi-node DaemonSet was running on every cluster node, but workers without
the node-prep DS don't have nvme_tcp loaded, so csi-node crashloops there
("Failed to detect nvme_tcp kernel module").

Restrict csi-node via spec.csi.node.{nodeSelector,tolerations} to the
same psql NodePool nodes where node-prep loads nvme_tcp. PSQLCluster
workloads always schedule on workload-type=psql nodes via the existing
NodePool selectors, so this doesn't restrict anything that actually
consumes Mayastor PVCs.

Also fixes a chartDefaults merge bug introduced when HA mode landed:
both nodePool and HA were calling `set chartDefaults "csi"`, so whichever
ran second clobbered the other's settings. Now the csi sub-dict is built
incrementally before being added to chartDefaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
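The clobbering bug and its fix can be reproduced with plain dicts. This is a Python analogue of the Go template `set` calls, not the template itself; the value paths follow the commit (`csi.node.{nodeSelector,tolerations}` and the HA `csi.controller` replicaCount).

```python
NODE_SETTINGS = {
    "nodeSelector": {"workload-type": "psql"},
    "tolerations": [{"key": "psql", "operator": "Equal", "value": "true",
                     "effect": "NoSchedule"}],
}


def csi_defaults_buggy(node_pool: bool, ha: bool) -> dict:
    chart_defaults = {}
    if node_pool:
        chart_defaults["csi"] = {"node": dict(NODE_SETTINGS)}
    if ha:
        # Replaces the whole "csi" key, dropping csi.node from the block above.
        chart_defaults["csi"] = {"controller": {"replicaCount": 3}}
    return chart_defaults


def csi_defaults_fixed(node_pool: bool, ha: bool) -> dict:
    csi = {}  # build the sub-dict incrementally...
    if node_pool:
        csi["node"] = dict(NODE_SETTINGS)
    if ha:
        csi["controller"] = {"replicaCount": 3}
    chart_defaults = {}
    if csi:  # ...and attach it once
        chart_defaults["csi"] = csi
    return chart_defaults
```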
- 165-longhorn.yaml.gotmpl: longhorn chart 1.10.0, V2 data engine + SPDK
- 170-storageclass-longhorn.yaml.gotmpl: driver.longhorn.io SC + VSC,
  dataEngine=v2, diskSelector=psql
- 155-node-prep-daemonset.yaml.gotmpl: modprobe nvme_tcp / vfio_pci /
  uio_pci_generic / ublk_drv; replace DiskPool registration with
  node.longhorn.io/default-disks-config annotation
- definition.yaml + state-init: chart defaults bump (1.10.0,
  longhorn-system); doc strings updated
- 010-state-status: observe `longhorn` + `storageclass-longhorn`
- README + examples updated; KCL tests updated and all 11 pass

Mayastor checkpoint preserved on feat/cnpg-pivot. Longhorn V2 is
"Experimental" upstream as of 1.10 — drops bundled etcd/NATS/minio/
loki/alloy footprint and unblocks arm64 Graviton (Mayastor's chart
hardcodes amd64 on io_engine).

Drops Mayastor/Longhorn experimentation entirely:
- Remove NodePool sub-pools (branches + primary)
- Remove node-prep DaemonSet
- Remove Mayastor / Longhorn Helm release templates
- Remove Mayastor/Longhorn StorageClass

Slim spec.storage block down to spec.snapshotClass {enabled, name,
driver, deletionPolicy, parameters}. Default driver ebs.csi.aws.com
(EKS Auto Mode default); override for non-AWS providers. The stack
no longer composes a StorageClass — PSQLClusters target whatever SC
the cluster already provides.

Stack now composes 4 things: CNPG operator, Atlas operator, S2Z
plugin (9 Objects), and one VolumeSnapshotClass. CNPG/Atlas/S2Z
templates lose nodeSelector/tolerations refs; they run wherever Auto
Mode schedules them.

XRD shrinks from ~290 lines to ~170. README rewritten — no more
NVMe/SPDK/iscsi noise. Examples collapsed to clean shapes.

Replicated CoW storage (Longhorn et al) is now a separate concern,
to be provided by aws-storage-stack (self-managed ASG nodes with
proper userData) when the multi-tenant CoW economics justify the
operational cost. Bottlerocket/Auto Mode is incompatible with iscsi-
based engines (longhorn-manager env-checks for iscsiadm even with
V2; Bottlerocket's immutable rootfs blocks installation).

Branch checkpoints preserved on:
  feat/cnpg-pivot       — Mayastor experiment
  feat/longhorn-pivot   — Longhorn V2 experiment

EKS Auto Mode uses the managed `ebs.csi.eks.amazonaws.com` CSI driver,
not the upstream `ebs.csi.aws.com`. Fix the default in state-init,
XRD, README, and the minimal example so out-of-the-box installs on
Auto Mode produce a working VolumeSnapshotClass without override.

Self-managed EBS users can still override `spec.snapshotClass.driver`
to `ebs.csi.aws.com`; non-AWS users override to their CSI driver name.

Verified live on pat-local: psql VSC reconciles to driver
ebs.csi.eks.amazonaws.com, XR Synced=True Ready=True, CNPG + Atlas +
S2Z plugin all running cleanly on Auto Mode without dedicated nodes.

EKS Auto Mode ships the snapshot.storage.k8s.io CRDs but does NOT
ship the snapshot-controller. Without a controller, our composed
VolumeSnapshotClass is inert — PSQLBranch snapshots will sit
forever without ever reaching ReadyToUse.

Document it as a prerequisite (like cert-manager). The stack itself
does not compose snapshot-controller — it's a foundational cluster
concern that belongs in a separate stack (TBD: extend aws-cert-stack
or create a focused snapshot-stack).

Verified end-to-end on pat-local with the upstream
kubernetes-csi/external-snapshotter v8.2.0 manifests:
  - PVC bound on ebs.csi.eks.amazonaws.com
  - VolumeSnapshot via the composed `psql` VSC reached
    readyToUse=true with a real EBS snapshot backing it.

The dependency now exists as a real stack (xrs/stacks/k8s/volume-snapshot/,
ghcr.io/hops-ops/volume-snapshot-stack). Replace the manual upstream-YAML
install instructions with a pointer at the stack.

coderabbitai Bot commented Apr 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR migrates a PostgreSQL management stack from StackGres-based operator control to CloudNativePG-based cluster lifecycle management, introducing new PSQLCluster and PSQLBranch composite resources, storage class/snapshot class composition, scale-to-zero plugin integration, and HA configuration surfaces across the CRDs, composition functions, examples, and test suites.

Changes

CloudNativePG Platform Migration

  • CRD Schema & Public API
    File(s): apis/psqlstacks/definition.yaml, apis/psqlclusters/definition.yaml, apis/psqlbranches/definition.yaml
    New CRDs PSQLCluster and PSQLBranch define cluster sizing, credentials, branching, HA, and CNPG override surfaces. PSQLStack CRD replaces stackgresOperator config with namespace, kubernetesProviderConfigRef, ha, scaleToZeroPlugin, storageClass, and snapshotClass; status.ready becomes component-aware.
  • Core Composition Functions (Stack)
    File(s): functions/stack/000-state-init.yaml.gotmpl, functions/stack/010-state-status.yaml.gotmpl, functions/stack/200-cnpg-operator.yaml.gotmpl, functions/stack/210-cnpg-scale-to-zero.yaml.gotmpl, functions/stack/220-atlas-operator.yaml.gotmpl, functions/stack/180-storageclass.yaml.gotmpl, functions/stack/170-volumesnapshotclass.yaml.gotmpl
    Stack composition initializes state from CNPG/scale-to-zero/Atlas/storage configs, derives HA settings, and composes Helm releases (CNPG with HA replicaCount injection, Atlas with HA topology spread), the scale-to-zero plugin deployment, and Kubernetes StorageClass/VolumeSnapshotClass objects.
  • Core Composition Functions (Cluster)
    File(s): functions/cluster/000-state-init.yaml.gotmpl, functions/cluster/010-state-status.yaml.gotmpl, functions/cluster/200-cnpg-cluster.yaml.gotmpl, functions/cluster/100-external-secret.yaml.gotmpl
    Cluster composition derives instance count (HA override) and storage/postgres/app/superuser defaults, and renders a CNPG Cluster manifest with bootstrap, storage, monitoring, optional credentials via ExternalSecrets, and scale-to-zero plugin configuration.
  • Core Composition Functions (Branch)
    File(s): functions/branch/000-state-init.yaml.gotmpl, functions/branch/010-state-status.yaml.gotmpl, functions/branch/100-source-snapshot.yaml.gotmpl, functions/branch/110-branch-snapshot.yaml.gotmpl, functions/branch/200-cnpg-cluster.yaml.gotmpl, functions/branch/999-status.yaml.gotmpl
    Branch composition handles same-namespace and cross-namespace forking via VolumeSnapshots, derives snapshot content for bootstrap recovery, renders a CNPG Cluster bootstrapped from the snapshot, and propagates scale-to-zero/TTL annotations.
  • Composition Registrations
    File(s): apis/psqlstacks/composition.yaml, apis/psqlclusters/composition.yaml, apis/psqlbranches/composition.yaml
    Register function pipelines for each XRD: stack/cluster/branch each run their custom function at step 1, then auto-ready at step 2.
  • Function-Based State Removal
    File(s): functions/render/200-stackgres-operator.yaml.gotmpl (removed)
    Deletes the StackGres operator Helm release template (superseded by the CNPG operator composition).
  • State Initialization (Stack)
    File(s): functions/render/000-state-init.yaml.gotmpl
    Replaces StackGres state initialization with CNPG state including namespace (cnpg-system), HA config, scale-to-zero/storage/snapshot defaults, and the Kubernetes provider reference.
  • Examples (PSQLStack)
    File(s): examples/psqlstacks/minimal.yaml, examples/psqlstacks/standard.yaml, examples/psqlstacks/local.yaml
    Updated/added examples demonstrating a minimal CNPG-based stack, an HA production posture, and local development with snapshots/scale-to-zero disabled.
  • Examples (PSQLCluster & PSQLBranch)
    File(s): examples/psqlclusters/minimal.yaml, examples/psqlclusters/standard.yaml, examples/psqlbranches/same-namespace.yaml, examples/psqlbranches/cross-namespace.yaml, examples/psqlbranches/preview-with-ttl.yaml
    New example manifests for cluster sizing/HA/credential modes and branch same-namespace/cross-namespace/TTL scenarios.
  • Unit Tests (Composition)
    File(s): tests/test-stack/main.k, tests/test-cluster/main.k, tests/test-branch/main.k, tests/test-stack/kcl.mod, tests/test-cluster/kcl.mod
    KCL test suites validating stack/cluster/branch composition behavior (label merging, HA injection, storage/snapshot propagation, credential modes, CNPG override, provider defaulting).
  • E2E Test
    File(s): tests/e2etest-psql/main.k
    Replaces the Kubernetes-only Helm test with real AWS EKS Auto Mode provisioning, Crossplane provider setup (AWS + K8s), prerequisite stack installation (volume-snapshot, cert-manager), and live PSQLStack/Cluster/Branch XR deployment.
  • Build & CI/CD
    File(s): Makefile, .github/workflows/on-pr.yaml, .github/workflows/on-push-main.yaml
    Adds multi-API example support; updates reusable workflow references from unbounded-tech/workflows-crossplane@v2.20.0 to hops-ops/workflows-crossplane@v3.0.0, expands validate examples to JSON with per-example api_path, adds e2e timeout inputs, and updates Makefile bulk targets to derive per-example composition/definition paths.
  • Metadata & Documentation
    File(s): upbound.yaml, README.md, .gitignore
    Updates project description and dependencies (provider-kubernetes added); rewrites README from a StackGres management stack to a CloudNativePG platform layer with HA/snapshot/scale-to-zero composition; adds ignores for configuration YAML and test environment directories.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant XRD as XRD (PSQLStack)
    participant Stack as Stack Composition<br/>(hops-ops-psql-stackstack)
    participant CNPG as CNPG Operator<br/>(Helm Release)
    participant K8s as Kubernetes<br/>(StorageClass,<br/>VolumeSnapshot)
    participant Scale as Scale-to-Zero<br/>Plugin
    participant Atlas as Atlas Operator<br/>(Helm Release)

    User->>XRD: Create PSQLStack with HA,<br/>storage, snapshot config
    XRD->>Stack: Execute composition pipeline
    Stack->>Stack: Initialize state from spec<br/>(namespace, labels, HA, provider refs)
    Stack->>CNPG: Compose Helm Release<br/>(inject HA replicaCount)
    Stack->>K8s: Compose StorageClass Object<br/>(provisioner, parameters)
    Stack->>K8s: Compose VolumeSnapshotClass Object<br/>(driver matching storage)
    alt scaleToZeroPlugin.enabled
        Stack->>Scale: Compose plugin deployment<br/>(ServiceAccount, RBAC, Certs, Service)
    end
    Stack->>Atlas: Compose Helm Release<br/>(inject HA topology spread)
    Note over Stack,Atlas: Auto-ready waits for all<br/>enabled components ready
    CNPG-->>XRD: Release reconciles, creates<br/>cloudnative-pg chart
    K8s-->>XRD: StorageClass & VolumeSnapshotClass<br/>objects created
    Scale-->>XRD: Plugin deployment ready
    Atlas-->>XRD: Release reconciles, creates<br/>atlas-operator chart
    XRD-->>User: status.ready = true when<br/>all enabled components ready
sequenceDiagram
    actor User
    participant XRD as XRD (PSQLCluster)
    participant Comp as Cluster Composition<br/>(hops-ops-psql-stackcluster)
    participant CNPG as CNPG Cluster<br/>(Kubernetes Object)
    participant ESO as External Secrets<br/>Operator (optional)
    participant K8s as Kubernetes Secrets
    participant Monitor as CNPG<br/>Monitoring

    User->>XRD: Create PSQLCluster with<br/>storage, app, HA settings
    XRD->>Comp: Execute composition pipeline
    Comp->>Comp: Initialize state<br/>(size, version, HA instances,<br/>credential modes)
    alt app.externalSecret configured
        Comp->>ESO: Compose ExternalSecret Object<br/>(fetch username/password)
    else app.secretName provided
        Comp->>K8s: Use provided secret
    else default
        Comp->>K8s: Let CNPG generate secret
    end
    Comp->>CNPG: Compose CNPG Cluster Object<br/>(instances, bootstrap, storage,<br/>monitoring, optional superuser)
    CNPG-->>Monitor: Enable PodMonitor for metrics
    alt ha.enabled
        Note over CNPG: instances override,<br/>anti-affinity topology
    end
    alt scaleToZero.enabled
        CNPG-->>CNPG: Inject scale-to-zero plugin config
    end
    ESO-->>K8s: Populate credentials if ESO used
    CNPG-->>XRD: Cluster reconciles, spins up<br/>PostgreSQL pods
    XRD-->>User: status.ready + connection metadata<br/>(host/port/database)
sequenceDiagram
    actor User
    participant XRD as XRD (PSQLBranch)
    participant Comp as Branch Composition<br/>(hops-ops-psql-stackbranch)
    participant SrcSnap as Source VolumeSnapshot<br/>(cross-ns only)
    participant BrSnap as Branch VolumeSnapshot
    participant CNPG as CNPG Cluster<br/>(bootstrapped from fork)

    User->>XRD: Create PSQLBranch<br/>(clusterName, source, storage config)
    XRD->>Comp: Execute composition pipeline
    Comp->>Comp: Initialize state<br/>(namespace, source defaults,<br/>snapshot class, scale-to-zero)
    alt crossNamespace
        Comp->>SrcSnap: Compose source VolumeSnapshot Object<br/>(target source PVC in source namespace)
    end
    Comp->>BrSnap: Compose branch VolumeSnapshot Object<br/>(reference source PVC or snapshot content)
    Comp->>Comp: Derive observed snapshot content<br/>(track boundVolumeSnapshotContentName)
    Comp->>CNPG: Compose CNPG Cluster Object<br/>(bootstrap.recovery.volumeSnapshots,<br/>instances, storage size, labels)
    alt scaleToZero.enabled
        CNPG-->>CNPG: Add idle-timeout annotation<br/>and plugin config
    end
    alt ttl.enabled
        CNPG-->>CNPG: Add TTL expiration annotation
    end
    SrcSnap-->>BrSnap: Content available for branch fork
    BrSnap-->>CNPG: VolumeSnapshot bound,<br/>content reference set
    CNPG-->>XRD: Cluster bootstraps from snapshot,<br/>fork ready
    XRD-->>User: status.ready + phase<br/>(bootstrap complete)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A stack once called StackGres stood tall,
Now CloudNative blooms—no walls at all!
With snapshots for forks and scale-to-zero's grace,
HA spreads wide across every place.
Hop hop hooray—the platform's complete!
🌱✨



@coderabbitai (bot) left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

77-89: ⚠️ Potential issue | 🟡 Minor

The Stage 3/4 examples override only the Helm provider, not the Kubernetes one.

helmProviderConfigRef.name: default fixes the Helm releases, but the VolumeSnapshotClass and scale-to-zero Objects still default kubernetesProviderConfigRef.name to clusterName in functions/render/000-state-init.yaml.gotmpl. As written, these examples will still look for Kubernetes ProviderConfigs named edge / local.

Suggested doc fix
 spec:
   clusterName: edge
   helmProviderConfigRef:
     name: default
+  kubernetesProviderConfigRef:
+    name: default
   snapshotClass:
     driver: driver.longhorn.io
 spec:
   clusterName: local
   helmProviderConfigRef:
     name: default
+  kubernetesProviderConfigRef:
+    name: default
   snapshotClass:
     enabled: false
   scaleToZeroPlugin:
     enabled: false

Also applies to: 95-109
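A minimal Go sketch of why setting only `helmProviderConfigRef` is not enough. It assumes, as the finding states, that `000-state-init.yaml.gotmpl` resolves each ProviderConfig ref independently and falls back to `clusterName` when a ref is unset; `resolveProviderConfig` is a hypothetical helper, not code from the repo.

```go
package main

import "fmt"

// resolveProviderConfig is a hypothetical model of the state-init fallback:
// an explicit ref wins, otherwise the ProviderConfig name defaults to clusterName.
func resolveProviderConfig(explicitRef, clusterName string) string {
	if explicitRef != "" {
		return explicitRef
	}
	return clusterName
}

func main() {
	clusterName := "edge"

	// Stage 3/4 example as written: only the Helm ref is overridden.
	helm := resolveProviderConfig("default", clusterName)
	kube := resolveProviderConfig("", clusterName)
	fmt.Printf("helm=%s kubernetes=%s\n", helm, kube) // helm=default kubernetes=edge

	// With the suggested fix, both refs point at "default".
	kube = resolveProviderConfig("default", clusterName)
	fmt.Printf("helm=%s kubernetes=%s\n", helm, kube) // helm=default kubernetes=default
}
```

Because each ref resolves on its own, fixing one never fixes the other; the examples have to set both.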

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 77 - 89, The example PSQLStack YAML sets
helmProviderConfigRef.name: default but leaves kubernetesProviderConfigRef.name
to the clusterName (causing VolumeSnapshotClass and scale-to-zero Objects to
still use the wrong provider); update the example blocks (the PSQLStack examples
around the Stage 3/4 snippets) to explicitly set
kubernetesProviderConfigRef.name: default (or the same provider name used for
helmProviderConfigRef) so functions/render/000-state-init.yaml.gotmpl will
render the correct ProviderConfig for VolumeSnapshotClass and scale-to-zero
Objects; ensure both occurrences noted (the block at lines ~77-89 and the
similar block at ~95-109) are updated to keep provider names consistent.
🧹 Nitpick comments (1)
.gitignore (1)

14-14: Consider documenting/keeping an example config for devs/CI.

If apis/**/configuration.yaml is meant to be generated or environment-specific, it usually helps to add a committed example/template (e.g., configuration.yaml.example) and (optionally) a comment near the ignore line describing how to generate/copy it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 14, The ignore of apis/**/configuration.yaml hides
whether this is generated or required; add a committed example/template file
(e.g., configuration.yaml.example) alongside the real config for each API,
update README or CI docs to show how to copy/populate it (cp
configuration.yaml.example configuration.yaml or use env substitution), and add
a brief inline comment near the apis/**/configuration.yaml entry in .gitignore
indicating the template location and generation method so devs and CI know how
to produce the real file; reference the pattern apis/**/configuration.yaml and
ensure the new configuration.yaml.example is added to the repo for each service
that needs it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 190-191: Update the description for the ready field so it reflects
enabled components rather than implying all four are mandatory; change the text
for ready/status.ready to say something like "Overall readiness — true once all
enabled components (CNPG, Atlas, the scale-to-zero plugin if
scaleToZeroPlugin.enabled, and the VolumeSnapshotClass if snapshotClass.enabled)
are Ready" so the description references scaleToZeroPlugin.enabled and
snapshotClass.enabled and matches actual behavior.
- Around line 147-170: Update the snapshotClass description to match the schema
defaults: state that the stack composes exactly one VolumeSnapshotClass named
"psql" (not "named after the XR") and that the default CSI driver is
ebs.csi.eks.amazonaws.com (not ebs.csi.aws.com); keep a note that users can
override driver for other providers (e.g., ebs.csi.aws.com for self-managed EBS
or driver.longhorn.io) and ensure references to PSQLBranch and
VolumeSnapshotClass remain intact (look for snapshotClass, properties.name, and
properties.driver).
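The corrected `ready` description boils down to one rule: every enabled component must be Ready, and disabled ones are ignored. This Go sketch assumes a simple enabled/ready flag per component; the `component` type and `stackReady` function are illustrative and not part of the composition.

```go
package main

import "fmt"

// component is a hypothetical model of one composed piece of the stack.
type component struct {
	name    string
	enabled bool
	ready   bool
}

// stackReady is true once every enabled component is Ready.
// Disabled components (e.g. scaleToZeroPlugin.enabled: false) are skipped.
func stackReady(components []component) bool {
	for _, c := range components {
		if c.enabled && !c.ready {
			return false
		}
	}
	return true
}

func main() {
	components := []component{
		{name: "cnpg", enabled: true, ready: true},
		{name: "atlas", enabled: true, ready: true},
		{name: "scale-to-zero", enabled: false, ready: false},
		{name: "volumesnapshotclass", enabled: true, ready: true},
	}
	fmt.Println(stackReady(components)) // true
}
```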

In `@functions/render/000-state-init.yaml.gotmpl`:
- Around line 62-68: Tests still set spec.stackgresOperator.values which this
initializer no longer reads; update the test to populate the new fields used
here ($cnpg via spec.cnpg, $s2z via spec.scaleToZeroPlugin and $s2z.enabled for
$s2zEnabled, and $atlas via spec.atlasOperator) or remove the obsolete
assignment. In the test file (tests/e2etest-psql/main.k) replace the
stackgresOperator.values block with equivalent entries under spec.cnpg,
spec.scaleToZeroPlugin (including an enabled flag if used), and
spec.atlasOperator, mapping any specific config keys from the old values into
the corresponding new sections, or delete the block if those settings are no
longer required.

In `@functions/render/170-volumesnapshotclass.yaml.gotmpl`:
- Around line 10-12: Update the comment in the volumesnapshotclass template to
reflect the correct default CSI driver: replace the stale value
"ebs.csi.aws.com" with "ebs.csi.eks.amazonaws.com" and mention that this default
originates from the state init template (which sets the default driver used by
$spec.snapshotClass.driver); ensure the comment and any example override
instructions reference "ebs.csi.eks.amazonaws.com" so readers are pointed to the
correct default.

In `@functions/render/200-cnpg-operator.yaml.gotmpl`:
- Around line 64-85: The Usage guard currently only renders when both
$state.observed.cnpg.ready and $state.observed.atlasOperator.ready are true,
which is too late; change the conditional to trigger when the CNPG and Atlas
operator are present/declared (e.g. use $state.observed.cnpg.exists or
$state.observed.cnpg.present and $state.observed.atlasOperator.exists/present or
the equivalent keys that indicate intended composition) so the Usage (name: {{
$state.name }}-delete-atlas-operator-before-cnpg, resourceRef names {{
$cnpg.name }} and {{ $state.atlasOperator.name }}) is created before readiness
is reached and prevents CNPG removal during Atlas teardown.
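To see the difference between the two conditions, this Go `text/template` sketch renders a presence-based guard. `hasKey` is registered by hand so the sketch runs on the standard library alone (the composition presumably gets it from Sprig, as the suggested fix's use of `hasKey` and `dict` implies). The resource keys `cnpg-operator` and `atlas-operator` are taken from the suggested fix and should be checked against the actual composed resource names.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// hasKey reproduces the Sprig helper of the same name.
func hasKey(m map[string]any, key string) bool {
	_, ok := m[key]
	return ok
}

// guard renders the Usage when both releases are observed, regardless of
// whether either one has reported Ready yet.
const guard = `{{- if and (hasKey .observed "cnpg-operator") (hasKey .observed "atlas-operator") -}}
render Usage
{{- else -}}
skip Usage
{{- end }}`

var tmpl = template.Must(template.New("guard").
	Funcs(template.FuncMap{"hasKey": hasKey}).
	Parse(guard))

func renderGuard(observed map[string]any) string {
	var b strings.Builder
	if err := tmpl.Execute(&b, map[string]any{"observed": observed}); err != nil {
		panic(err)
	}
	return b.String()
}

func main() {
	// Atlas exists but is not Ready: a readiness gate would skip the Usage here.
	fmt.Println(renderGuard(map[string]any{
		"cnpg-operator":  map[string]any{"ready": true},
		"atlas-operator": map[string]any{"ready": false},
	})) // render Usage

	// Atlas not composed yet: nothing to protect.
	fmt.Println(renderGuard(map[string]any{
		"cnpg-operator": map[string]any{"ready": true},
	})) // skip Usage
}
```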

In `@tests/test-render/main.k`:
- Around line 282-311: The test only checks positive presence via
CompositionTest.assertResources but lacks negative checks for s2z-* and the
VolumeSnapshotClass; update the CompositionTest block for the
"scale-to-zero-plugin-can-be-disabled" (and the other similar case) to
explicitly assert absence by using the test framework's absence check (e.g., add
assertResourcesAbsent with entries for resources matching kind/name patterns
"s2z-*" and the VolumeSnapshotClass or add an explicit rendered resource
count/assertion like renderedResourceCount == expected) so the test fails if
those s2z-* objects or VolumeSnapshotClass are emitted; locate the
CompositionTest blocks (metadata.name = "scale-to-zero-plugin-can-be-disabled"
and the similar test) and add the negative assertions alongside assertResources.
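The missing negative check amounts to "no rendered resource matches this predicate". Below is a small Go sketch of that assertion. The `resource` type, the `s2z-` name prefix (taken from the finding), and the sample resource names are illustrative. The real test would express the same check through whatever absence assertion the KCL test framework actually provides.

```go
package main

import (
	"fmt"
	"strings"
)

// resource is a minimal stand-in for one rendered composed resource.
type resource struct {
	Kind         string // composed kind, e.g. Release or Object
	Name         string
	ManifestKind string // for provider-kubernetes Objects: the wrapped manifest's kind
}

// matching returns every resource the predicate selects as "Kind/Name", so a
// failing absence check can report exactly what leaked into the render.
func matching(rendered []resource, pred func(resource) bool) []string {
	var hits []string
	for _, r := range rendered {
		if pred(r) {
			hits = append(hits, r.Kind+"/"+r.Name)
		}
	}
	return hits
}

func isScaleToZero(r resource) bool   { return strings.HasPrefix(r.Name, "s2z-") }
func isSnapshotClass(r resource) bool { return r.ManifestKind == "VolumeSnapshotClass" }

func main() {
	// Hypothetical render for "scale-to-zero-plugin-can-be-disabled":
	// CNPG, Atlas and the VolumeSnapshotClass are still composed.
	rendered := []resource{
		{Kind: "Release", Name: "cnpg"},
		{Kind: "Release", Name: "atlas"},
		{Kind: "Object", Name: "psql", ManifestKind: "VolumeSnapshotClass"},
	}
	if hits := matching(rendered, isScaleToZero); len(hits) > 0 {
		panic(fmt.Sprintf("scale-to-zero disabled, but rendered: %v", hits))
	}
	fmt.Println("absence check passed: no s2z-* resources")
	fmt.Println("snapshot classes:", matching(rendered, isSnapshotClass))
}
```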

In `@upbound.yaml`:
- Around line 19-21: Update the metadata "description" and any README text to
remove references to an optional Karpenter NodePool and instead state that this
stack deploys CloudNativePG, the cnpg-i-scale-to-zero plugin, and the Atlas
Operator via Helm releases plus applied manifests, and optionally configures a
VolumeSnapshotClass; ensure the wording matches the rendered template
functions/render/210-cnpg-scale-to-zero.yaml.gotmpl and the XR shape in
apis/psqlstacks/definition.yaml. Locate and replace the stale phrase in the
description field and the README block (also update the repeated text around
lines 27-35) so the file consistently mentions Helm + applied manifests +
optional VolumeSnapshotClass and omits any NodePool language.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 087a343f-4577-4bb1-9082-5215a0bdc6fc

📥 Commits

Reviewing files that changed from the base of the PR and between 495fa03 and b7a522f.

📒 Files selected for processing (15)
  • .gitignore
  • README.md
  • apis/psqlstacks/definition.yaml
  • examples/psqlstacks/local.yaml
  • examples/psqlstacks/minimal.yaml
  • examples/psqlstacks/standard.yaml
  • functions/render/000-state-init.yaml.gotmpl
  • functions/render/010-state-status.yaml.gotmpl
  • functions/render/170-volumesnapshotclass.yaml.gotmpl
  • functions/render/200-cnpg-operator.yaml.gotmpl
  • functions/render/200-stackgres-operator.yaml.gotmpl
  • functions/render/210-cnpg-scale-to-zero.yaml.gotmpl
  • functions/render/220-atlas-operator.yaml.gotmpl
  • tests/test-render/main.k
  • upbound.yaml
💤 Files with no reviewable changes (1)
  • functions/render/200-stackgres-operator.yaml.gotmpl

Comment thread apis/psqlstacks/definition.yaml
Comment thread apis/psqlstacks/definition.yaml Outdated
Comment thread functions/stack/000-state-init.yaml.gotmpl
Comment thread functions/stack/170-volumesnapshotclass.yaml.gotmpl Outdated
Comment thread functions/stack/200-cnpg-operator.yaml.gotmpl Outdated
Comment thread tests/test-stack/main.k
Comment thread upbound.yaml
Merges PSQLCluster and PSQLBranch XRDs (previously standalone repos under
hops-ops/psql-cluster and hops-ops/psql-branch) into this package. One
Configuration package, one release cadence, one e2e flow.

Changes:
- apis/{psqlclusters,psqlbranches}/ — XRDs and compositions copied in
- examples/{psqlclusters,psqlbranches}/ — example manifests copied in
- functions/render/ → functions/stack/ — renamed to make room for siblings
- functions/{cluster,branch}/ — new function packages, gotmpls copied from
  the standalone repos. Composition functionRefs updated:
    psqlstacks    → hops-ops-psql-stackstack
    psqlclusters  → hops-ops-psql-stackcluster
    psqlbranches  → hops-ops-psql-stackbranch
- tests/test-{stack,cluster,branch}/ — render tests renamed (was test-render-*)
- tests/e2etest-psql/main.k — unified e2e covering all three XRs at Synced;
  TODO upgrade to Ready integration after volume-snapshot-stack v0.1.0
- .github/workflows/on-pr.yaml + on-push-main.yaml — switched to multi-API
  workflow signature, pinned at @feat/multi-api-support for testing
- Makefile — EXAMPLES list extended; render/validate logic still single-API
  (follow-up to make per-example api_path work locally)

Workflow change being tested:
  unbounded-tech/workflows-crossplane@feat/multi-api-support
  (validate.yaml now resolves api_path per example with fallback to inputs.api_path)
volume-snapshot-stack v0.1.0 is now published, so we can install it as a
dependency Configuration package and bring snapshot-controller into the
test cluster. With that, the whole chain reconciles to Ready in-cluster:

  1. VolumeSnapshotStack XR (in extraResources) → snapshot-controller live
  2. PSQLStack manifest → Helm-installs CNPG + atlas-operator + the psql
     VolumeSnapshotClass
  3. PSQLCluster manifest → CNPG bootstraps a real Postgres with a real PVC
  4. PSQLBranch manifest → snapshots the source PVC, restores into a new
     CNPG cluster

Pattern mirrors the aws-observe-stack e2e (initResources for dependency
Configuration packages, extraResources for the dependent XRs).

defaultConditions: Synced → Ready
timeoutSeconds: 1800 → 5400 (90 min for the full chain)
cleanupTimeoutSeconds: 900 → 1800
The unified e2e was carrying over `stackgresOperator.values` and
`atlasOperator.values` from before the CNPG pivot. The current
PSQLStack XRD uses `cnpg` (not `stackgresOperator`), `namespace`
defaults to `cnpg-system` (not `stackgres`), and the kind-cluster
defaults work without any operator-values overrides.

Match `local/psqlstack.yaml`'s minimal shape: clusterName + labels +
ProviderConfig refs only. Add `kubernetesProviderConfigRef` since
PSQLStack now also applies the VolumeSnapshotClass via the kubernetes
provider.

@coderabbitai (bot) left a comment


Actionable comments posted: 14

♻️ Duplicate comments (2)
functions/stack/170-volumesnapshotclass.yaml.gotmpl (1)

10-12: ⚠️ Potential issue | 🟡 Minor

Update the stale default CSI driver comment.

Line 10 documents ebs.csi.aws.com, but the stack default is ebs.csi.eks.amazonaws.com. This will mislead operators about expected defaults.

Suggested fix
-# Default driver: ebs.csi.aws.com (correct for EKS Auto Mode out of the box).
+# Default driver: ebs.csi.eks.amazonaws.com (EKS Auto Mode default in state init).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@functions/stack/170-volumesnapshotclass.yaml.gotmpl` around lines 10 - 12,
Update the stale comment that names the default CSI driver: change the
documented driver string from "ebs.csi.aws.com" to the current stack default
"ebs.csi.eks.amazonaws.com" in the VolumeSnapshotClass template so it reflects
the actual default; locate the comment near the spec.snapshotClass.driver
reference in the VolumeSnapshotClass/gotmpl block and replace the old driver
name with the new one.
functions/stack/200-cnpg-operator.yaml.gotmpl (1)

64-85: ⚠️ Potential issue | 🟠 Major

Render deletion guard on presence, not readiness.

Line 64 delays Usage creation until both releases are Ready. If Atlas exists but is not Ready, teardown protection may never activate when needed.

Suggested fix
-{{- if and $state.observed.cnpg.ready $state.observed.atlasOperator.ready }}
+{{- $observed := $.observed.resources | default dict }}
+{{- if and (hasKey $observed "cnpg-operator") (hasKey $observed "atlas-operator") }}
 ---
 apiVersion: protection.crossplane.io/v1beta1
 kind: Usage
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@functions/stack/200-cnpg-operator.yaml.gotmpl` around lines 64 - 85, The
Usage resource guard is gated on readiness ($state.observed.cnpg.ready and
$state.observed.atlasOperator.ready) so teardown protection can be skipped if
Atlas exists but isn't Ready; change the template condition to check
presence/existence instead of readiness (e.g., replace checks of
$state.observed.cnpg.ready and $state.observed.atlasOperator.ready with their
existence/presence flags such as $state.observed.cnpg.exists and
$state.observed.atlasOperator.exists or equivalent) so the Usage ({{ $state.name
}}-delete-atlas-operator-before-cnpg) is rendered whenever both CNPG and the
Atlas operator are present regardless of Ready state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/on-pr.yaml:
- Around line 30-53: The workflow is referencing mutable branch refs (uses:
unbounded-tech/workflows-crossplane/...@feat/multi-api-support) for the reusable
workflows validate, test, e2e and publish; replace those branch refs with an
immutable tag or commit SHA (e.g., `@vX.Y.Z` or a specific commit) for each uses
entry so the CI points to a fixed release—update the four uses lines
(validate.yaml, test.yaml, e2e.yaml, publish.yaml) to the chosen semantic
version tag or commit SHA consistent with the existing workflow-vnext-tag
pattern.

In @.github/workflows/on-push-main.yaml:
- Around line 26-42: The workflow currently references mutable branch refs for
the reusable workflows in the validate, test, and e2e jobs (the lines using
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, and e2e.yaml@feat/multi-api-support); update
those refs to an immutable release tag (for example replace
`@feat/multi-api-support` with a specific tag like `@v1.21.0` or another pinned tag)
so validate, test, and e2e always run the exact released workflow code.
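One way to enforce this in CI is to check every `uses:` reference for a pinned ref. The Go sketch below is a hypothetical lint helper, not part of either workflow repo. It accepts a semver tag or a full 40-character commit SHA and rejects branch names such as `feat/multi-api-support`. Git tags can be moved, so only a commit SHA is strictly immutable.

```go
package main

import (
	"fmt"
	"regexp"
)

// pinnedRef matches a uses: value ending in @vMAJOR.MINOR.PATCH or @<40-hex SHA>.
var pinnedRef = regexp.MustCompile(`@(v\d+\.\d+\.\d+|[0-9a-f]{40})$`)

func isPinned(uses string) bool {
	return pinnedRef.MatchString(uses)
}

func main() {
	for _, uses := range []string{
		"unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support",
		"hops-ops/workflows-crossplane/.github/workflows/validate.yaml@v3.0.0",
	} {
		fmt.Printf("pinned=%-5t %s\n", isPinned(uses), uses)
	}
}
```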

In `@apis/psqlbranches/definition.yaml`:
- Around line 117-123: Remove the hard-coded default for branch Postgres version
by deleting the default: "17" entry under the postgresql -> properties ->
version schema in definition.yaml so an omitted version is not coerced to "17";
ensure the schema allows absence (do not add a fixed default) so callers can
distinguish "inherit from source" vs "explicitly set", and if needed mark the
version property nullable or optional instead of supplying a default.
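Here is why a schema default hides the caller's intent. Once the XRD defaults `version` to `"17"`, the composition can no longer tell "unset" from "explicitly 17", so a branch of a Postgres 16 source would be asked to run 17 against a 16 data directory. A Go sketch with an optional field follows; `effectiveVersion` is hypothetical, and inheriting the source version when unset is an assumption about the intended behaviour.

```go
package main

import "fmt"

// effectiveVersion returns the branch's explicit version if one was set,
// otherwise it inherits the source cluster's version.
func effectiveVersion(branchVersion *string, sourceVersion string) string {
	if branchVersion != nil {
		return *branchVersion
	}
	return sourceVersion
}

func main() {
	seventeen := "17"
	fmt.Println(effectiveVersion(nil, "16"))        // 16: omitted, so inherited
	fmt.Println(effectiveVersion(&seventeen, "16")) // 17: explicitly set
}
```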

In `@apis/psqlclusters/composition.yaml`:
- Around line 11-16: The composition references custom functions
hops-ops-psql-stackcluster, hops-ops-psql-stackstack, and
hops-ops-psql-stackbranch but they are not declared as dependencies in
upbound.yaml; update upbound.yaml's dependsOn section to include entries for
these three functions (using their package names and versions) alongside the
existing crossplane-contrib-function-auto-ready, or alternatively
document/ensure those three functions are preinstalled before deploying—make
sure to add exact package identifiers for hops-ops-psql-stackcluster,
hops-ops-psql-stackstack, and hops-ops-psql-stackbranch so the composition can
resolve the functionRef names.

In `@apis/psqlclusters/definition.yaml`:
- Around line 154-161: The OpenAPI schema exposes monitoring.enabled but the
cluster compose template is reading $monSpec.monitoring instead of the boolean
$monSpec.enabled, causing explicit false to be ignored; update the template in
functions/cluster/000-state-init.yaml.gotmpl so that the Cluster CR's monitoring
section uses $monSpec.enabled (or conditionally sets monitoring.enabled to
$monSpec.enabled) rather than copying the whole $monSpec.monitoring object,
ensuring explicit false values disable PodMonitor creation as intended.
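The failure mode here is an explicit `false` being treated like "unset". Sprig's `default` has the same trap: it treats `false` as empty, so `default true $monSpec.enabled` would also turn `false` into `true`. The Go sketch below shows tri-state handling with a `*bool`. Defaulting to enabled when unset is an assumption; use whatever default the XRD declares.

```go
package main

import "fmt"

// podMonitorEnabled distinguishes "unset" (nil) from an explicit false.
// The unset default of true is an assumption for illustration.
func podMonitorEnabled(enabled *bool) bool {
	if enabled == nil {
		return true
	}
	return *enabled
}

func main() {
	off := false
	fmt.Println(podMonitorEnabled(nil))  // true
	fmt.Println(podMonitorEnabled(&off)) // false
}
```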

In `@apis/psqlstacks/composition.yaml`:
- Around line 11-16: The composition references a Function named
hops-ops-psql-stackstack but that Function is missing from the project
dependency manifest; add a Function resource entry named
hops-ops-psql-stackstack into your upbound.yaml (or the project’s Functions
list) so the name exactly matches the functionRef in the composition, providing
the required metadata and spec (image/source, runtime, and any config) that
Crossplane expects for that Function; ensure the Function resource kind/name is
hops-ops-psql-stackstack and then rebuild/update the package so the Function is
available at runtime.

In `@functions/branch/010-state-status.yaml.gotmpl`:
- Around line 19-27: The template currently stops fallback when a
"branch-snapshot" exists even if its boundVolumeSnapshotContentName is empty,
causing $snapContent to be cleared; update the selection logic so you prefer the
branch-snapshot only when it actually contains a non-empty
boundVolumeSnapshotContentName and otherwise fall back to source-snapshot.
Concretely, compute both get $observed "branch-snapshot" and get $observed
"source-snapshot" into $snapEntryBranch and $snapEntrySource, extract their
boundVolumeSnapshotContentName via the existing pipeline ($snapResource →
$snapAtProvider → $snapManifest → $snapStatus → boundVolumeSnapshotContentName),
and set $snapEntry/$snapContent to the branch one only if that
boundVolumeSnapshotContentName is non-empty; otherwise use the source-snapshot
values so $snapContent never gets overwritten with an empty string.
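The intended precedence, as a Go sketch: prefer the branch snapshot's `boundVolumeSnapshotContentName` only when it is non-empty, and otherwise use the source snapshot's. Modelling observed state as a map from resource name to bound content name is a simplification. A missing key and an empty value are handled the same way.

```go
package main

import "fmt"

// snapshotContent picks the bound VolumeSnapshotContent name, preferring the
// branch snapshot but never letting an empty value mask the source snapshot.
func snapshotContent(bound map[string]string) string {
	if c := bound["branch-snapshot"]; c != "" {
		return c
	}
	return bound["source-snapshot"]
}

func main() {
	// Branch snapshot observed but not bound yet; the source one is.
	fmt.Println(snapshotContent(map[string]string{
		"branch-snapshot": "",
		"source-snapshot": "snapcontent-src", // hypothetical content name
	})) // snapcontent-src
}
```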

In `@functions/branch/100-source-snapshot.yaml.gotmpl`:
- Around line 16-43: The VolumeSnapshot metadata name currently uses "{{
$state.name }}-src" which can collide across namespaces; change the snapshot
name generation in the template so it includes a namespace-specific component
(for example incorporate {{ $source.namespace }} and/or {{ $source.pvcName }} or
a short hash of them) instead of just {{ $state.name }}, updating the
metadata.name line inside the forProvider.manifest block (the VolumeSnapshot
metadata) and any related references (e.g. resource name labels/annotations) so
the snapshot is unique per source namespace/PVC.

In `@functions/branch/200-cnpg-cluster.yaml.gotmpl`:
- Around line 78-85: The template always sets storage.size from
$state.branch.storage.size which causes branches to default to 10Gi even when
the source PVC is larger; change the assignment so size falls back to the source
PVC when branch size is empty (e.g. compute $size := default
$state.source.storage.size $state.branch.storage.size or equivalent) and use
that $size when building $storage, keeping the conditional storageClass logic
and assigning to $clusterSpec "storage".
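A Go sketch of the fallback, plus one optional refinement. On EBS, a volume created from a snapshot must be at least the snapshot's size, so an explicit branch size below the source size is raised to the source size. The `Gi`-only parser is a simplification for illustration; real code would compare `resource.Quantity` values from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGi accepts only "<n>Gi" quantities; a simplification of resource.Quantity.
func parseGi(q string) (int, error) {
	if !strings.HasSuffix(q, "Gi") {
		return 0, fmt.Errorf("unsupported quantity %q", q)
	}
	return strconv.Atoi(strings.TrimSuffix(q, "Gi"))
}

// branchStorageSize falls back to the source PVC size when the branch leaves
// size empty, and never returns a size smaller than the source.
func branchStorageSize(branchSize, sourceSize string) (string, error) {
	if branchSize == "" {
		return sourceSize, nil
	}
	b, err := parseGi(branchSize)
	if err != nil {
		return "", err
	}
	s, err := parseGi(sourceSize)
	if err != nil {
		return "", err
	}
	if b < s {
		return sourceSize, nil
	}
	return branchSize, nil
}

func main() {
	for _, tc := range [][2]string{{"", "50Gi"}, {"100Gi", "50Gi"}, {"10Gi", "50Gi"}} {
		size, err := branchStorageSize(tc[0], tc[1])
		fmt.Printf("branch=%q source=%q -> %s (err=%v)\n", tc[0], tc[1], size, err)
	}
}
```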

In `@functions/cluster/200-cnpg-cluster.yaml.gotmpl`:
- Around line 67-72: The template currently places the superuser secret into
bootstrap.initdb.secret (the block with "bootstrap" "initdb" "owner" "app" and
secret name $state.credentials.superuser.secretName); instead, wire the
superuser credential via spec.superuserSecret using
$state.credentials.superuser.secretName and make bootstrap.initdb.secret point
to the application-owner secret (a secret whose username matches owner: app).
Update the template to set spec.superuserSecret to
$state.credentials.superuser.secretName and ensure the "bootstrap" -> "initdb"
-> "secret" uses the app-owner secret (not the postgres superuser secret).

In `@functions/stack/000-state-init.yaml.gotmpl`:
- Around line 71-85: The comment above the VolumeSnapshotClass block is out of
sync with the template default driver; update the human-readable note that
currently says "ebs.csi.aws.com" to match the rendered default
"ebs.csi.eks.amazonaws.com" so docs match the actual fallback defined by
$snapshotSpec and the $snapshotClass dict (see variables $snapshotSpec,
$snapshotEnabled and $snapshotClass where driver defaults to
"ebs.csi.eks.amazonaws.com").

In `@Makefile`:
- Around line 25-31: The new examples in EXAMPLES (psqlclusters and
psqlbranches) are still being processed through the existing
$(DEFINITION)/$(COMPOSITION) paths for apis/psqlstacks, so the derived
XRD/composition will be generated against the wrong schema; update the Makefile
rules that consume $(EXAMPLES) and produce $(DEFINITION)/$(COMPOSITION) to apply
the api-dir macro (api-dir) when computing the api_path (e.g., use the api-dir
expansion or $(call api-dir,...) in the pattern-substitution used by the targets
that refer to DEFINITION and COMPOSITION) so each example maps to its
corresponding apis/<x>/... paths instead of always apis/psqlstacks.
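The per-example mapping the finding asks for, sketched in Go: derive `apis/<api>/definition.yaml` and `composition.yaml` from the directory an example lives in. It assumes every example sits at `examples/<api>/<name>.yaml` with a matching `apis/<api>/` directory, which holds for the files in this PR. `apiPaths` is hypothetical and stands in for what an `api-dir` Make macro would compute.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// apiPaths maps examples/<api>/<name>.yaml to that API's XRD and composition.
// ok is false when the example path does not follow that layout.
func apiPaths(example string) (definition, composition string, ok bool) {
	parts := strings.Split(path.Clean(example), "/")
	if len(parts) != 3 || parts[0] != "examples" {
		return "", "", false
	}
	dir := path.Join("apis", parts[1])
	return path.Join(dir, "definition.yaml"), path.Join(dir, "composition.yaml"), true
}

func main() {
	for _, ex := range []string{
		"examples/psqlstacks/minimal.yaml",
		"examples/psqlclusters/standard.yaml",
		"examples/psqlbranches/same-namespace.yaml",
	} {
		def, comp, _ := apiPaths(ex)
		fmt.Println(ex, "->", def, comp)
	}
}
```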

In `@tests/test-branch/model/io/upbound/dev/meta/v2alpha1/project.k`:
- Around line 95-120: The APIDependencies union schema
MetaDevUpboundIoV2alpha1ProjectSpecAPIDependenciesItems0 is inconsistent: it
exposes git, http, and k8s payload fields but the discriminator $type only
allows "k8s" | "crd" and there is no crd payload type defined. Fix by aligning
the discriminator with the payloads — either add a `crd` payload schema and
implement its field type if "crd" is intended, or expand/replace `$type` to
include the actual variants ("git", "http", "k8s") and ensure only the matching
payload field (git/http/k8s) is present for each discriminator value; update
MetaDevUpboundIoV2alpha1ProjectSpecAPIDependenciesItems0 accordingly.

In `@tests/test-stack/main.k`:
- Around line 298-311: The test's assertResources list is missing an assertion
for the Atlas Release despite the test description claiming "cnpg + atlas + VSC
still composed"; add an entry to the assertResources array similar to the
existing CNPG Release assertion that checks for the Atlas Release (apiVersion
"helm.m.crossplane.io/v1beta1", kind "Release", metadata.name set to the Atlas
release name used elsewhere in tests, e.g., "atlas"), so the test will fail if
Atlas is not composed.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1fbd16ce-b73a-4e47-bd02-3efa17c8d181

📥 Commits

Reviewing files that changed from the base of the PR and between b7a522f and cdb8d5d.

📒 Files selected for processing (57)
  • .github/workflows/on-pr.yaml
  • .github/workflows/on-push-main.yaml
  • Makefile
  • apis/psqlbranches/composition.yaml
  • apis/psqlbranches/definition.yaml
  • apis/psqlclusters/composition.yaml
  • apis/psqlclusters/definition.yaml
  • apis/psqlstacks/composition.yaml
  • examples/psqlbranches/cross-namespace.yaml
  • examples/psqlbranches/preview-with-ttl.yaml
  • examples/psqlbranches/same-namespace.yaml
  • examples/psqlclusters/minimal.yaml
  • examples/psqlclusters/standard.yaml
  • functions/branch/000-state-init.yaml.gotmpl
  • functions/branch/010-state-status.yaml.gotmpl
  • functions/branch/100-source-snapshot.yaml.gotmpl
  • functions/branch/110-branch-snapshot.yaml.gotmpl
  • functions/branch/200-cnpg-cluster.yaml.gotmpl
  • functions/branch/999-status.yaml.gotmpl
  • functions/cluster/000-state-init.yaml.gotmpl
  • functions/cluster/010-state-status.yaml.gotmpl
  • functions/cluster/100-external-secret.yaml.gotmpl
  • functions/cluster/200-cnpg-cluster.yaml.gotmpl
  • functions/cluster/999-status.yaml.gotmpl
  • functions/stack/000-state-init.yaml.gotmpl
  • functions/stack/010-state-status.yaml.gotmpl
  • functions/stack/170-volumesnapshotclass.yaml.gotmpl
  • functions/stack/200-cnpg-operator.yaml.gotmpl
  • functions/stack/210-cnpg-scale-to-zero.yaml.gotmpl
  • functions/stack/220-atlas-operator.yaml.gotmpl
  • functions/stack/999-status.yaml.gotmpl
  • tests/e2etest-psql/main.k
  • tests/test-branch/kcl.mod
  • tests/test-branch/main.k
  • tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlbranch.k
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/e2etest.k
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/operationtest.k
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/project.k
  • tests/test-branch/model/io/upbound/dev/meta/v2alpha1/project.k
  • tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
  • tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
  • tests/test-branch/model/kcl.mod
  • tests/test-cluster/kcl.mod
  • tests/test-cluster/main.k
  • tests/test-cluster/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/e2etest.k
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/operationtest.k
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/project.k
  • tests/test-cluster/model/io/upbound/dev/meta/v2alpha1/project.k
  • tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
  • tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
  • tests/test-cluster/model/kcl.mod
  • tests/test-stack/kcl.mod
  • tests/test-stack/main.k
  • tests/test-stack/model
✅ Files skipped from review due to trivial changes (27)
  • tests/test-cluster/model/kcl.mod
  • tests/test-branch/model/kcl.mod
  • examples/psqlclusters/minimal.yaml
  • tests/test-stack/kcl.mod
  • tests/test-cluster/kcl.mod
  • tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
  • examples/psqlclusters/standard.yaml
  • functions/branch/999-status.yaml.gotmpl
  • examples/psqlbranches/preview-with-ttl.yaml
  • tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/owner_reference.k
  • apis/psqlbranches/composition.yaml
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
  • functions/cluster/999-status.yaml.gotmpl
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/operationtest.k
  • tests/test-branch/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
  • tests/test-cluster/model/k8s/apimachinery/pkg/apis/meta/v1/object_meta.k
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/compositiontest.k
  • functions/branch/000-state-init.yaml.gotmpl
  • examples/psqlbranches/same-namespace.yaml
  • tests/test-cluster/model/io/upbound/dev/meta/v2alpha1/project.k
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/e2etest.k
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/e2etest.k
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/operationtest.k
  • tests/test-branch/model/io/upbound/dev/meta/v1alpha1/project.k
  • examples/psqlbranches/cross-namespace.yaml
  • tests/test-cluster/model/io/upbound/dev/meta/v1alpha1/project.k
  • tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlbranch.k

Comment thread .github/workflows/on-pr.yaml Outdated
Comment on lines +26 to +42
+    uses: unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support
     with:
       examples: |
         [
-          { "example": "examples/psqlstacks/minimal.yaml" },
-          { "example": "examples/psqlstacks/standard.yaml" }
+          { "example": "examples/psqlstacks/minimal.yaml", "api_path": "apis/psqlstacks" },
+          { "example": "examples/psqlstacks/standard.yaml", "api_path": "apis/psqlstacks" },
+          { "example": "examples/psqlclusters/minimal.yaml", "api_path": "apis/psqlclusters" },
+          { "example": "examples/psqlclusters/standard.yaml","api_path": "apis/psqlclusters" },
+          { "example": "examples/psqlbranches/same-namespace.yaml", "api_path": "apis/psqlbranches" }
         ]
-      api_path: apis/psqlstacks
       error_on_missing_schemas: true

   test:
-    uses: unbounded-tech/workflows-crossplane/.github/workflows/test.yaml@v2.20.0
+    uses: unbounded-tech/workflows-crossplane/.github/workflows/test.yaml@feat/multi-api-support

   e2e:
-    uses: unbounded-tech/workflows-crossplane/.github/workflows/e2e.yaml@v2.20.0
+    uses: unbounded-tech/workflows-crossplane/.github/workflows/e2e.yaml@feat/multi-api-support
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/on-push-main.yaml | head -50

Repository: hops-ops/psql-stack

Length of output: 1893


🏁 Script executed:

cat -n .github/workflows/on-push-main.yaml

Repository: hops-ops/psql-stack

Length of output: 2013


Pin reusable workflows to immutable refs before merging.

The validate, test, and e2e jobs reference @feat/multi-api-support, a mutable branch that can change without review. This breaks mainline CI reproducibility—different runs may execute different code. Pin these to immutable release tags (like the v1.21.0 used for version-and-tag).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/on-push-main.yaml around lines 26 - 42, The workflow
currently references mutable branch refs for the reusable workflows in the
validate, test, and e2e jobs (the lines using
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, and e2e.yaml@feat/multi-api-support); update
those refs to an immutable release tag (for example replace
`@feat/multi-api-support` with a specific tag like `@v1.21.0` or another pinned tag)
so validate, test, and e2e always run the exact released workflow code.
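A rough way to audit this in CI is to classify each `uses:` ref by its shape. The sketch below is a hypothetical Go helper (`classifyRef` is not part of any existing tool). A full 40-character commit SHA is the only ref Git cannot silently repoint; a `vX.Y.Z` tag is immutable only by convention, because tags can be force-moved. A branch that happens to be named like a tag would be misclassified, so treat this as a heuristic.

```go
package main

import (
	"regexp"
	"strings"
)

// fullSHA matches a 40-character lowercase hex commit SHA.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// releaseTag matches tags shaped like v2.20.0. Tags are immutable only by
// convention; Git allows them to be moved.
var releaseTag = regexp.MustCompile(`^v[0-9]+\.[0-9]+\.[0-9]+$`)

// classifyRef labels the ref after the last '@' in a reusable-workflow
// `uses:` value as "sha", "tag", "mutable" (branch or unknown shape),
// or "none" when there is no @ref suffix at all.
func classifyRef(uses string) string {
	i := strings.LastIndex(uses, "@")
	if i < 0 {
		return "none" // e.g. a local ./.github/workflows/... path
	}
	ref := uses[i+1:]
	switch {
	case fullSHA.MatchString(ref):
		return "sha"
	case releaseTag.MatchString(ref):
		return "tag"
	default:
		return "mutable"
	}
}
```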

Comment thread apis/psqlbranches/definition.yaml Outdated
Comment thread apis/psqlclusters/composition.yaml
Comment on lines +154 to +161
monitoring:
  description: Add Prometheus scrape configuration. CNPG operator handles the actual PodMonitor creation when `monitoring.enablePodMonitor` is set on the Cluster CR.
  type: object
  properties:
    enabled:
      type: boolean
      default: true

⚠️ Potential issue | 🟠 Major

monitoring.enabled is exposed but not wired through.

The schema advertises a toggle here, but functions/cluster/000-state-init.yaml.gotmpl:72-76 currently reads $monSpec.monitoring instead of $monSpec.enabled, so explicit false values are ignored and the composed Cluster stays monitored.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apis/psqlclusters/definition.yaml` around lines 154 - 161, The OpenAPI schema
exposes monitoring.enabled but the cluster compose template is reading
$monSpec.monitoring instead of the boolean $monSpec.enabled, causing explicit
false to be ignored; update the template in
functions/cluster/000-state-init.yaml.gotmpl so that the Cluster CR's monitoring
section uses $monSpec.enabled (or conditionally sets monitoring.enabled to
$monSpec.enabled) rather than copying the whole $monSpec.monitoring object,
ensuring explicit false values disable PodMonitor creation as intended.
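One subtlety when wiring this: sprig's `default` treats boolean `false` as empty, so a pipeline such as `$monSpec.enabled | default true` would still swallow an explicit `false`. Checking for key presence avoids that. The Go sketch below models the resolution on a decoded-YAML map; `monitoringEnabled` is a hypothetical helper, and the gotmpl lines in its comment are an approximation rather than the repo's actual template.

```go
package main

// monitoringEnabled resolves spec.monitoring.enabled with a default of true
// while honouring an explicit false. monSpec is the decoded
// spec.monitoring object (nil when the block is omitted).
//
// A roughly equivalent sprig/gotmpl form (approximation, assumes $monSpec
// is a map):
//
//	{{- $enabled := true }}
//	{{- if hasKey $monSpec "enabled" }}{{- $enabled = $monSpec.enabled }}{{- end }}
func monitoringEnabled(monSpec map[string]any) bool {
	v, ok := monSpec["enabled"] // lookups on a nil map are safe in Go
	if !ok {
		return true // field omitted: fall back to the schema default
	}
	b, isBool := v.(bool)
	if !isBool {
		return true // unexpected type: keep the default
	}
	return b
}
```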

Comment thread functions/cluster/200-cnpg-cluster.yaml.gotmpl Outdated
Comment thread functions/stack/000-state-init.yaml.gotmpl Outdated
Comment thread Makefile
Comment thread tests/test-branch/model/io/upbound/dev/meta/v2alpha1/project.k Outdated
Comment thread tests/test-stack/main.k Outdated
Three locations claimed the default driver was `ebs.csi.aws.com` and the
VolumeSnapshotClass was "named after the XR" — both wrong relative to the
actual schema (driver `ebs.csi.eks.amazonaws.com`, name defaults to `psql`).
This text shows up in `kubectl explain`, generated docs, and template
comments, so it needed to match.

Also clarified `status.ready` description: components are toggleable, so
readiness is "every enabled component is Ready" rather than implying all
four are mandatory.

From CodeRabbit review on PR #10.
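Read as code, the clarified `status.ready` meaning is an all-enabled-components check. A minimal Go sketch; the `component` type and the component names are illustrative and not taken from the composition functions.

```go
package main

// component captures the two facts stack readiness depends on.
type component struct {
	Enabled bool // toggled on in the PSQLStack spec
	Ready   bool // observed Ready condition of the composed resource
}

// stackReady reports whether every enabled component is Ready. Disabled
// components are ignored, so a stack with scale-to-zero turned off can
// still be Ready. With nothing enabled the result is vacuously true.
func stackReady(components map[string]component) bool {
	for _, c := range components {
		if c.Enabled && !c.Ready {
			return false
		}
	}
	return true
}
```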
The s2z-disabled test description claimed cnpg + atlas + VSC are still
composed, but only cnpg and the VSC were asserted — a regression that
broke the Atlas Release would silently pass. Added the Atlas assertion.

From CodeRabbit review on PR #10.
… not readiness

The Usage that protects cnpg-operator from premature deletion (so Atlas is
torn down first and CNPG isn't yanked while Atlas's migration state is
still live) was rendered only after both Releases reported Ready. That
left a window: if Atlas was mid-progressing or in error and the user
deleted the stack, no Usage existed yet, and CNPG could be deleted first
— exactly the ordering this guard exists to prevent.

Switched the gate from `$state.observed.{cnpg,atlasOperator}.ready` to
`hasKey $.observed.resources "{cnpg,atlas}-operator"`. The Usage now
appears as soon as both Releases are observed, regardless of their
readiness state.

Verified locally: render tests still pass; cluster reinstall via
`hops config install --path` brings all three XRs back to Synced/Ready.

From CodeRabbit review on PR #10.
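The difference between the old and new gates can be sketched in Go. Here `observed` stands in for the observed composed resources (resource name to Ready flag); the helper names are hypothetical, and only the resource names `cnpg-operator` and `atlas-operator` come from the commit message.

```go
package main

// observed maps composed-resource names to their Ready condition. A key is
// present as soon as the resource has been observed, Ready or not.
type observed map[string]bool

// usageGateByReadiness is the old behaviour: no Usage until both are Ready.
func usageGateByReadiness(o observed) bool {
	return o["cnpg-operator"] && o["atlas-operator"]
}

// usageGateByPresence is the new behaviour, the analogue of checking
// hasKey on the observed resources for both Releases.
func usageGateByPresence(o observed) bool {
	_, cnpg := o["cnpg-operator"]
	_, atlas := o["atlas-operator"]
	return cnpg && atlas
}
```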
…ed externalSecret

The previous `credentials.superuser` shape was misleading: the secret named
"superuser" was actually wired into CNPG's `bootstrap.initdb.secret`, which
takes the *application user's* credentials. The actual postgres superuser
secret was either auto-generated by CNPG or absent from the spec entirely.
And it collided with CNPG's own `<cluster>-superuser` secret naming.

New shape (no backwards compat — still alpha):

  spec.app:                     # always present; wires bootstrap.initdb
    role: app                   # Postgres role name
    database: app               # Application database name
    secretName: ""              # K8s Secret; default <cluster>-app
    externalSecret:             # OPTIONAL — when set, ESO renders the Secret
      secretStore:
        kind: ClusterSecretStore | SecretStore
        name: hops-aws-secrets-manager
        namespace: ""           # for SecretStore (defaults to XR ns)
      secretRef:
        path: my-cluster/app    # remote location; JSON value with username+password

  spec.superuser:               # OPTIONAL — omit to let CNPG auto-generate
    secretName: ""              # default <cluster>-superuser
    externalSecret: { ... }     # same shape as app.externalSecret

When `superuser` is set, the composition renders `spec.superuserSecret` on
the wrapped CNPG Cluster CR; otherwise CNPG auto-generates and stores the
secret at `<cluster>-superuser` (its own convention).

Field names mirror External Secrets Operator's CRD shape so anyone familiar
with ESO can read it at a glance.

Status now exposes connection details so dependent XRs can wire without
hardcoding:
  status.app: { secretName, database, host, port }
  status.superuser: { secretName }

Render template factored to a single `psqlcluster.externalSecret` definition
so the app/superuser ExternalSecrets share one source of truth.

Tests rewritten:
- Test 1 ("minimal-renders-cluster-only") asserts default mode renders only
  the Cluster — no ExternalSecret, since externalSecret is now opt-in.
- Test 7 ("external-secret-renders-when-opted-in") explicitly asserts the
  ESO ExternalSecret with the new shape.

Migration for existing manifests: `credentials.superuser.managedBy: ""`
→ remove the block (BYO is now default). `credentials.superuser` with ESO
fields → move under `spec.app.externalSecret` matching the new shape.

From CodeRabbit review on PR #10.
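The superuser rule described above reduces to a small decision. This Go sketch uses hypothetical types and names (`SuperuserSpec`, `superuserSecret`) rather than the composition's real template variables; it returns the Secret name for `spec.superuserSecret` and whether that field should be rendered at all.

```go
package main

// SuperuserSpec models spec.superuser; a nil pointer means the block was omitted.
type SuperuserSpec struct {
	SecretName string // optional override; defaults to <cluster>-superuser
}

// superuserSecret returns the Secret name for the CNPG Cluster's
// spec.superuserSecret and whether that field should be rendered.
// When the block is omitted, CNPG generates <cluster>-superuser itself.
func superuserSecret(cluster string, su *SuperuserSpec) (string, bool) {
	if su == nil {
		return "", false // leave spec.superuserSecret unset
	}
	if su.SecretName != "" {
		return su.SecretName, true
	}
	return cluster + "-superuser", true
}
```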
… repos

The render:all and validate:all targets were hardcoded to apis/psqlstacks
via $(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), so multi-API examples
(psqlclusters/*, psqlbranches/*) were rendered against the wrong schema.

Each example now resolves its own apis/<plural>/ dir from the example path
(`examples/<plural>/<file>.yaml` → `apis/<plural>`) and uses that for both
`up composition render --xrd=...` and `crossplane beta validate <api_dir>`.

Verified locally: `make render` and `make validate` both clean across all
five examples.

From CodeRabbit review on PR #10.
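The path convention the Makefile now relies on is simple enough to state as a function. This Go version only illustrates the mapping (the real targets do it in the recipe shell); `apiDirFor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// apiDirFor maps examples/<plural>/<file>.yaml to apis/<plural>, the
// directory holding that example's XRD and Composition.
func apiDirFor(example string) (string, error) {
	parts := strings.Split(path.Clean(example), "/")
	if len(parts) != 3 || parts[0] != "examples" || !strings.HasSuffix(parts[2], ".yaml") {
		return "", fmt.Errorf("unexpected example path %q", example)
	}
	return path.Join("apis", parts[1]), nil
}
```

Inside a Make recipe the same mapping could be written in shell, for example `api_dir="apis/$$(basename "$$(dirname "$$example")")"`.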

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
.github/workflows/on-push-main.yaml (1)

26-42: ⚠️ Potential issue | 🟠 Major

Pin the reusable workflows to immutable refs.

validate, test, and e2e still pull workflow code from @feat/multi-api-support, so a later branch push can change main CI behavior without a repo change here. Please switch these to a tag or commit SHA before merging.

#!/bin/bash
set -euo pipefail
rg -n 'uses:\s+unbounded-tech/workflows-crossplane/.+@' .github/workflows/on-push-main.yaml .github/workflows/on-pr.yaml .github/workflows/on-version-tagged.yaml

Expected result: on-push-main.yaml and on-pr.yaml should reference immutable tags/SHAs rather than branch names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/on-push-main.yaml around lines 26 - 42, The workflow
currently references reusable workflows by branch refs
(validate.yaml@feat/multi-api-support, test.yaml@feat/multi-api-support,
e2e.yaml@feat/multi-api-support); update each "uses:
unbounded-tech/workflows-crossplane/.github/workflows/{validate.yaml,test.yaml,e2e.yaml}@feat/multi-api-support"
to point to an immutable tag or commit SHA (e.g., replace
"@feat/multi-api-support" with a specific release tag or the exact commit SHA)
so validate, test, and e2e always use pinned workflow code.
🧹 Nitpick comments (1)
functions/cluster/000-state-init.yaml.gotmpl (1)

63-67: Add defensive defaulting for spec.storage to match the pattern used elsewhere in the codebase.

While spec.storage is schema-required for PSQLCluster, defensive defaulting aligns with the pattern already established in the PSQLBranch template (functions/branch/000-state-init.yaml.gotmpl line 65) and provides resilience against edge cases.

Suggested fix
-{{- $storageSpec := $spec.storage }}
+{{- $storageSpec := $spec.storage | default dict }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@functions/cluster/000-state-init.yaml.gotmpl` around lines 63 - 67, The
template accesses $spec.storage directly which can fail in edge cases; add
defensive defaulting before using it (mirror the approach in the PSQLBranch
template) by setting $storageSpec to a default empty dict when $spec.storage is
missing, then build $storage from $storageSpec (refer to the symbols
$spec.storage, $storageSpec and $storage in the diff and the PSQLBranch template
for the exact pattern).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@functions/cluster/100-external-secret.yaml.gotmpl`:
- Around line 64-91: The ExternalSecret resource names/annotations in the
psqlcluster.externalSecret template are using "external-secret-app" and
"external-secret-superuser" which don't match the status lookup expecting
"external-secret"; update the ResourceName and ResourceNameAnnotation entries
passed into template "psqlcluster.externalSecret" for both the app and superuser
blocks to use the singular "external-secret" (e.g., ResourceName: printf
"%s-external-secret" and ResourceNameAnnotation: setResourceNameAnnotation
"external-secret") so the status readiness check can find the synced resources;
alternatively, if you prefer multiple distinct names, update the status lookup
to match these specific keys (ensure changes are made for both app and superuser
usages).

In `@functions/cluster/200-cnpg-cluster.yaml.gotmpl`:
- Around line 24-28: The current use of merge "$state.labels (dict)" aliases
$state.labels as the destination so later set calls on $clusterLabels mutate
$state.labels; change the merge to create a fresh map as the destination (e.g.,
use merge (dict) $state.labels) so $clusterLabels is a copy and subsequent set
calls (the $_ := set $clusterLabels "...") do not mutate $state.labels and
therefore prevent branching labels from leaking elsewhere.

In `@Makefile`:
- Around line 113-117: The validate:% recipe uses undefined shell variables
$$definition, $$composition, and $$api_dir; initialize them at the start of the
recipe using the pattern stem (same way example is set) so the render/validate
commands get real paths. For example, set shell vars like
definition="examples/psqlstacks/$*.yaml", composition="compositions/$*.yaml" and
api_dir="apis" (or your repo's actual api directory) before calling up
composition render in the validate:% target so $$definition, $$composition, and
$$api_dir are non-empty when used.

In `@tests/e2etest-psql/main.k`:
- Around line 188-190: Remove the stale assignment to app.managedBy (it no
longer exists on PSQLCluster.spec.app) — delete the line setting app.managedBy
and do not add any replacement; to skip ExternalSecret/BYO mode simply omit the
app.externalSecret block entirely (the schema defaults to skipping it). Update
any nearby comments to reflect that omission and ensure no other code references
app.managedBy.

In `@tests/test-stack/main.k`:
- Around line 14-16: Update the docstring comments that still mention the old
snapshot-class driver "ebs.csi.aws.com" to the current driver used by this stack
(replace all occurrences of "ebs.csi.aws.com" in the comments near the
VolumeSnapshotClass description), keeping the rest of the wording (e.g.,
"VolumeSnapshotClass", default name "psql") intact; also apply the same
replacement to the other comment occurrence referenced in the file so both
comment blocks reflect the current snapshot driver.

---

Duplicate comments:
In @.github/workflows/on-push-main.yaml:
- Around line 26-42: The workflow currently references reusable workflows by
branch refs (validate.yaml@feat/multi-api-support,
test.yaml@feat/multi-api-support, e2e.yaml@feat/multi-api-support); update each
"uses:
unbounded-tech/workflows-crossplane/.github/workflows/{validate.yaml,test.yaml,e2e.yaml}@feat/multi-api-support"
to point to an immutable tag or commit SHA (e.g., replace
"@feat/multi-api-support" with a specific release tag or the exact commit SHA)
so validate, test, and e2e always use pinned workflow code.

---

Nitpick comments:
In `@functions/cluster/000-state-init.yaml.gotmpl`:
- Around line 63-67: The template accesses $spec.storage directly which can fail
in edge cases; add defensive defaulting before using it (mirror the approach in
the PSQLBranch template) by setting $storageSpec to a default empty dict when
$spec.storage is missing, then build $storage from $storageSpec (refer to the
symbols $spec.storage, $storageSpec and $storage in the diff and the PSQLBranch
template for the exact pattern).
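The aliasing issue in the `200-cnpg-cluster.yaml.gotmpl` comment comes from map semantics: sprig's `merge` writes into and returns its first argument, and maps are references, so `merge $state.labels (dict)` hands back `$state.labels` itself. Go maps behave the same way, as this sketch shows. `mergeInto` is a hypothetical stand-in that mimics sprig's dst-first, dst-wins contract (not its implementation), and the `branch` label is illustrative.

```go
package main

// mergeInto copies keys from srcs that dst lacks into dst and returns dst,
// mirroring the dst-first, dst-wins contract of sprig's merge.
func mergeInto(dst map[string]string, srcs ...map[string]string) map[string]string {
	for _, src := range srcs {
		for k, v := range src {
			if _, exists := dst[k]; !exists {
				dst[k] = v
			}
		}
	}
	return dst
}

// aliased reproduces `merge $state.labels (dict)`: the result IS stateLabels.
func aliased(stateLabels map[string]string) map[string]string {
	clusterLabels := mergeInto(stateLabels, map[string]string{})
	clusterLabels["branch"] = "preview" // also lands in stateLabels
	return clusterLabels
}

// copied reproduces `merge (dict) $state.labels`: a fresh map is the destination.
func copied(stateLabels map[string]string) map[string]string {
	clusterLabels := mergeInto(map[string]string{}, stateLabels)
	clusterLabels["branch"] = "preview" // stateLabels is untouched
	return clusterLabels
}
```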

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf8788e5-52c9-443c-a955-4bcce34be39e

📥 Commits

Reviewing files that changed from the base of the PR and between cdb8d5d and 5d8a092.

📒 Files selected for processing (17)
  • .github/workflows/on-pr.yaml
  • .github/workflows/on-push-main.yaml
  • Makefile
  • apis/psqlclusters/definition.yaml
  • apis/psqlstacks/definition.yaml
  • functions/cluster/000-state-init.yaml.gotmpl
  • functions/cluster/100-external-secret.yaml.gotmpl
  • functions/cluster/200-cnpg-cluster.yaml.gotmpl
  • functions/cluster/999-status.yaml.gotmpl
  • functions/stack/000-state-init.yaml.gotmpl
  • functions/stack/170-volumesnapshotclass.yaml.gotmpl
  • functions/stack/200-cnpg-operator.yaml.gotmpl
  • tests/e2etest-psql/main.k
  • tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
  • tests/test-cluster/main.k
  • tests/test-cluster/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
  • tests/test-stack/main.k
✅ Files skipped from review due to trivial changes (4)
  • functions/stack/170-volumesnapshotclass.yaml.gotmpl
  • tests/test-branch/model/ai/com/ops/hops/v1alpha1/psqlcluster.k
  • functions/cluster/999-status.yaml.gotmpl
  • functions/stack/000-state-init.yaml.gotmpl
🚧 Files skipped from review as they are similar to previous changes (2)
  • functions/stack/200-cnpg-operator.yaml.gotmpl
  • .github/workflows/on-pr.yaml

Comment thread functions/cluster/100-external-secret.yaml.gotmpl
Comment thread functions/cluster/200-cnpg-cluster.yaml.gotmpl
Comment thread Makefile Outdated
Comment thread tests/e2etest-psql/main.k Outdated
Comment thread tests/test-stack/main.k Outdated
Adds a third credential mode: omit `app.externalSecret` AND `app.secretName`
and the composition omits `bootstrap.initdb.secret` from the CNPG Cluster CR
so CNPG auto-generates and owns the basic-auth Secret at `<cluster-name>-app`.

The previous shape always set `bootstrap.initdb.secret.name = <cluster>-app`,
which forced CNPG to read a Secret that — in the no-ESO/no-BYO case — never
existed, blocking bootstrap. The unified e2e hit this exact case (kind
harness, no ESO ClusterSecretStore) and a stale `app.managedBy = ""` field
was masking the real failure mode.

Three modes now documented on the XRD:
  1. Omit `app` → CNPG auto-generates `<cluster-name>-app`.
  2. Set `app.externalSecret` → ESO writes the Secret CNPG reads.
  3. Set `app.secretName` (no externalSecret) → BYO; pre-create the Secret.

Backwards compatible: existing manifests with externalSecret or explicit
secretName render identically. CNPG will adopt a pre-existing
`<cluster-name>-app` if one is present.

Tests:
  - test-cluster: minimal asserts the secret line is omitted; external-secret
    asserts secret.name is wired; new BYO test covers explicit secretName.
  - e2etest-psql: drops the bogus `app.managedBy` field that broke KCL parse;
    relies on the new auto-gen path.

Implements [[tasks/merge-psql-client-apis-into-stack]]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
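The three modes map to one decision about `bootstrap.initdb.secret`. A Go sketch with hypothetical types (`AppSpec`, `ExternalSecretRef`, `initdbSecret`), assuming the `<cluster>-app` default from the earlier credentials commit still applies in ESO mode when `secretName` is empty:

```go
package main

// AppSpec models the parts of spec.app that select the credential mode.
// A nil *AppSpec means the app block was omitted.
type AppSpec struct {
	SecretName     string
	ExternalSecret *ExternalSecretRef // nil unless ESO should render the Secret
}

// ExternalSecretRef is a trimmed stand-in for app.externalSecret.
type ExternalSecretRef struct {
	Path string // secretRef.path in the secrets backend
}

// initdbSecret decides what to render under bootstrap.initdb.secret.
// ok=false means omit the field so CNPG generates <cluster>-app itself.
func initdbSecret(cluster string, app *AppSpec) (mode, name string, ok bool) {
	switch {
	case app == nil || (app.ExternalSecret == nil && app.SecretName == ""):
		return "cnpg-generated", "", false
	case app.ExternalSecret != nil:
		name = app.SecretName
		if name == "" {
			name = cluster + "-app" // ESO writes the default-named Secret
		}
		return "external-secret", name, true
	default:
		return "byo", app.SecretName, true // Secret must already exist
	}
}
```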

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apis/psqlclusters/definition.yaml`:
- Around line 159-170: The XRD declares secretRef.path contains a JSON blob with
username and password, but the ExternalSecret template
(functions/cluster/100-external-secret.yaml.gotmpl) currently reads only
.RemoteRef.key and renders a single password entry, causing mismatch with CNPG
template (functions/cluster/200-cnpg-cluster.yaml.gotmpl) which expects both
credentials; update 100-external-secret.yaml.gotmpl to consume the remote JSON
from secretRef.path/.RemoteRef (not just .RemoteRef.key) and map both username
and password into the rendered Secret (e.g., create two remote refs or a single
remoteRef with extractors for "username" and "password") so app.externalSecret
and superuser.externalSecret produce the full Secret shape that
200-cnpg-cluster.yaml.gotmpl requires.

In `@tests/e2etest-psql/main.k`:
- Around line 56-71: The retained snapshot dependency resources use fixed names
which can be reused across runs; make the Configuration metadata.name
("hops-ops-volume-snapshot-stack" in initResources) and the corresponding
VolumeSnapshotStack XR name unique per test run (e.g., append a
testRunId/timestamp/UUID) so each test installs and references a distinct
package instance, and update any other references (the VolumeSnapshotStack XR
created later around lines 125-139) to use the same generated unique name so
retention still works against the exact resource created for that run.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 075163c9-3527-4d38-9ce4-9de815008e0a

📥 Commits

Reviewing files that changed from the base of the PR and between 5d8a092 and b789ac7.

📒 Files selected for processing (5)
  • apis/psqlclusters/definition.yaml
  • functions/cluster/000-state-init.yaml.gotmpl
  • functions/cluster/200-cnpg-cluster.yaml.gotmpl
  • tests/e2etest-psql/main.k
  • tests/test-cluster/main.k

Comment on lines +159 to +170
secretRef:
  description: Where in the secrets backend the credentials live. The remote value must be a JSON blob with `username` and `password` properties — both are extracted into the resulting K8s Secret.
  type: object
  properties:
    path:
      description: Path/key in the secrets backend (e.g. AWS Secrets Manager `my-cluster/app`).
      type: string
  required:
    - path
required:
  - secretStore
  - secretRef

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align the ExternalSecret schema with the rendered secret contract.

The XRD advertises secretRef.path holding a remote value with both username and password, but functions/cluster/100-external-secret.yaml.gotmpl currently consumes .RemoteRef.key and only renders a password entry. In the current shape, the advertised app.externalSecret / superuser.externalSecret mode cannot produce the Secret shape that functions/cluster/200-cnpg-cluster.yaml.gotmpl hands to CNPG.

Also applies to: 205-214

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apis/psqlclusters/definition.yaml` around lines 159 - 170, The XRD declares
secretRef.path contains a JSON blob with username and password, but the
ExternalSecret template (functions/cluster/100-external-secret.yaml.gotmpl)
currently reads only .RemoteRef.key and renders a single password entry, causing
mismatch with CNPG template (functions/cluster/200-cnpg-cluster.yaml.gotmpl)
which expects both credentials; update 100-external-secret.yaml.gotmpl to
consume the remote JSON from secretRef.path/.RemoteRef (not just .RemoteRef.key)
and map both username and password into the rendered Secret (e.g., create two
remote refs or a single remoteRef with extractors for "username" and "password")
so app.externalSecret and superuser.externalSecret produce the full Secret shape
that 200-cnpg-cluster.yaml.gotmpl requires.
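One way to satisfy the advertised contract is one `data` entry per credential key, each pointing at the same remote path and selecting a JSON property. `secretKey`, `remoteRef.key` and `remoteRef.property` are External Secrets Operator fields; the Go structs below only model that shape for illustration and are not ESO's own types. ESO's `dataFrom.extract` is an alternative that pulls every JSON key at once.

```go
package main

// RemoteRef models ExternalSecret spec.data[].remoteRef.
type RemoteRef struct {
	Key      string // path in the backend, e.g. my-cluster/app
	Property string // JSON property to extract from the remote value
}

// DataEntry models one ExternalSecret spec.data[] item.
type DataEntry struct {
	SecretKey string // key in the resulting Kubernetes Secret
	RemoteRef RemoteRef
}

// credentialData builds entries for both keys CNPG reads from the
// basic-auth Secret, pulled from the same JSON blob at path.
func credentialData(path string) []DataEntry {
	keys := []string{"username", "password"}
	entries := make([]DataEntry, 0, len(keys))
	for _, k := range keys {
		entries = append(entries, DataEntry{
			SecretKey: k,
			RemoteRef: RemoteRef{Key: path, Property: k},
		})
	}
	return entries
}
```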

Comment thread tests/e2etest-psql/main.k
PSQLStack now composes a `psql` StorageClass alongside its existing
`psql` VolumeSnapshotClass — they share the same CSI driver
(`ebs.csi.eks.amazonaws.com` by default) since snapshots only work
when the snapshotter driver matches the source PVC's provisioner.
PSQLCluster + PSQLBranch default `spec.storage.class` to "psql", so
consumer manifests stop leaking driver-specific knowledge.

Default StorageClass shape: gp3 + WaitForFirstConsumer (correct for
zonal CSI drivers — late-binds the PVC to a node so EBS volumes land
in the same AZ as the consuming pod) + allowVolumeExpansion=true
(CNPG resizes via the same field on its Cluster CR).

E2E pivots from kind to an ephemeral EKS Auto Mode cluster (mirror
of aws-observe-stack): provisions an AutoEKSCluster per run, installs
volume-snapshot-stack on it, then runs PSQLStack/Cluster/Branch
against the real `ebs.csi.eks.amazonaws.com` driver — same code path
that runs on pat-local. Kind has no snapshot-capable CSI driver
natively, so the prior kind-only e2e couldn't exercise PSQLBranch's
snapshot/fork chain.

Verified end-to-end on pat-local:
  - PSQLStack composed `psql` SC + `psql` VSC (both with
    ebs.csi.eks.amazonaws.com)
  - PSQLCluster PVC bound on `psql` SC, CNPG primary running
  - PSQLBranch VolumeSnapshot reached readyToUse=true, restored PVC
    bound on `psql` SC, CNPG fork primary running

Breaking: PSQLStack adds a new composed StorageClass by default. Sites
that already have a `psql` SC will conflict — set
`spec.storageClass.enabled: false` to opt out.

Requires three GitHub Actions vars on the repo (synced via the new
`hops vars sync github`): ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A,
PRIVATE_SUBNET_ID_B.
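The coupling this commit relies on (one driver feeding both classes, and snapshots only working when the VolumeSnapshotClass driver equals the source StorageClass provisioner) can be sketched as follows. The structs are trimmed, hypothetical models of the Kubernetes objects, and the `type: gp3` parameter key is an assumption about how gp3 is selected for the EBS CSI driver.

```go
package main

// StorageClass holds the fields of the composed `psql` SC that matter here.
type StorageClass struct {
	Name                 string
	Provisioner          string
	Parameters           map[string]string
	VolumeBindingMode    string
	AllowVolumeExpansion bool
}

// VolumeSnapshotClass holds the composed `psql` VSC's name and driver.
type VolumeSnapshotClass struct {
	Name   string
	Driver string
}

// defaultPSQLClasses builds the default pair from a single driver so the
// two cannot drift apart.
func defaultPSQLClasses(driver string) (StorageClass, VolumeSnapshotClass) {
	sc := StorageClass{
		Name:                 "psql",
		Provisioner:          driver,
		Parameters:           map[string]string{"type": "gp3"}, // assumed parameter key
		VolumeBindingMode:    "WaitForFirstConsumer",           // bind in the consuming pod's AZ
		AllowVolumeExpansion: true,                             // needed for CNPG-driven resizes
	}
	return sc, VolumeSnapshotClass{Name: "psql", Driver: driver}
}

// snapshotCompatible applies the rule stated in the commit: the snapshotter
// driver must match the provisioner of the source PVC's StorageClass.
func snapshotCompatible(sc StorageClass, vsc VolumeSnapshotClass) bool {
	return sc.Provisioner == vsc.Driver
}
```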

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
.github/workflows/on-pr.yaml (1)

30-30: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin reusable workflow refs to immutable versions.

These four uses entries still target @feat/multi-api-support (mutable branch), which makes PR CI behavior drift over time. Please pin to an immutable tag or commit SHA for deterministic runs.

Also applies to: 43-43, 46-46, 93-93

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/on-pr.yaml at line 30, The workflow is referencing a
mutable branch ref ("uses:
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support")
which makes CI non-deterministic; replace each mutable branch ref (all
occurrences of the string
"unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support"
and the other similar `uses:` entries flagged) with an immutable ref (a specific
tag or commit SHA), updating each `uses:` line so it points to the chosen tag or
SHA (e.g., .../validate.yaml@v1.2.3 or ...@<commit-sha>) to ensure deterministic
workflow runs.
tests/e2etest-psql/main.k (1)

82-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the retained snapshot dependency resources unique per run.

These resources are intentionally retained, but both the package install and the VolumeSnapshotStack XR still use fixed names. A rerun against the same control plane can reuse stale snapshot-controller state instead of exercising a fresh dependency install for this run.

Suggested change
 _now = str(int(math.floor(datetime.ticks())))
 _test_name = "e2e-psql-" + _now
 _cluster_name = "e2e-psql-cluster-" + _now
 _branch_name = "e2e-psql-branch-" + _now
+_snapshot_package_name = "hops-ops-volume-snapshot-stack-" + _now
+_snapshot_stack_name = "snapshot-" + _now
 _namespace = "default"
@@
                 {
                     apiVersion = "pkg.crossplane.io/v1"
                     kind = "Configuration"
-                    metadata.name = "hops-ops-volume-snapshot-stack"
+                    metadata.name = _snapshot_package_name
                     spec.package = "ghcr.io/hops-ops/volume-snapshot-stack:v0.1.0"
                 }
@@
                 {
                     apiVersion = "hops.ops.com.ai/v1alpha1"
                     kind = "VolumeSnapshotStack"
                     metadata = {
-                        name = "snapshot"
+                        name = _snapshot_stack_name
                         namespace = _namespace
                     }

Also applies to: 250-255

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2etest-psql/main.k` around lines 82 - 90, The retained
VolumeSnapshotStack and its package install use fixed names ("metadata.name" =
"hops-ops-volume-snapshot-stack") so reruns can pick up stale state; make the
retained dependency resources unique per run by appending a run-specific suffix
(e.g., timestamp, CI_RUN_ID, or random ID) to the Configuration.metadata.name
and to the corresponding VolumeSnapshotStack XR name used elsewhere (the two
occurrences flagged around the current diff and at the other location), and
ensure any references/matchLabels that point to that name are updated to the
same generated suffix so the installer and the XR remain consistent per run.
🧹 Nitpick comments (1)
functions/stack/180-storageclass.yaml.gotmpl (1)

40-45: 💤 Low value

Parameter map keys are unquoted — may break for non-trivial key names.

{{ $k }} emits the key bare. If a StorageClass parameter key ever contains a colon, hash, or other YAML-special character, the rendered output will be invalid YAML. Quoting the key is zero-cost insurance.

♻️ Proposed fix
       {{- range $k, $v := $sc.parameters }}
-        {{ $k }}: {{ $v | quote }}
+        {{ $k | quote }}: {{ $v | quote }}
       {{- end }}
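The effect of quoting keys can be checked with Go's `text/template`, which is the engine underneath `.gotmpl` files. This is a reduced sketch, not the real template: sprig's `quote` is replaced by `strconv.Quote` so only the standard library is needed, and the `#tag` key is a made-up example. An unquoted `#tag: x` line would be read by YAML as a comment, dropping the entry entirely.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/template"
)

// paramsTmpl is a reduced copy of the parameters loop with both keys and
// values quoted. Ranging over a map visits keys in sorted order.
const paramsTmpl = `parameters:
{{- range $k, $v := . }}
  {{ $k | quote }}: {{ $v | quote }}
{{- end }}
`

// render executes the template; strconv.Quote stands in for sprig's quote.
func render(params map[string]string) (string, error) {
	funcs := template.FuncMap{"quote": strconv.Quote}
	t, err := template.New("params").Funcs(funcs).Parse(paramsTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, params); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(map[string]string{"type": "gp3", "#tag": "x"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```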
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@functions/stack/180-storageclass.yaml.gotmpl` around lines 40 - 45, The
StorageClass template emits parameter map keys unquoted (using {{ $k }}) which
can produce invalid YAML for keys with special characters; update the parameters
loop in the template that references $sc.parameters so each key is quoted (e.g.,
replace occurrences of {{ $k }} with a quoted form like {{ $k | quote }}) while
keeping values quoted as before, ensuring the keys are safely rendered for all
YAML-special characters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apis/psqlclusters/definition.yaml`:
- Around line 186-214: The superuser.externalSecret block is missing the
descriptive text present in app.externalSecret; update the
superuser.externalSecret schema to mirror app.externalSecret by documenting
secretStore.kind/name/namespace semantics (including default: ClusterSecretStore
and allowed enum values ClusterSecretStore/SecretStore and that namespace is
only required for SecretStore), and add a clear description for secretRef.path
stating it must point to a JSON blob containing "username" and "password" (same
contract as app.externalSecret) so kubectl explain/generated docs show
consistent expectations for superuser.externalSecret.

In `@apis/psqlstacks/definition.yaml`:
- Around line 90-92: The OpenAPI schema for ha.replicas currently allows zero;
update the replicas property in the definition.yaml (property name: replicas
under ha) to add a minimum: 1 constraint so users cannot set replicas: 0; ensure
the type remains integer and keep the default: 3, optionally updating the
property's description to note the minimum of 1 for clarity.

---

Duplicate comments:
In @.github/workflows/on-pr.yaml:
- Line 30: The workflow is referencing a mutable branch ref ("uses:
unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support")
which makes CI non-deterministic; replace each mutable branch ref (all
occurrences of the string
"unbounded-tech/workflows-crossplane/.github/workflows/validate.yaml@feat/multi-api-support"
and the other similar `uses:` entries flagged) with an immutable ref (a specific
tag or commit SHA), updating each `uses:` line so it points to the chosen tag or
SHA (e.g., .../validate.yaml@v1.2.3 or ...@<commit-sha>) to ensure deterministic
workflow runs.

In `@tests/e2etest-psql/main.k`:
- Around line 82-90: The retained VolumeSnapshotStack and its package install
use fixed names ("metadata.name" = "hops-ops-volume-snapshot-stack") so reruns
can pick up stale state; make the retained dependency resources unique per run
by appending a run-specific suffix (e.g., timestamp, CI_RUN_ID, or random ID) to
the Configuration.metadata.name and to the corresponding VolumeSnapshotStack XR
name used elsewhere (the two occurrences flagged around the current diff and at
the other location), and ensure any references/matchLabels that point to that
name are updated to the same generated suffix so the installer and the XR remain
consistent per run.

---

Nitpick comments:
In `@functions/stack/180-storageclass.yaml.gotmpl`:
- Around line 40-45: The StorageClass template emits parameter map keys unquoted
(using {{ $k }}) which can produce invalid YAML for keys with special
characters; update the parameters loop in the template that references
$sc.parameters so each key is quoted (e.g., replace occurrences of {{ $k }} with
a quoted form like {{ $k | quote }}) while keeping values quoted as before,
ensuring the keys are safely rendered for all YAML-special characters.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 85ea3da6-87ef-41c5-b108-ccb63db2d812

📥 Commits

Reviewing files that changed from the base of the PR and between b789ac7 and 990ddfd.

📒 Files selected for processing (13)
  • .github/workflows/on-pr.yaml
  • .gitignore
  • README.md
  • apis/psqlbranches/definition.yaml
  • apis/psqlclusters/definition.yaml
  • apis/psqlstacks/definition.yaml
  • functions/branch/000-state-init.yaml.gotmpl
  • functions/cluster/000-state-init.yaml.gotmpl
  • functions/stack/000-state-init.yaml.gotmpl
  • functions/stack/010-state-status.yaml.gotmpl
  • functions/stack/180-storageclass.yaml.gotmpl
  • tests/e2etest-psql/main.k
  • tests/test-stack/main.k
✅ Files skipped from review due to trivial changes (3)
  • .gitignore
  • functions/branch/000-state-init.yaml.gotmpl
  • README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • functions/stack/010-state-status.yaml.gotmpl
  • apis/psqlbranches/definition.yaml
  • tests/test-stack/main.k
  • functions/stack/000-state-init.yaml.gotmpl

Comment thread apis/psqlclusters/definition.yaml
Comment on lines +90 to +92
replicas:
  type: integer
  default: 3

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

ha.replicas lacks a minimum constraint — zero replicas is accepted.

Without minimum: 1, a user can set replicas: 0, which would scale every HA-able operator Deployment to zero and silently break the stack.

🛡️ Proposed fix
                  replicas:
                    type: integer
+                   minimum: 1
                    default: 3
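As a rough model of what the schema change buys, the sketch below mimics how the API server treats `type: integer, minimum: 1, default: 3`: an absent field takes the default, and an explicit value below the minimum is rejected at admission instead of reaching the operator Deployments. The function name and error text are hypothetical, not the API server's actual code or message.

```go
package main

import (
	"errors"
	"fmt"
)

const defaultReplicas = 3

// resolveReplicas approximates schema defaulting plus the minimum check:
// nil means the user omitted ha.replicas.
func resolveReplicas(requested *int) (int, error) {
	if requested == nil {
		return defaultReplicas, nil
	}
	if *requested < 1 {
		return 0, errors.New("spec.ha.replicas: must be greater than or equal to 1")
	}
	return *requested, nil
}

func main() {
	zero := 0
	for _, r := range []*int{nil, &zero} {
		n, err := resolveReplicas(r)
		fmt.Println(n, err)
	}
}
```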
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apis/psqlstacks/definition.yaml` around lines 90 - 92, The OpenAPI schema for
ha.replicas currently allows zero; update the replicas property in the
definition.yaml (property name: replicas under ha) to add a minimum: 1
constraint so users cannot set replicas: 0; ensure the type remains integer and
keep the default: 3, optionally updating the property's description to note the
minimum of 1 for clarity.

The full chain converged on the previous run except for PSQLStack's
scaleToZeroPlugin objects (s2z-cert-client, s2z-cert-server, s2z-issuer)
— the S2Z plugin uses cert-manager's Issuer + Certificate for its gRPC
mTLS pair, and the ephemeral EKS cluster has no cert-manager. pat-local
already has cert-manager from earlier setup, which masked this in local
validation.

cert-stack v0.1.0 mirrors volume-snapshot-stack's shape (single
clusterName field, composes a Helm release on the target). Adding it
as initResource + extraResource closes the gap.
@github-actions

github-actions Bot commented May 8, 2026

Published Crossplane Package

The following Crossplane package was published as part of this PR:

Package: ghcr.io/hops-ops/psql-stack:pr-10-55b4a02981f9e74b194fb9e818bc6df76089e407


@patrickleet
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
apis/psqlstacks/definition.yaml (1)

90-92: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

ha.replicas still lacks a minimum: 1 constraint.

A user can set replicas: 0, silently scaling all HA-able platform Deployments to zero.

🛡️ Proposed fix
                  replicas:
                    type: integer
+                   minimum: 1
                    default: 3
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apis/psqlstacks/definition.yaml` around lines 90 - 92, The schema for the
replicas integer (ha.replicas) allows zero; update the replicas property in
apis/psqlstacks/definition.yaml to add a minimum: 1 constraint so users cannot
set replicas: 0; locate the replicas definition (type: integer, default: 3) and
add the minimum validator for that property to enforce at least one replica.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 195-204: The parameters object currently combines
additionalProperties: type: string with x-kubernetes-preserve-unknown-fields:
true which conflicts; decide which behavior you want and make both parameter
blocks consistent: if parameter map values must be plain strings (e.g. passed
verbatim to the CSI driver) remove x-kubernetes-preserve-unknown-fields and keep
additionalProperties: type: string; if parameters may contain nested/arbitrary
structures remove additionalProperties and keep
x-kubernetes-preserve-unknown-fields; apply the same change to
snapshotClass.parameters as well so both parameter definitions use the same,
non-conflicting approach.

---

Duplicate comments:
In `@apis/psqlstacks/definition.yaml`:
- Around line 90-92: The schema for the replicas integer (ha.replicas) allows
zero; update the replicas property in apis/psqlstacks/definition.yaml to add a
minimum: 1 constraint so users cannot set replicas: 0; locate the replicas
definition (type: integer, default: 3) and add the minimum validator for that
property to enforce at least one replica.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c458ce55-33a3-418a-b8e5-6f7008816953

📥 Commits

Reviewing files that changed from the base of the PR and between b789ac7 and 877c9c1.

📒 Files selected for processing (13)
  • .github/workflows/on-pr.yaml
  • .gitignore
  • README.md
  • apis/psqlbranches/definition.yaml
  • apis/psqlclusters/definition.yaml
  • apis/psqlstacks/definition.yaml
  • functions/branch/000-state-init.yaml.gotmpl
  • functions/cluster/000-state-init.yaml.gotmpl
  • functions/stack/000-state-init.yaml.gotmpl
  • functions/stack/010-state-status.yaml.gotmpl
  • functions/stack/180-storageclass.yaml.gotmpl
  • tests/e2etest-psql/main.k
  • tests/test-stack/main.k
✅ Files skipped from review due to trivial changes (4)
  • functions/stack/010-state-status.yaml.gotmpl
  • .gitignore
  • functions/stack/180-storageclass.yaml.gotmpl
  • README.md
🚧 Files skipped from review as they are similar to previous changes (8)
  • functions/stack/000-state-init.yaml.gotmpl
  • functions/branch/000-state-init.yaml.gotmpl
  • tests/e2etest-psql/main.k
  • functions/cluster/000-state-init.yaml.gotmpl
  • apis/psqlbranches/definition.yaml
  • apis/psqlclusters/definition.yaml
  • tests/test-stack/main.k
  • .github/workflows/on-pr.yaml

Comment on lines +195 to +204
parameters:
  description: |
    Provisioner-specific parameters. Defaults to `{type: gp3}` for
    the EBS provisioner. When overriding `provisioner`, set this
    to whatever the new provisioner expects (e.g. `{}` for
    hostpath.csi.k8s.io, driver-specific keys for Longhorn).
  type: object
  additionalProperties:
    type: string
  x-kubernetes-preserve-unknown-fields: true

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

parameters combines additionalProperties with x-kubernetes-preserve-unknown-fields: true — the constraints are contradictory.

additionalProperties: type: string declares that every map value must be a string, while x-kubernetes-preserve-unknown-fields: true tells the API server not to prune unknown fields. Because additionalProperties already makes every key "known", the preserve flag is redundant at best, and it leaves readers unsure which contract the schema actually enforces (strict string values or arbitrary structure).

The same pattern repeats for snapshotClass.parameters (Lines 237–239).

Pick one approach:

  • If values must be strings (e.g. they are passed verbatim to the CSI driver), keep additionalProperties and drop x-kubernetes-preserve-unknown-fields.
  • If nested/arbitrary structures are needed, drop additionalProperties and keep x-kubernetes-preserve-unknown-fields.
♻️ Proposed fix (string-values-only)
                  parameters:
                    type: object
                    additionalProperties:
                      type: string
-                   x-kubernetes-preserve-unknown-fields: true

Apply the same change to snapshotClass.parameters at Lines 237–239.
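The string-only option matches the downstream types: the upstream Kubernetes StorageClass type declares `Parameters map[string]string` (and, to our understanding, VolumeSnapshotClass does the same), so nested values could never be passed through anyway. This hypothetical Go helper sketches the check that `additionalProperties: {type: string}` expresses declaratively.

```go
package main

import "fmt"

// toStringParams accepts a generic parameters map and fails on any value
// that is not a string, mirroring additionalProperties: {type: string}.
func toStringParams(in map[string]any) (map[string]string, error) {
	out := make(map[string]string, len(in))
	for k, v := range in {
		s, ok := v.(string)
		if !ok {
			return nil, fmt.Errorf("parameters.%s: expected string, got %T", k, v)
		}
		out[k] = s
	}
	return out, nil
}

func main() {
	_, err := toStringParams(map[string]any{"type": "gp3", "iops": 3000})
	fmt.Println(err)
}
```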

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apis/psqlstacks/definition.yaml` around lines 195 - 204, The parameters
object currently combines additionalProperties: type: string with
x-kubernetes-preserve-unknown-fields: true which conflicts; decide which
behavior you want and make both parameter blocks consistent: if parameter map
values must be plain strings (e.g. passed verbatim to the CSI driver) remove
x-kubernetes-preserve-unknown-fields and keep additionalProperties: type:
string; if parameters may contain nested/arbitrary structures remove
additionalProperties and keep x-kubernetes-preserve-unknown-fields; apply the
same change to snapshotClass.parameters as well so both parameter definitions
use the same, non-conflicting approach.

patrickleet added 12 commits May 8, 2026 13:16
Replaces the four mutable `@feat/multi-api-support` branch refs with
the immutable `@v3.0.0` tag now that the multi-api work is shipped.
Also retargets the org from `unbounded-tech` to `hops-ops` — the
canonical home for hops platform CI workflows.
…licit set

The previous schema and gotmpl both defaulted branch postgres version to
"17", silently coercing branches off PG 15/16 sources to a major mismatch.
Volume snapshot recovery is binary-compatible only within a major version
(Postgres won't open a data dir from a different major), so a hardcoded
default produced silent failures at restore time.

Now: omit imageName entirely when version is unset, letting CNPG fall back
to its operator-default image (close enough when the source tracks the same
chart). Setting spec.postgresql.version is now an explicit pin, typically
used to pin a minor version (e.g. "17.4") to match the source's reported version.

Adds a render test for the explicit-pin path; existing default-path test
asserts imageName absence.
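The render rule can be sketched in Go as "emit imageName only when a version is pinned". This is an illustration, not the gotmpl: the spec is reduced to a bare map, and the `ghcr.io/cloudnative-pg/postgresql` repository is an assumption about which image the composition would reference.

```go
package main

import "fmt"

// clusterSpec returns a reduced CNPG Cluster spec. imageName appears only
// when the branch pins a version; otherwise CNPG uses its default image.
func clusterSpec(version string) map[string]any {
	spec := map[string]any{}
	if version != "" {
		// Assumed image repository; check the composition for the real one.
		spec["imageName"] = "ghcr.io/cloudnative-pg/postgresql:" + version
	}
	return spec
}

func main() {
	fmt.Println(clusterSpec(""), clusterSpec("17.4"))
}
```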
Existing state-init logic correctly uses hasKey to distinguish "explicitly
false" from "absent" before reading $monSpec.enabled, but no test exercised
the explicit-false path. Added regression test asserting that
spec.monitoring.enabled=false propagates to the composed Cluster CR's
monitoring.enablePodMonitor=false. Default-on path covered by Test 1.
…tent

In cross-namespace branching, the branch-ns VolumeSnapshot can only bind
once it learns the source's VolumeSnapshotContent name — that name is
propagated through $state.observed.snapshotContent into 110-branch-snapshot's
render. The previous logic preferred branch-snapshot whenever it was present
in observed, even when its boundVolumeSnapshotContentName was still empty —
which it always is on first reconcile, before the source content has
propagated. Result: empty content overwrites the populated source content,
the branch-ns snapshot renders with `volumeSnapshotContentName: ""`, and the
chain stalls.

Now: read both branch-ns and source-ns content via the existing pipeline,
prefer branch when non-empty, fall back to source otherwise. Fixes the
chicken-and-egg that prevented cross-namespace branches from binding on
first-pass reconcile.

Same-namespace branching is unaffected: source-snapshot is gated on
crossNamespace and isn't composed there, so $sourceContent is empty and
the branch-ns content (always populated post-bind) wins.
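The fallback reduces to a two-line preference rule. A minimal Go sketch, with a hypothetical function name standing in for the template logic in 010-state-status:

```go
package main

import "fmt"

// pickSnapshotContent prefers the branch-namespace snapshot's bound content
// name and falls back to the source-namespace one while the branch snapshot
// is still unbound (always the case on the first reconcile).
func pickSnapshotContent(branchBound, sourceBound string) string {
	if branchBound != "" {
		return branchBound
	}
	return sourceBound
}

func main() {
	fmt.Println(pickSnapshotContent("", "snapcontent-src"))
}
```

In the same-namespace case `sourceBound` is always empty, so the rule degrades to "use the branch content", matching the note above.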
Adds two CompositionTests that exercise 010-state-status's snapshot-
content fallback against inline observedResources:

1. branch-snapshot bound content empty + source-snapshot bound: branch-ns
   VolumeSnapshot must render with the source's content name (proves the
   chicken-and-egg fix from 18d3115).
2. branch-snapshot bound + source-snapshot bound: steady-state — branch's
   content wins (proves the fallback doesn't overwrite a populated branch
   content with source's).

Without the fix, test 1 would render volumeSnapshotContentName: "" and
fail. With it, source's content propagates through state.observed.snapshotContent
into 110-branch-snapshot's render. Locks in the cross-namespace bind
behavior — the previous test surface only exercised render shape, not
state propagation across reconciles.
…espace

The source-ns VolumeSnapshot lives in a namespace shared by multiple
branches (the source PSQLCluster's). Naming it just `<branchName>-src`
collides when two PSQLBranch XRs have the same metadata.name in
different branch namespaces — both would create
`preview-pr-1-src` in `team-app`, racing on the same object.

Now: `<branchNS>-<branchName>-src`. Encodes the branch XR's own
namespace into the name so (sourceNS, branchNS, branchName) is unique.
K8s object names are bounded by the RFC 1123 subdomain limit (253 chars),
so the concatenation is safe at any reasonable namespace/branch length.

Branch-ns snapshot (`<branchName>-snap`) is unchanged: the branch
namespace is already implicit by where the resource lives, and branch
XR names are unique within a single namespace.

Adds a regression test that asserts the source-ns name varies with
branch namespace.
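A Go sketch of the new naming rule, with a DNS-1123 subdomain check added for illustration (the function name is hypothetical; the regular expression is the standard Kubernetes subdomain pattern):

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the Kubernetes DNS-1123 subdomain pattern.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// sourceSnapshotName builds <branchNS>-<branchName>-src and rejects results
// that would not be valid object names.
func sourceSnapshotName(branchNS, branchName string) (string, error) {
	name := branchNS + "-" + branchName + "-src"
	if len(name) > 253 || !dns1123Subdomain.MatchString(name) {
		return "", fmt.Errorf("%q is not a valid object name", name)
	}
	return name, nil
}

func main() {
	n, err := sourceSnapshotName("team-a", "preview-pr-1")
	fmt.Println(n, err)
}
```

One caveat worth checking: hyphens are legal in both parts, so `("a-b", "c")` and `("a", "b-c")` join to the same `a-b-c-src`. Collisions need that specific overlap, but the join is not strictly injective.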
Previously the gotmpl coerced unset `branch.storage.size` to a hardcoded
"10Gi" — silently mis-sizing branches off any source PVC larger than that.
CNPG/EBS can't shrink during recovery, so a 10Gi branch off a 100Gi source
fails when CNPG tries to bind the restored PVC.

New shape:
  - PSQLBranch XRD adds optional `spec.source.storage.size` so consumers
    can mirror the source PSQLCluster's known capacity. The branch
    composition has no automatic visibility into the source's PVC, so
    the user declares it once on the branch spec.
  - Size precedence in the cnpg-cluster render:
      branch.storage.size  (explicit override, e.g. growing the branch)
      → source.storage.size  (inherit from the source)
      → omitted             (CNPG's webhook rejects with a clear error)
  - Drops the silent 10Gi gotmpl fallback in 000-state-init.

Examples + e2e branch updated to declare source.storage.size mirroring
the source PSQLCluster's spec.storage.size. Local pat-local manifests
already set branch.storage.size explicitly so unaffected.

Symlinks the per-test KCL `model/` directory to `.up/kcl/models` so
schema changes from `up project build` propagate to tests automatically.
Previously the bundled models drifted from the XRD on every change and
needed manual regeneration. `.up/` is gitignored — fresh clones run
`make build` (or `up project build`) to populate the symlink target.

Adds two regression tests: branch.size overrides source.size, and
source.size used when branch.size is empty.
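The size precedence can be expressed as a small Go function (hypothetical name; the real logic lives in the cnpg-cluster gotmpl). Returning `ok == false` models "omit the field" so CNPG's webhook rejects the Cluster instead of a silent default being applied.

```go
package main

import "fmt"

// resolveStorageSize applies branch -> source -> omitted precedence.
func resolveStorageSize(branchSize, sourceSize string) (string, bool) {
	switch {
	case branchSize != "":
		return branchSize, true // explicit override, e.g. growing the branch
	case sourceSize != "":
		return sourceSize, true // inherit the declared source capacity
	default:
		return "", false // omit: let CNPG's webhook reject with a clear error
	}
}

func main() {
	fmt.Println(resolveStorageSize("", "100Gi"))
}
```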
…ce names

The state-status template looked up `external-secret` in the observed
map, but 100-external-secret renders two distinct resources with
suffixed names (`external-secret-app`, `external-secret-superuser`)
when their respective config blocks are set. The lookup never matched,
so $state.observed.externalSecret.ready was always false.

Now: aggregate over both names. Ready=true when every present ES
Object reports Ready=true, and Ready=true when neither is present
(BYO / CNPG-managed-secret paths — nothing to wait on).

Note: the field is currently unused (999-status doesn't surface it,
no composition gates on it). Fixing now to match the documented
intent so future consumers don't inherit the bug.
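The aggregation rule ("every present ExternalSecret is Ready; none present counts as Ready") can be sketched in Go. The two resource names come from the commit message; modelling the observed state as a `map[string]bool` of Ready conditions is a simplification.

```go
package main

import "fmt"

// externalSecretsReady returns false only if a present ExternalSecret is not
// Ready. Absent entries (BYO or CNPG-managed secrets) are skipped.
func externalSecretsReady(observed map[string]bool) bool {
	for _, name := range []string{"external-secret-app", "external-secret-superuser"} {
		if ready, present := observed[name]; present && !ready {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(externalSecretsReady(map[string]bool{"external-secret-app": true}))
}
```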
The validate:% recipe was using $$definition, $$composition, $$api_dir
shell variables that were never initialized — `make validate:minimal`
silently fed empty strings to `up composition render` and
`crossplane beta validate`, so the recipe never produced a real
validation result.

Aligns validate:% with render:%: both now use the top-level Makefile
variables ($(DEFINITION), $(COMPOSITION), $(XRD_DIR)) which point at
apis/psqlstacks. Single-target shorthand stays psqlstacks-only as
documented in README. The :all targets keep their derive-per-example
shell logic for the multi-API case.
Existing test 7 only asserted the ExternalSecret existed with the right
secretStoreRef and target.name — it didn't lock in the data[] mappings
or the basic-auth target template. The current gotmpl correctly maps
both username and password as separate remoteRef entries (with
`property: username` / `property: password`) extracted from the same
JSON blob at secretRef.path, then synthesizes a kubernetes.io/basic-auth
Secret. Asserting the full shape so a regression that drops one key
fails the test instead of silently shipping a half-populated Secret.
ESO 0.10+ moved to v1 GA, and recent ESO charts (the versions
cert-stack/aws-secret-stack already use) no longer serve v1beta1:
applies fail with `no matches for kind
"ExternalSecret" in version "external-secrets.io/v1beta1"`.

Schema shape (data[].remoteRef.{key,property} + target.template) is
identical between v1beta1 and v1, so this is a pure apiVersion bump.
Aligns with the other stacks that already use v1 (auth, gitops,
cloudflare/dns, aws/secret).
The superuser.externalSecret schema had the same shape as
app.externalSecret but no field descriptions, so kubectl explain and
generated docs gave inconsistent guidance for what's the same contract.
Adds the missing descriptions (secretStore.{kind,name,namespace} and
secretRef.{path}) verbatim from app.externalSecret. Pure documentation
change — no behavior, default, or required-fields shift.
@patrickleet
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
Makefile (2)

13-14: 💤 Low value

api-dir macro is unused — bulk targets reimplement it inline.

The api-dir helper at line 14 is never invoked; both render:all and validate:all inline the same awk -F/ '{print "apis/" $2}' derivation. Either invoke $(call api-dir,$$example) from the recipes (note: shell-vs-Make expansion needs care here, since $$example is a shell var) or drop the macro and the comment block at lines 12-14 to avoid confusion.

Also applies to: 40-40, 73-73

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 13 - 14, The Makefile defines an unused helper macro
api-dir which duplicates logic also written inline in the render:all and
validate:all recipes; either replace those inline awk derivations with a proper
call to the macro using $(call api-dir,$$example) (taking care to pass the shell
variable example as $$example so Make expands correctly) in the render:all and
validate:all recipe bodies, or remove the api-dir macro and its comment block
entirely to avoid confusion; update all other occurrences flagged (lines near 40
and 73) to use the same approach so the helper is consistently used or
consistently removed.

109-117: ⚡ Quick win

Single-example aliases silently route every stem to apis/psqlstacks.

render:% and validate:% still hardcode examples/psqlstacks/$*.yaml and the legacy $(DEFINITION)/$(COMPOSITION)/$(XRD_DIR) globals. Now that examples exist under psqlclusters/ and psqlbranches/, invoking e.g. make render:minimal or make validate:standard will render/validate the psqlstacks example against the psqlstacks XRD even when the user intended a multi-API one — and there's no error to indicate the mismatch.

Two reasonable options:

  1. Make the stem encode the api plural (e.g. make render:psqlclusters/minimal) and derive api_dir from it the same way the bulk targets do.
  2. Rename these targets to render:psqlstacks:% / validate:psqlstacks:% so the psqlstacks-only scope is explicit.
Sketch for option 1
 render\:%:
-	`@example`="examples/psqlstacks/$*.yaml"; \
-	up composition render --xrd=$(DEFINITION) $(COMPOSITION) $$example
+	`@example`="examples/$*.yaml"; \
+	api_dir=$$(echo "$$example" | awk -F/ '{print "apis/" $$2}'); \
+	up composition render --xrd=$$api_dir/definition.yaml $$api_dir/composition.yaml $$example

 validate\:%:
-	`@example`="examples/psqlstacks/$*.yaml"; \
-	up composition render --xrd=$(DEFINITION) $(COMPOSITION) $$example \
-		--include-full-xr --quiet | \
-		crossplane beta validate $(XRD_DIR) --error-on-missing-schemas -
+	`@example`="examples/$*.yaml"; \
+	api_dir=$$(echo "$$example" | awk -F/ '{print "apis/" $$2}'); \
+	up composition render --xrd=$$api_dir/definition.yaml $$api_dir/composition.yaml $$example \
+		--include-full-xr --quiet | \
+		crossplane beta validate $$api_dir --error-on-missing-schemas -
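The derivation that option 1 relies on (and that the unused `api-dir` macro duplicates) takes the second slash-separated field of the example path and prefixes it with `apis/`. A Go sketch of the same rule, with a hypothetical function name, including awk's behaviour of yielding an empty `$2` for paths without a slash:

```go
package main

import (
	"fmt"
	"strings"
)

// apiDir mirrors `awk -F/ '{print "apis/" $2}'`.
func apiDir(examplePath string) string {
	fields := strings.Split(examplePath, "/")
	if len(fields) < 2 {
		return "apis/" // awk prints an empty $2 here
	}
	return "apis/" + fields[1]
}

func main() {
	dir := apiDir("examples/psqlclusters/minimal.yaml")
	fmt.Println(dir+"/definition.yaml", dir+"/composition.yaml")
}
```

With option 1, `make render:psqlclusters/minimal` would expand to `examples/psqlclusters/minimal.yaml`, which this rule maps to `apis/psqlclusters`.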
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 109 - 117, The render:% and validate:% phony targets
currently hardcode examples/psqlstacks/$*.yaml and global
$(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), causing every stem to map to
psqlstacks; change them to derive the API plural from the stem (like the bulk
targets do) and build example and XRD/composition paths from that API: compute
api_dir from the stem (e.g. split $* on "/" to get the first field), set
example=examples/$(api_dir)/$*.yaml and set the corresponding
DEFINITION/COMPOSITION/XRD_DIR for that api, then call up composition render and
crossplane beta validate with those derived variables; alternatively, if you
prefer explicit scoping, rename the targets to render:psqlstacks:% and
validate:psqlstacks:% to make the psqlstacks-only behavior explicit.
.github/workflows/on-pr.yaml (1)

53-64: 💤 Low value

Move aws-account-id to a GitHub variable for consistency with sibling inputs.

ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, and PRIVATE_SUBNET_ID_B are sourced from vars.*, but aws-account-id is hardcoded. AWS account IDs aren't strictly secret, but treating them as configuration (a) keeps this workflow portable across environments, (b) avoids a repo-wide search/replace if the account ever changes, and (c) matches the convention already established in this same job.

Diff
       aws: true
       aws-use-oidc: true
-      aws-account-id: "034489662075"
+      aws-account-id: ${{ vars.AWS_ACCOUNT_ID }}
       aws-region: us-east-2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/on-pr.yaml around lines 53 - 64, The workflow hardcodes
aws-account-id while related values use vars.*, so change the aws-account-id
entry to read from a repo variable (e.g., replace aws-account-id: "034489662075"
with aws-account-id: ${{ vars.AWS_ACCOUNT_ID }}), keeping the rest of the job
(env-vars with ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, PRIVATE_SUBNET_ID_B)
unchanged so the job consistently sources configuration from GitHub variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test-branch/main.k`:
- Around line 429-456: The test metav1alpha1.CompositionTest named
"scaletozero-disabled-strips-plugin" currently only asserts name/namespace;
update its assertResources for the entry with metadata.name
"br-no-s2z-cnpg-cluster" to explicitly assert that
spec.forProvider.manifest.metadata.annotations does NOT contain the
"cnpg-i-scale-to-zero.xata.io/idle-timeout" key (or assert annotations == {} /
absent) and that spec.forProvider.manifest.spec.plugins does NOT include the
scale-to-zero plugin entry; target the same xr stacksv1alpha1.PSQLBranch and the
assertResources object to add these negative assertions so the test fails if the
annotation or plugin are still emitted.
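The negative assertions can be sketched in Go against a rendered manifest decoded into generic maps. The annotation key is taken from the comment; the plugin name `cnpg-i-scale-to-zero.xata.io` is an assumption, and the helper is hypothetical (the real test is KCL).

```go
package main

import "fmt"

const s2zAnnotation = "cnpg-i-scale-to-zero.xata.io/idle-timeout"

// s2zPluginName is an assumed identifier; check the composition for the real one.
const s2zPluginName = "cnpg-i-scale-to-zero.xata.io"

// assertNoScaleToZero returns an error if the CNPG Cluster manifest still
// carries the idle-timeout annotation or a spec.plugins entry for the plugin.
func assertNoScaleToZero(manifest map[string]any) error {
	if md, ok := manifest["metadata"].(map[string]any); ok {
		if ann, ok := md["annotations"].(map[string]any); ok {
			if _, found := ann[s2zAnnotation]; found {
				return fmt.Errorf("annotation %s still present", s2zAnnotation)
			}
		}
	}
	if spec, ok := manifest["spec"].(map[string]any); ok {
		if plugins, ok := spec["plugins"].([]any); ok {
			for _, p := range plugins {
				if pm, ok := p.(map[string]any); ok && pm["name"] == s2zPluginName {
					return fmt.Errorf("plugin %s still present", s2zPluginName)
				}
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(assertNoScaleToZero(map[string]any{"spec": map[string]any{}}))
}
```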

---

Nitpick comments:
In @.github/workflows/on-pr.yaml:
- Around line 53-64: The workflow hardcodes aws-account-id while related values
use vars.*, so change the aws-account-id entry to read from a repo variable
(e.g., replace aws-account-id: "034489662075" with aws-account-id: ${{
vars.AWS_ACCOUNT_ID }}), keeping the rest of the job (env-vars with
ADMIN_ROLE_ARN, PRIVATE_SUBNET_ID_A, PRIVATE_SUBNET_ID_B) unchanged so the job
consistently sources configuration from GitHub variables.

In `@Makefile`:
- Around lines 13-14: The Makefile defines an unused helper macro, api-dir, whose
logic is duplicated inline in the render:all and validate:all recipes. Either
replace those inline awk derivations with calls to the macro,
$(call api-dir,$$example), writing the shell variable as $$example so Make passes
$example through to the shell (because the value is only known to the shell, the
macro body must itself expand to shell code such as an awk pipeline rather than
Make functions), or remove the api-dir macro and its comment block entirely to
avoid confusion. Apply the same choice to the other flagged occurrences (near
lines 40 and 73) so the helper is either used consistently or removed
consistently.
- Around lines 109-117: The render:% and validate:% pattern targets currently
hardcode examples/psqlstacks/$*.yaml and the global
$(DEFINITION)/$(COMPOSITION)/$(XRD_DIR), so every stem maps to psqlstacks.
Change them to derive the API plural from the stem, as the bulk targets do, and
build the example, XRD, and composition paths from that API: take api_dir as the
first "/"-separated field of $*, set example=examples/$*.yaml (the stem already
includes the API directory), set the corresponding DEFINITION/COMPOSITION/XRD_DIR
for that API, then run up composition render and crossplane beta validate with
those derived values. Alternatively, if you prefer explicit scoping, rename the
targets to render:psqlstacks:% and validate:psqlstacks:% to make the
psqlstacks-only behavior explicit.
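The per-API derivation described in the Makefile nitpicks can be sketched with plain shell parameter expansion. This assumes stems of the form <api>/<example>, for instance psqlbranches/same-namespace, and the apis/<api>/definition.yaml and apis/<api>/composition.yaml layout this PR uses. The function names are hypothetical, api_dir_from_example is only a stand-in for the Makefile's inline awk derivation (whose exact expression is not shown here), and XRD_DIR is left out because its value is not shown either.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: map a render:%/validate:% stem to per-API paths
# instead of always using psqlstacks.

# First "/"-separated field of a stem such as "psqlbranches/same-namespace".
api_dir_from_stem() {
  local stem="$1"
  printf '%s\n' "${stem%%/*}"
}

# Same API plural from a bulk-target example path such as
# "examples/psqlbranches/same-namespace.yaml".
api_dir_from_example() {
  local rest="${1#examples/}"
  printf '%s\n' "${rest%%/*}"
}

# Print the example, XRD, and composition paths derived from one stem.
paths_for_stem() {
  local stem="$1" api
  api="$(api_dir_from_stem "$stem")"
  printf 'example=examples/%s.yaml\n' "$stem"
  printf 'definition=apis/%s/definition.yaml\n' "$api"
  printf 'composition=apis/%s/composition.yaml\n' "$api"
}
```

Inside a Make recipe, each shell $ must be written as $$ (for example $${stem%%/*}), since Make expands a single $ before the shell sees the line; that is the same reason the api-dir nitpick calls for $$example.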

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79f332a7-c175-4e49-b64c-1c1e2aae7d2f

📥 Commits

Reviewing files that changed from the base of the PR and between 877c9c1 and 6b8e83c.

📒 Files selected for processing (16)
  • .github/workflows/on-pr.yaml
  • Makefile
  • apis/psqlbranches/definition.yaml
  • apis/psqlclusters/definition.yaml
  • examples/psqlbranches/same-namespace.yaml
  • functions/branch/000-state-init.yaml.gotmpl
  • functions/branch/010-state-status.yaml.gotmpl
  • functions/branch/100-source-snapshot.yaml.gotmpl
  • functions/branch/200-cnpg-cluster.yaml.gotmpl
  • functions/cluster/010-state-status.yaml.gotmpl
  • functions/cluster/100-external-secret.yaml.gotmpl
  • tests/e2etest-psql/main.k
  • tests/test-branch/main.k
  • tests/test-branch/model
  • tests/test-cluster/main.k
  • tests/test-cluster/model
✅ Files skipped from review due to trivial changes (2)
  • tests/test-cluster/model
  • examples/psqlbranches/same-namespace.yaml
🚧 Files skipped from review as they are similar to previous changes (8)
  • functions/cluster/010-state-status.yaml.gotmpl
  • functions/cluster/100-external-secret.yaml.gotmpl
  • functions/branch/010-state-status.yaml.gotmpl
  • functions/branch/100-source-snapshot.yaml.gotmpl
  • tests/test-cluster/main.k
  • tests/e2etest-psql/main.k
  • functions/branch/200-cnpg-cluster.yaml.gotmpl
  • apis/psqlclusters/definition.yaml

Comment thread tests/test-branch/main.k
Comment on lines +429 to +456
metav1alpha1.CompositionTest {
metadata.name = "scaletozero-disabled-strips-plugin"
spec = {
compositionPath = "apis/psqlbranches/composition.yaml"
xrdPath = "apis/psqlbranches/definition.yaml"
timeoutSeconds = 60
validate = False
xr = stacksv1alpha1.PSQLBranch {
metadata = {name = "br-no-s2z", namespace = "default"}
spec = {
clusterName = "my-cluster"
source = {name = "src"}
scaleToZero = {enabled = False}
}
}
assertResources = [
{
apiVersion = "kubernetes.m.crossplane.io/v1alpha1"
kind = "Object"
metadata.name = "br-no-s2z-cnpg-cluster"
spec.forProvider.manifest.metadata = {
name = "br-no-s2z"
namespace = "default"
}
}
]
}
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether the KCL CompositionTest harness in this repo supports
# negative/absence assertions, and how existing tests express "key absent".
fd -t f 'main.k' tests | head -20
rg -nP --type-add 'kcl:*.k' --type=kcl -C3 '\b(assertResources|assertNotPresent|absent|notExpected)\b'

Repository: hops-ops/psql-stack

Length of output: 17809


🏁 Script executed:

#!/bin/bash
# Examine the scaletozero-disabled and ttl-enabled tests in test-branch/main.k
sed -n '429,488p' tests/test-branch/main.k

Repository: hops-ops/psql-stack

Length of output: 2453


🏁 Script executed:

#!/bin/bash
# Look for comments or patterns showing how assertions work, especially for annotations
rg -n 'metadata\.annotations|spec\.plugins' tests/ --type-add 'kcl:*.k' --type=kcl -B2 -A2

Repository: hops-ops/psql-stack

Length of output: 1810


🏁 Script executed:

#!/bin/bash
# Search for any test documentation or comments about assertion semantics
rg -n 'assertResources|assert' tests/test-branch/main.k -B1 -A5 | head -100

Repository: hops-ops/psql-stack

Length of output: 4268


scaletozero-disabled-strips-plugin doesn't assert the absence of the scale-to-zero annotation and plugin.

The test name promises that disabling scaleToZero removes both the cnpg-i-scale-to-zero.xata.io/idle-timeout annotation and the plugin entry. However, assertResources only checks metadata.name and namespace. Compare this to the scaletozero-enabled test (lines 407–422), which explicitly asserts spec.forProvider.manifest.metadata.annotations and spec.forProvider.manifest.spec.plugins—or the ttl-enabled-adds-annotation test (lines 476–485), which asserts the full annotation set. If the composition regressed and kept emitting the annotation and plugin when scaleToZero.enabled = False, this test would pass silently.

Add explicit assertions for the absent fields:

Suggested tightening
             assertResources = [
                 {
                     apiVersion = "kubernetes.m.crossplane.io/v1alpha1"
                     kind = "Object"
                     metadata.name = "br-no-s2z-cnpg-cluster"
                     spec.forProvider.manifest.metadata = {
                         name = "br-no-s2z"
                         namespace = "default"
                     }
+                    spec.forProvider.manifest.metadata.annotations = {}
+                    spec.forProvider.manifest.spec.plugins = []
                 }
             ]
📝 Committable suggestion

metav1alpha1.CompositionTest {
metadata.name = "scaletozero-disabled-strips-plugin"
spec = {
compositionPath = "apis/psqlbranches/composition.yaml"
xrdPath = "apis/psqlbranches/definition.yaml"
timeoutSeconds = 60
validate = False
xr = stacksv1alpha1.PSQLBranch {
metadata = {name = "br-no-s2z", namespace = "default"}
spec = {
clusterName = "my-cluster"
source = {name = "src"}
scaleToZero = {enabled = False}
}
}
assertResources = [
{
apiVersion = "kubernetes.m.crossplane.io/v1alpha1"
kind = "Object"
metadata.name = "br-no-s2z-cnpg-cluster"
spec.forProvider.manifest.metadata = {
name = "br-no-s2z"
namespace = "default"
}
spec.forProvider.manifest.metadata.annotations = {}
spec.forProvider.manifest.spec.plugins = []
}
]
}
}

patrickleet merged commit ced7961 into main on May 8, 2026
11 checks passed