Description
Issue: Helm chart fails to deploy - postStart hook restarts containerd causing CrashLoopBackOff
Environment
- Kubernetes: v1.30+
- Containerd: v1.7.25
- nydus-snapshotter Helm chart: latest from dragonfly repo
- Chart version used: nydus-snapshotter (dragonfly/nydus-snapshotter)
Problem Description
The nydus-snapshotter DaemonSet fails to start with FailedPostStartHook errors, followed by CrashLoopBackOff. The root cause is the postStart lifecycle hook that restarts containerd, which creates a circular failure.
Root Cause
The postStart hook executes:
nsenter -t 1 -m systemctl -- restart containerd.service
When this command runs:
- Container starts
- postStart hook executes and restarts containerd
- Containerd restart kills all running containers, including the pod that triggered the restart
- Pod never completes startup successfully
- Kubernetes marks it as failed and restarts the pod
- Loop continues indefinitely
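To confirm that the rendered DaemonSet really carries this hook before deploying, the chart output can be searched for postStart blocks. This is a minimal sketch; the helper name `show_poststart_hooks` is illustrative and not part of the chart, and it only filters whatever manifests are piped into it.

```shell
# Print every postStart block found in rendered manifests read from stdin.
# The optional first argument is the number of context lines to show (default 6).
show_poststart_hooks() {
    grep -n -A "${1:-6}" 'postStart:'
}

# Usage (needs helm and the dragonfly repo added as in Steps to Reproduce):
#   helm template nydus-snapshotter dragonfly/nydus-snapshotter | show_poststart_hooks
```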
Steps to Reproduce
- Deploy nydus-snapshotter using Helm:
helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
helm repo update
helm install nydus-snapshotter dragonfly/nydus-snapshotter \
--namespace nydus-snapshotter \
--create-namespace \
--wait
- Observe pod status:
kubectl get pods -n nydus-snapshotter
Expected Behavior
Pods should start successfully and run the nydus-snapshotter service.
Actual Behavior
NAME READY STATUS RESTARTS AGE
nydus-snapshotter-xxxxx 0/1 CrashLoopBackOff 2 (17s ago) 87s
Events show:
Warning FailedPostStartHook PostStartHook failed
Normal Killing FailedPostStartHook
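The event summary above is truncated. To pull the full hook failure messages, events can be filtered by reason. A minimal sketch, assuming the namespace from the reproduction steps; the function name `poststart_events` is illustrative.

```shell
# List FailedPostStartHook events, oldest first.
# The namespace defaults to the one used in Steps to Reproduce.
poststart_events() {
    ns="${1:-nydus-snapshotter}"
    kubectl get events -n "$ns" \
        --field-selector reason=FailedPostStartHook \
        --sort-by=.lastTimestamp
}
```

Calling `poststart_events` with no argument queries the `nydus-snapshotter` namespace; pass another namespace as the first argument if the release was installed elsewhere.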
Analysis
The postStart hook is attempting to reload containerd configuration after the init container modifies /etc/containerd/config.toml. However, restarting containerd terminates the very pod that's performing the restart, preventing successful startup.
Containerd does not support systemctl reload (CanReload=no), so using reload instead of restart is not viable.
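The `CanReload` value can be checked on any node with `systemctl show`. The helper below is a small sketch (the name `unit_can_reload` is illustrative); it has to run on the node itself, or through `nsenter -t 1 -m --` from a privileged pod.

```shell
# Succeed only if systemd reports the unit as reloadable.
# The unit defaults to containerd.service.
unit_can_reload() {
    unit="${1:-containerd.service}"
    [ "$(systemctl show --property=CanReload --value "$unit")" = "yes" ]
}

# Usage:
#   unit_can_reload || echo "containerd.service cannot be reloaded; a restart is required"
```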
Proposed Solutions
- Remove postStart hook entirely - Document that users must manually restart containerd after deployment
- Use a separate Job - Create a pre-install Job that configures containerd and restarts it before the DaemonSet starts
- Background restart with delay - Fork the restart into background with delay (hacky but works):
(sleep 3 && nsenter -t 1 -m -- systemctl restart containerd.service) &
- Add configuration option - Allow users to disable the hook via values.yaml:
containerRuntime:
  containerd:
    enable: true
    autoRestart: false # new option
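The background-restart option from the list above could be factored into a small helper so that the hook returns immediately and the restart fires later. This is only a sketch: `run_detached_after` is a hypothetical name, and whether the delayed restart still disrupts the pod depends on the node's containerd and systemd settings, which this issue has not verified.

```shell
# Schedule a command to run after a delay without blocking the caller.
# In the postStart hook the command would be:
#   nsenter -t 1 -m -- systemctl restart containerd.service
run_detached_after() {
    delay="$1"
    shift
    # The subshell is backgrounded so control returns at once; stdio is
    # redirected so the caller's exec session is not held open by the job.
    ( sleep "$delay" && "$@" ) >/dev/null 2>&1 </dev/null &
}

# Usage, mirroring the one-liner above:
#   run_detached_after 3 nsenter -t 1 -m -- systemctl restart containerd.service
```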
Related Issues
- A similar issue was partially addressed in "fix: nydus-snapshotter can not run cause of failing of postStart hook" (#209, Dec 2023)
- Kubernetes documents that a container is killed when its postStart hook fails
Workaround
Temporarily disable containerd configuration injection and configure manually:
containerRuntime:
  containerd:
    enable: false
Then manually add to /etc/containerd/config.toml on each node:
[proxy_plugins]
[proxy_plugins.nydus]
type = "snapshot"
address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
And restart containerd once manually.
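The same workaround can be applied from the command line instead of a values file. A sketch under two assumptions: the release and namespace names match the reproduction steps, and setting `containerRuntime.containerd.enable=false` also skips the postStart hook, as this workaround implies. The function names are illustrative.

```shell
# Install (or upgrade a failed release of) the chart with containerd
# configuration injection disabled.
install_without_injection() {
    helm upgrade --install nydus-snapshotter dragonfly/nydus-snapshotter \
        --namespace nydus-snapshotter \
        --create-namespace \
        --set containerRuntime.containerd.enable=false
}

# Run once on each node, after /etc/containerd/config.toml has been edited.
restart_containerd_once() {
    systemctl restart containerd.service
}
```

`helm upgrade --install` is used here rather than `helm install` so the command also works when a release from the failed deployment already exists.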