nydus-snapshotter: Helm deployment fails with FailedPostStartHook - containerd restart kills pod

## Issue: Helm chart fails to deploy - postStart hook restarts containerd causing CrashLoopBackOff

### Environment
- Kubernetes: v1.30+
- Containerd: v1.7.25
- nydus-snapshotter Helm chart: latest from dragonfly repo
- Chart version used: nydus-snapshotter (dragonfly/nydus-snapshotter)

### Problem Description

The nydus-snapshotter DaemonSet fails to start with `FailedPostStartHook` errors, followed by CrashLoopBackOff. The root cause is the postStart lifecycle hook that restarts containerd, which creates a circular failure.

### Root Cause

The postStart hook executes:
```bash
nsenter -t 1 -m systemctl -- restart containerd.service
```

When this command runs:
1. Container starts
2. postStart hook executes and restarts containerd
3. Containerd restart kills all running containers, including the pod that triggered the restart
4. Pod never completes startup successfully
5. Kubernetes marks it as failed and restarts the pod
6. Loop continues indefinitely

### Steps to Reproduce

1. Deploy nydus-snapshotter using Helm:
```bash
helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
helm repo update
helm install nydus-snapshotter dragonfly/nydus-snapshotter \
  --namespace nydus-snapshotter \
  --create-namespace \
  --wait
```

2. Observe pod status:
```bash
kubectl get pods -n nydus-snapshotter
```

### Expected Behavior

Pods should start successfully and run the nydus-snapshotter service.

### Actual Behavior

```
NAME                      READY   STATUS               RESTARTS      AGE
nydus-snapshotter-xxxxx   0/1     CrashLoopBackOff     2 (17s ago)   87s
```

Events show:
```
Warning  FailedPostStartHook  PostStartHook failed
Normal   Killing              FailedPostStartHook
```

### Analysis

The postStart hook is attempting to reload containerd configuration after the init container modifies `/etc/containerd/config.toml`. However, restarting containerd terminates the very pod that's performing the restart, preventing successful startup.

Containerd does not support `systemctl reload` (CanReload=no), so using `reload` instead of `restart` is not viable.

### Proposed Solutions

1. **Remove postStart hook entirely** - Document that users must manually restart containerd after deployment
2. **Use a separate Job** - Create a pre-install Job that configures containerd and restarts it before the DaemonSet starts
3. **Background restart with delay** - Fork the restart into background with delay (hacky but works):
   ```bash
   (sleep 3 && nsenter -t 1 -m -- systemctl restart containerd.service) &
   ```
4. **Add configuration option** - Allow users to disable the hook via values.yaml:
   ```yaml
   containerRuntime:
     containerd:
       enable: true
       autoRestart: false  # new option
   ```

### Related Issues

- Similar issue was partially addressed in https://github.com/dragonflyoss/helm-charts/pull/209 (Dec 2023)
- Kubernetes postStart hook behavior is documented to kill containers on failure

### Workaround

Temporarily disable containerd configuration injection and configure manually:
```yaml
containerRuntime:
  containerd:
    enable: false
```

Then manually add to `/etc/containerd/config.toml` on each node:
```toml
[proxy_plugins]
  [proxy_plugins.nydus]
    type = "snapshot"
    address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
```

And restart containerd once manually.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nydus-snapshotter: Helm deployment fails with FailedPostStartHook - containerd restart kills pod #466

Issue: Helm chart fails to deploy - postStart hook restarts containerd causing CrashLoopBackOff

Environment

Problem Description

Root Cause

Steps to Reproduce

Expected Behavior

Actual Behavior

Analysis

Proposed Solutions

Related Issues

Workaround

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

nydus-snapshotter: Helm deployment fails with FailedPostStartHook - containerd restart kills pod #466

Description

Issue: Helm chart fails to deploy - postStart hook restarts containerd causing CrashLoopBackOff

Environment

Problem Description

Root Cause

Steps to Reproduce

Expected Behavior

Actual Behavior

Analysis

Proposed Solutions

Related Issues

Workaround

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions