Helm contribution note: fixing post install hook deletion failure due to before-hook-creation policy
After more than a year and four months, I was happy to see PR #11387, which I submitted to the Helm project, finally accepted and merged. I would like to take the opportunity to explain what problem my contribution solves. It all started when the following error began to show up from time to time when I ran helm install:
Error: failed post-install: warning: Hook post-install testing-hooks-chart/templates/pod.yaml failed: object is being deleted: pods "random-pod" already exists
The cause of the issue:
The issue occurs when a post-install hook resource (pod, volume…) is not fully deleted after a Helm release is uninstalled. This can happen, for example, when a chart needs to be uninstalled and then installed again. Even if those hook resources did not get fully deleted, Helm still considers the chart uninstall successful (because the hook resources are technically not part of the release). If the new chart installation finds that the resources from the previous installation are still present, it fails with the error message mentioned above. You can find examples of users reporting this error in the related issue.
This happens mostly when the resources (pods, volumes, custom resources…) have finalizers or go through some lengthy process before deletion.
To summarize, the error above can arise when all of the following conditions hold:
- You are using a Helm hook of type post-install or pre-install.
- The resources installed by the hook have finalizers or take time to be cleaned up by Kubernetes.
- The chart is uninstalled and then installed again.
How to reproduce the issue:
First of all, the fix was released in Helm 3.14.0 (https://github.com/helm/helm/releases/tag/v3.14.0), so the issue can be reproduced with any earlier version of Helm.
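To check quickly whether a local Helm binary predates the fix, a small shell helper can compare the version string against 3.14.0. This is only a sketch: the function name helm_is_affected is made up for illustration, it assumes a sort that supports -V (GNU coreutils does), and it does not try to handle pre-release tags.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Helm): returns 0 if the given version
# is older than 3.14.0, i.e. the issue should still be reproducible.
# Accepts strings such as "v3.13.2+g2a2fb3b", the format printed by
# `helm version --short`.
helm_is_affected() {
  local version="${1#v}"     # drop the leading "v"
  version="${version%%+*}"   # drop the "+g<commit>" build suffix
  local fixed="3.14.0"
  if [ "$version" = "$fixed" ]; then
    return 1
  fi
  # sort -V orders version strings; if ours sorts first, it is older.
  [ "$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n1)" = "$version" ]
}
```

It could be used as: helm_is_affected "$(helm version --short)" && echo "this Helm version predates the fix"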
The issue can be reproduced using a simple Helm chart that uses a post-install hook with a finalizer. For example, let’s assume we want to create a chart with an nginx deployment. On success, the chart launches a pod that prints something and exits.
file: deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testing-hooks
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: testing-hooks
  template:
    metadata:
      labels:
        app.kubernetes.io/name: testing-hooks
    spec:
      containers:
        - name: testing-hooks
          image: "nginx:latest"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80  # value truncated in the source; 80 (the nginx default) is assumed
file: pod-hook.yaml
apiVersion: v1
kind: Pod
metadata:
  name: random-pod
  annotations:
    helm.sh/hook: "post-install"
    helm.sh/hook-delete-policy: before-hook-creation
  finalizers:
    - kubernetes
spec:
  containers:
    - name: test
      image: "alpine"
      command: ['echo']
      args:
        - "bye bye"
  restartPolicy: Never
The chart structure looks as follows:
├── Chart.yaml
├── templates
│ ├── deployment.yaml
│ ├── _helpers.tpl
│ └── pod-hook.yaml
└── values.yaml
The issue can be reproduced if the following commands are run sequentially:
helm install test-hook .
helm uninstall test-hook
helm install test-hook .
The second install fails with the following message:
Error: failed post-install: warning: Hook post-install testing-hooks-chart/templates/pod.yaml failed: object is being deleted: pods "random-pod" already exists
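After the failed second install, one way to confirm that the leftover pod is the culprit is to look at its deletion timestamp and finalizers. The helper below is a sketch: the name show_pending_deletion is hypothetical, and it assumes kubectl is pointed at the namespace where the chart was installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the deletion timestamp and the finalizers
# of a pod on a single line. A non-empty timestamp together with a
# non-empty finalizer list means the pod is marked for deletion but is
# still held back by a finalizer.
show_pending_deletion() {
  local pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{.metadata.deletionTimestamp}{" "}{.metadata.finalizers}{"\n"}'
}
```

For the chart above this would be called as: show_pending_deletion random-pod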
How does the patch fix the issue:
The patch introduces waiting (with a timeout) for hook deletion when re-installing the chart. If a hook resource, like the pod in the example, has a finalizer or takes time to delete, the second helm install hangs until the resource is deleted or the timeout is reached. If the timeout is reached and the hook resource is still not deleted, the chart install fails (depending on whether --atomic is set, the other chart resources are removed upon failure). For example, if we execute the steps mentioned above, the second install hangs for the duration of the timeout (the default is 5m, but the --timeout flag lets us lower it for experimental purposes). Here is the output of the second helm install test-hook . --debug --timeout 10s:
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "random-pod" Pod
wait.go:66: [debug] beginning wait for 1 resources to be deleted with timeout of 10s
Error: INSTALLATION FAILED: failed post-install: context deadline exceeded
helm.go:84: [debug] failed post-install: context deadline exceeded
As you can see, the difference is that Helm now waits for the deletion of random-pod for the specified timeout and does not fail immediately. If the resource has a finalizer, as random-pod does here, one can switch to another terminal and remove the finalizer using kubectl edit.
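As a non-interactive alternative to kubectl edit, the finalizer can also be cleared with kubectl patch, and kubectl wait can be used to watch the deletion complete, which roughly mirrors by hand what the patched Helm now does. Both helpers below are sketches with made-up names (remove_pod_finalizers, wait_for_pod_deletion); they assume the pod lives in the current namespace, and clearing finalizers skips whatever cleanup they were guarding, so this is only appropriate for experiments like this one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clears all finalizers on a pod so that Kubernetes
# can finish deleting it.
remove_pod_finalizers() {
  local pod="$1"
  kubectl patch pod "$pod" --type=merge -p '{"metadata":{"finalizers":null}}'
}

# Hypothetical helper: blocks until the pod is gone or the timeout
# expires; kubectl exits with a non-zero status on timeout.
wait_for_pod_deletion() {
  local pod="$1"
  local timeout="${2:-10s}"
  kubectl wait --for=delete "pod/$pod" --timeout="$timeout"
}
```

While the second helm install is hanging, running remove_pod_finalizers random-pod in another terminal should let the pod disappear and the install proceed.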