Hello, we have noticed that the node is not properly drained during an update. The update operator doesn't wait until all pods on the node are evicted and reboots the node immediately, which leads to service interruption. The eviction of pods is probably not started at all.
The operator logs could not drain with the error User \"system:serviceaccount:bottlerocket:update-operator-controller\" cannot get resource \"daemonsets\" in API group \"apps\" in the namespace \"kube-system\", followed by proceeding anyway; see more details below.
Update operator log during reboot:
2021-09-18T15:53:16.000Z bottlerocket-update-operator controller--b4c55546b-sp4br time="2021-09-18T15:53:15Z" level=error msg="could not drain" component=controller error="[cannot delete daemonsets.apps \"kube-proxy\" is forbidden: User \"system:serviceaccount:bottlerocket:update-operator-controller\" cannot get resource \"daemonsets\" in API group \"apps\" in the namespace \"kube-system\": kube-system/kube-proxy-n2hzd, cannot delete daemonsets.apps \"update-operator-agent-update-api\" is forbidden: User \"system:serviceaccount:bottlerocket:update-operator-controller\" cannot get resource \"daemonsets\" in API group \"apps\" in the namespace \"bottlerocket\": bottlerocket/update-operator-agent-update-api-pmv6k, cannot delete daemonsets.apps \"datadog-agent\" is forbidden: User \"system:serviceaccount:bottlerocket:update-operator-controller\" cannot get resource \"daemonsets\" in API group \"apps\" in the namespace \"datadog\": datadog/datadog-agent-cqx69, cannot delete daemonsets.apps \"calico-node\" is forbidden: User \"system:serviceaccount:bottlerocket:update-operator-controller\" cannot get resource \"daemonsets\" in API group \"apps\" in the namespace \"kube-system\": kube-system/calico-node-s8vrr, cannot delete daemonsets.apps \"fluentd-papertrail-containerd\" is forbidden: User \"system:serviceaccount:bottlerocket:update-operator-controller\" cannot get resource \"daemonsets\" in API group \"apps\" in the namespace \"kube-system\": kube-system/fluentd-papertrail-containerd-g29tm]" intent="reboot-update,perform-update,ready update:true" node=ip-10-233-157-101.eu-west-1.compute.internal worker=manager
2021-09-18T15:53:16.000Z bottlerocket-update-operator controller--b4c55546b-sp4br time="2021-09-18T15:53:15Z" level=warning msg="proceeding anyway" component=controller intent="reboot-update,perform-update,ready update:true" node=ip-10-233-157-101.eu-west-1.compute.internal worker=manager
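For context on the error above: before evicting a pod, drain logic typically checks whether the pod is managed by a live DaemonSet, and that check requires a GET on the owning DaemonSet, which is exactly the call being forbidden here. A minimal Go sketch of that check, mirroring the filter upstream kubectl drain applies (illustrative only, not the operator's actual code):

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// isDaemonSetPod reports whether pod is managed by an existing DaemonSet.
// The GET below is the call that fails with "forbidden" when the controller's
// service account lacks "get" on daemonsets.apps in the pod's namespace.
func isDaemonSetPod(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) (bool, error) {
	ref := metav1.GetControllerOf(pod)
	if ref == nil || ref.Kind != "DaemonSet" {
		return false, nil
	}
	_, err := client.AppsV1().DaemonSets(pod.Namespace).Get(ctx, ref.Name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return true, nil
}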
If I add the following permissions to the update controller:
- verbs:
    - get
    - list
  apiGroups:
    - apps
  resources:
    - daemonsets
    - replicasets
Then the following error is logged instead: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore). Can I somehow configure the update operator to ignore DaemonSets on drain?
Thanks
Thanks for opening this report. It seems like the operator doesn't handle error responses from the Drain API as one might expect. I'll look into better handling this case.
To clarify, in this case are you anticipating that the operator should drain the DaemonSet pod, or would you rather it ignore DaemonSet pods and wait for the rest to be drained?
I would like to configure the operator to ignore DaemonSet pods, the same as kubectl drain ip-10-233-156-21.eu-west-1.compute.internal --delete-local-data --ignore-daemonsets --force does.
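For reference, those kubectl flags map directly onto options in the upstream drain helper (k8s.io/kubectl/pkg/drain). A minimal sketch of a drain with the requested semantics, assuming client-go; the field names come from the upstream helper, and this is not the operator's actual implementation:

package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode cordons the node and waits for evictions to finish, with the same
// semantics as `kubectl drain --force --ignore-daemonsets --delete-local-data`.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		Force:               true,            // --force: evict pods without a controller
		IgnoreAllDaemonSets: true,            // --ignore-daemonsets: skip DaemonSet-managed pods
		DeleteEmptyDirData:  true,            // called --delete-local-data in older kubectl releases
		GracePeriodSeconds:  -1,              // respect each pod's own grace period
		Timeout:             5 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	// Blocks until all non-DaemonSet pods are evicted, so a reboot issued
	// afterwards no longer interrupts running workloads.
	return drain.RunNodeDrain(helper, nodeName)
}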
Image I'm using:
328549459982.dkr.ecr.eu-west-1.amazonaws.com/bottlerocket-update-operator:v0.1.4
Deployment manifest:
Node info: