After the CSM upgrade, some nodes running Istio may fail to start the new istio-proxy image with a `too many open files` error. These nodes need increased `fs.inotify.max_user_instances` and `fs.inotify.max_user_watches` values.
When pods with istio-proxy sidecars restart (for example, after a power outage or node reboot), they may fail because the system's inotify limits are too low. When the issue occurs, the following errors appear in the istio-proxy logs:
```text
2024-07-22T17:00:37.322350Z info Workload SDS socket not found. Starting Istio SDS Server
2024-07-22T17:00:37.322393Z info CA Endpoint istiod.istio-system.svc:15012, provider Citadel
2024-07-22T17:00:37.322395Z info Opening status port 15020
2024-07-22T17:00:37.322436Z info Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2024-07-22T17:00:37.323487Z error failed to start SDS server: failed to start workload secret manager too many open files
Error: failed to start SDS server: failed to start workload secret manager too many open files
```
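To confirm that low limits are the cause, check the current inotify settings on the affected nodes. A minimal sketch using the same node list as the fix commands below; adjust the `pdsh -w` target list to match the environment:

```bash
# Read the current per-user inotify limits on the management nodes
pdsh -w ncn-m00[1-3],ncn-w00[1-5] 'sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches'
```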
This issue manifests when pods with istio-proxy sidecars restart and cannot create enough inotify instances to monitor their required files. Events that can trigger the problem include power outages, node reboots, and pod restarts. A sketch for locating affected pods follows.
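To identify affected pods, inspect the istio-proxy container logs for the error shown above. A hedged sketch, assuming `kubectl` access to the cluster; the namespace and pod name are placeholders to fill in:

```bash
# List pods that are not healthy, then check a failing pod's
# istio-proxy container logs for the "too many open files" error
kubectl get pods -A | grep -v -E 'Running|Completed'
kubectl logs -n <namespace> <pod-name> -c istio-proxy | grep -i 'too many open files'
```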
Manually increase the `fs.inotify.max_user_instances` and `fs.inotify.max_user_watches` values to provide sufficient inotify resources for Istio and other Kubernetes components:
```bash
# Raise the per-user inotify instance limit (kernel and user-namespace keys)
pdsh -w ncn-m00[1-3],ncn-w00[1-5] 'sysctl -w fs.inotify.max_user_instances=1024'
pdsh -w ncn-m00[1-3],ncn-w00[1-5] 'sysctl -w user.max_inotify_instances=1024'
# Raise the per-user inotify watch limit
pdsh -w ncn-m00[1-3],ncn-w00[1-5] 'sysctl -w fs.inotify.max_user_watches=1048576'
pdsh -w ncn-m00[1-3],ncn-w00[1-5] 'sysctl -w user.max_inotify_watches=1048576'
```
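Note that `sysctl -w` changes runtime values only; they are lost on reboot. A hedged sketch for persisting the settings, assuming the nodes load drop-in files from `/etc/sysctl.d/` (the file name `90-inotify.conf` is an arbitrary choice, not mandated by the source):

```bash
# Persist the limits across reboots by writing a sysctl drop-in file,
# then reload all sysctl configuration on each node
pdsh -w ncn-m00[1-3],ncn-w00[1-5] 'printf "fs.inotify.max_user_instances=1024\nuser.max_inotify_instances=1024\nfs.inotify.max_user_watches=1048576\nuser.max_inotify_watches=1048576\n" > /etc/sysctl.d/90-inotify.conf && sysctl --system'
```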