Cilium Network Troubleshooting Runbook

CSM uses the Cilium Container Network Interface (CNI) plugin. The plugin is responsible for assigning IP addresses to pods and establishing network connectivity within and between Kubernetes nodes. This document provides guidance on troubleshooting Cilium-related issues in a CSM environment.

Check Cilium status
Inspect Cilium logs
Cilium monitoring
Troubleshooting using Hubble

Check Cilium status

(ncn-mw#) Check the overall health of Cilium components.

Look for any errors or warnings in the output.

cilium status

Expected output resembles the following:

   /¯¯\
/¯¯\__/¯¯\    Cilium:             OK
\__/¯¯\__/    Operator:           OK
/¯¯\__/¯¯\    Envoy DaemonSet:    OK
\__/¯¯\__/    Hubble Relay:       OK
   \__/       ClusterMesh:        disabled

DaemonSet              cilium             Desired: 7, Ready: 7/7, Available: 7/7
DaemonSet              cilium-envoy       Desired: 7, Ready: 7/7, Available: 7/7
Deployment             cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Deployment             hubble-relay       Desired: 1, Ready: 1/1, Available: 1/1
Deployment             hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium             Running: 7
                       cilium-envoy       Running: 7
                       cilium-operator    Running: 1
                       hubble-relay       Running: 1
                       hubble-ui          Running: 1
Cluster Pods:          311/311 managed by Cilium
Helm chart version:    1.16.5
Image versions         cilium             artifactory.algol60.net/csm-docker/stable/quay.io/cilium/cilium:v1.16.5: 7
                       cilium-envoy       artifactory.algol60.net/csm-docker/stable/quay.io/cilium/cilium-envoy:v1.30.8: 7
                       cilium-operator    artifactory.algol60.net/csm-docker/stable/quay.io/cilium/operator-generic:v1.16.5: 1
                       hubble-relay       artifactory.algol60.net/csm-docker/stable/quay.io/cilium/hubble-relay:v1.16.5: 1
                       hubble-ui          artifactory.algol60.net/csm-docker/stable/quay.io/cilium/hubble-ui-backend:v0.13.1: 1
                       hubble-ui          artifactory.algol60.net/csm-docker/stable/quay.io/cilium/hubble-ui:v0.13.1: 1

Every component should report OK, all containers should be Running, and the number of Ready pods should match the Desired count. A discrepancy may indicate a problem with a Cilium component. If any errors or warnings are observed, investigate further by checking the logs of the affected Cilium pod or component.
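The Ready-versus-Desired comparison can also be scripted. The following is a minimal sketch, not part of the CSM tooling: the function name `check_cilium_ready` is hypothetical, and it assumes the plain-text `cilium status` layout shown in the example output above.

```shell
# Hypothetical helper: reads `cilium status` text on stdin and reports every
# DaemonSet/Deployment line whose Ready count differs from its Desired count.
# Returns 0 when everything is ready, 1 otherwise.
check_cilium_ready() {
    awk '
        /Desired:/ && /Ready:/ {
            # Example: DaemonSet  cilium  Desired: 7, Ready: 7/7, Available: 7/7
            for (i = 1; i < NF; i++) {
                if ($i == "Desired:") { desired = $(i + 1); sub(/,/, "", desired) }
                if ($i == "Ready:")   { split($(i + 1), r, "/"); ready = r[1] }
            }
            if (ready + 0 != desired + 0) {
                print "NOT READY: " $1 " " $2 " (" ready "/" desired ")"
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `cilium status | check_cilium_ready` prints nothing and returns 0 when every DaemonSet and Deployment is fully ready.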

Inspect Cilium logs

Get node-specific status from the Cilium pod.

(ncn-mw#) List all Cilium pods.

kubectl -n kube-system get pod -l k8s-app=cilium -o wide

Expected output resembles the following:

NAME           READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
cilium-9k9wp   1/1     Running   0          17d   10.252.1.9    ncn-m002   <none>           <none>
cilium-dlp8x   1/1     Running   0          17d   10.252.1.11   ncn-w004   <none>           <none>
cilium-ps8gx   1/1     Running   0          17d   10.252.1.8    ncn-m003   <none>           <none>
cilium-t2dn8   1/1     Running   0          17d   10.252.1.6    ncn-w002   <none>           <none>
cilium-tlzqz   1/1     Running   0          17d   10.252.1.7    ncn-w001   <none>           <none>
cilium-xqm8f   1/1     Running   0          17d   10.252.1.5    ncn-w003   <none>           <none>
cilium-zx2d2   1/1     Running   0          17d   10.252.1.10   ncn-m001   <none>           <none>
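Pods that are not Running, not fully ready, or that have restarted can be picked out of this listing with a small filter. This is a sketch under assumptions: `flag_unhealthy_pods` is a hypothetical helper name, and the column layout is assumed to match the `-o wide` output above.

```shell
# Hypothetical helper: reads `kubectl get pod -o wide` text on stdin and
# prints one line per pod that is not Running, not fully ready, or restarted.
flag_unhealthy_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")
        # RESTARTS may expand to several fields, e.g. "12 (3m ago)", so the
        # node name is located by counting back from the end of the line.
        node = $(NF - 2)
        if ($3 != "Running" || ready[1] != ready[2] || $4 + 0 > 0)
            print $1 " on " node ": status=" $3 " ready=" $2 " restarts=" $4
    }'
}
```

For example, `kubectl -n kube-system get pod -l k8s-app=cilium -o wide | flag_unhealthy_pods` prints nothing for the healthy listing shown above.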

Look for any errors or warnings that may indicate issues with network connectivity or policies.
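One way to look for such messages is to filter the agent log of a specific pod. This is a minimal sketch: `filter_cilium_log_problems` is a hypothetical helper, and it assumes the agent writes logfmt-style lines containing a `level=` field, which should be confirmed against the logs on the system.

```shell
# Hypothetical helper: reads Cilium agent log lines on stdin and keeps only
# those at warning level or above.
# Assumption: each line carries a logfmt-style field such as level=warning.
filter_cilium_log_problems() {
    awk '/level=(warning|error|fatal)/'
}
```

For example, `kubectl -n kube-system logs cilium-dlp8x -c cilium-agent | filter_cilium_log_problems` would show only the warning, error, and fatal lines from the agent on ncn-w004.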

(ncn-mw#) Get the Cilium status for a particular node.

In this example, the pod cilium-dlp8x is running on node ncn-w004, and it is used to check the status of Cilium on that node.

kubectl -n kube-system exec -it cilium-dlp8x -c cilium-agent -- cilium status

Expected output resembles the following:

KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.32 (v1.32.5) [linux/amd64]
Kubernetes APIs:         ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    False
Host firewall:           Disabled
SRv6:                    Disabled
CNI Chaining:            none
CNI Config file:         successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                  Ok   1.16.5 (v1.16.5-ad688277)
NodeMonitor:             Listening for events on 64 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok
IPAM:                    IPv4: 49/254 allocated from 10.32.9.0/24,
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Routing:                 Network: Tunnel [vxlan]   Host: Legacy
Attach Mode:             TCX
Device Mode:             veth
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       258/258 healthy
Proxy Status:            OK, ip 10.32.9.170, 0 redirects active on ports 10000-20000, Envoy: external
Global Identity Range:   min 256, max 65535
Hubble:                  Ok              Current/Max Flows: 4095/4095 (100.00%), Flows/s: 210.25   Metrics: Ok
Encryption:              Disabled
Cluster health:          7/7 reachable   (2025-08-07T21:36:12Z)
Modules Health:          Stopped(0) Degraded(0) OK(236)
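The lines that most directly summarize agent health in this output are `Controller Status`, `Cluster health`, and `Modules Health`. The following sketch checks only those three lines. The function name `check_agent_status` is hypothetical, and the parsing assumes the field layout shown in the example output above.

```shell
# Hypothetical helper: reads the per-node `cilium status` text on stdin and
# reports unhealthy controllers, unreachable nodes, or non-OK modules.
# Returns 0 when all three checks pass, 1 otherwise.
check_agent_status() {
    awk '
        /^Controller Status:/ {
            split($3, c, "/")                     # e.g. 258/258
            if (c[1] + 0 != c[2] + 0) { print "controllers unhealthy: " $3; bad = 1 }
        }
        /^Cluster health:/ {
            split($3, h, "/")                     # e.g. 7/7
            if (h[1] + 0 != h[2] + 0) { print "nodes unreachable: " $3; bad = 1 }
        }
        /^Modules Health:/ {
            # e.g. Stopped(0) Degraded(0) OK(236)
            if ($3 != "Stopped(0)" || $4 != "Degraded(0)") {
                print "modules not OK: " $3 " " $4
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `kubectl -n kube-system exec cilium-dlp8x -c cilium-agent -- cilium status | check_agent_status` prints nothing and returns 0 for the healthy output shown above.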

Cilium monitoring

Cilium monitoring helps to identify issues with network policies, connectivity, and more. Hubble can also be used for more advanced monitoring and troubleshooting. See the Cilium and Hubble Monitoring documentation for more information.

Troubleshooting using Hubble

(ncn-mw#) To use the Hubble tool to diagnose network problems, start a port-forwarding session.
```
cilium hubble port-forward &
```
Expected output resembles the following:
```
[1] 903511
ncn-m001:~ # ℹ️  Hubble Relay is available at 127.0.0.1:4245
```

(ncn-mw#) Use the hubble command to inspect network traffic and events.

This example looks at dropped traffic, such as flows denied by network policies. The command shows the last 5 flows with a DROPPED verdict, along with their details.

hubble observe --verdict DROPPED --last 5

Example output:

Aug  7 21:40:47.320: fe80::a0ec:7cff:fe45:72b (ID:32748) <> ff02::2 (unknown) Unsupported L3 protocol DROPPED (ICMPv6 RouterSolicitation)
Aug  7 21:40:49.515: sysmgmt-health/vmagent-vms-0-6cbf9d467c-49nxn:39932 (ID:64899) <> dvs/cray-dvs-mqtt-ss-1:15020 (ID:50714) Policy denied DROPPED (TCP Flags: SYN)
Aug  7 21:40:50.084: sysmgmt-health/vmagent-vms-1-5d5dddd445-q9znc:60136 (ID:3549) <> dvs/cray-dvs-mqtt-ss-0:15020 (ID:50714) Policy denied DROPPED (TCP Flags: SYN)
Aug  7 21:40:50.091: sysmgmt-health/vmagent-vms-1-5d5dddd445-jp7gf:58510 (ID:3549) <> dvs/cray-dvs-mqtt-ss-0:15020 (ID:50714) policy-verdict:none INGRESS DENIED (TCP Flags: SYN)
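When many flows are dropped, it can help to group policy denials by destination. This is a sketch under assumptions: `count_policy_denied` is a hypothetical helper, and it parses the compact text format shown above, which may differ between Hubble versions.

```shell
# Hypothetical helper: reads compact `hubble observe` text on stdin and counts
# "Policy denied" drops per destination endpoint, highest count first.
count_policy_denied() {
    awk '/Policy denied/ {
        # The destination is the field right after the direction marker.
        for (i = 1; i < NF; i++)
            if ($i == "<>" || $i == "->") { count[$(i + 1)]++; break }
    }
    END { for (dst in count) printf "%d %s\n", count[dst], dst }' |
        sort -k1,1nr -k2
}
```

For example, `hubble observe --verdict DROPPED --last 100 | count_policy_denied` lists each denied destination with the number of matching flows. Drops for other reasons, such as `Unsupported L3 protocol`, are ignored.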

For more information on using the Hubble CLI, see the Hubble CLI documentation.