The Unbound DNS instance is used to resolve names for the physical equipment on the management networks within the system, such as NCNs, UANs, switches, and compute nodes. This instance is accessible only within the HPE Cray EX system.
### `cray-dns-unbound` pods

(`ncn-mw#`) Check the status of the pods:
```bash
kubectl get -n services pods | grep unbound
```
Example output:
```text
cray-dns-unbound-696c58647f-26k4c 2/2 Running 0 121m
cray-dns-unbound-696c58647f-rv8h6 2/2 Running 0 121m
cray-dns-unbound-coredns-q9lbg 0/2 Completed 0 121m
cray-dns-unbound-manager-1596149400-5rqxd 0/2 Completed 0 20h
cray-dns-unbound-manager-1596149400-8ppv4 0/2 Completed 0 20h
cray-dns-unbound-manager-1596149400-cwksv 0/2 Completed 0 20h
cray-dns-unbound-manager-1596149400-dtm9p 0/2 Completed 0 20h
cray-dns-unbound-manager-1596149400-hckmp 0/2 Completed 0 20h
cray-dns-unbound-manager-1596149400-t24w6 0/2 Completed 0 20h
cray-dns-unbound-manager-1596149400-vzxnp 0/2 Completed 0 20h
cray-dns-unbound-manager-1596222000-bcsk7 0/2 Completed 0 2m48s
cray-dns-unbound-manager-1596222060-8pjx6 0/2 Completed 0 118s
cray-dns-unbound-manager-1596222120-hrgbr 0/2 Completed 0 67s
cray-dns-unbound-manager-1596222180-sf46q 1/2 NotReady 0 7s
```
For more information about the pods displayed in the output above:

- `cray-dns-unbound-xxx` - These are the main Unbound pods.
- `cray-dns-unbound-manager-yyy` - These are job pods that run periodically to update DNS from DHCP (Kea) and from SLS/SMD content for the Hardware State Manager (HSM). These pods go into the `Completed` status and are independently reaped later by Kubernetes.
- `cray-dns-unbound-coredns-zzz` - This pod runs one time during installation of Unbound and reconfigures CoreDNS/ExternalDNS to point to Unbound for all site/internet lookups.

The table below describes what the status of each pod means for the health of the `cray-dns-unbound` services and pods. The `Init` and `NotReady` states are not necessarily bad; they mean that the pod is being started or is processing. The `cray-dns-unbound-manager` and `cray-dns-unbound-coredns` pods are job pods that run to completion rather than long-running services.
| Pod | Healthy Status | Error Status | Other |
|-----|----------------|--------------|-------|
| `cray-dns-unbound` | `Running` | `CrashLoopBackOff` | |
| `cray-dns-unbound-coredns` | `Completed` | `CrashLoopBackOff` | `Init`, `NotReady` |
| `cray-dns-unbound-manager` | `Completed` | `CrashLoopBackOff` | `Init`, `NotReady` |
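A quick way to flag pods in an unexpected state is to filter the `STATUS` column against the values in the table above. This is a minimal sketch using the same `grep`-based selection as the check earlier in this section:

```bash
# Print any unbound-related pod whose STATUS is not one of the expected
# values from the table; Init*/NotReady are usually transient.
kubectl get -n services pods --no-headers | grep unbound \
    | awk '$3 !~ /^(Running|Completed|Init|NotReady)/ {print "UNEXPECTED:", $0}'
```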
(`ncn-mw#`) Logs for the Unbound pods show the status and health of actual DNS lookups. Any log entries containing `ERROR` or `Exception` indicate that the Unbound service is not healthy.

```bash
kubectl logs -n services -l app.kubernetes.io/instance=cray-dns-unbound -c unbound
```
Example output:
```text
[1596224129] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224129] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224135] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224135] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224140] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224140] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224145] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224145] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224149] unbound[8:0] debug: using localzone health.check.unbound. transparent
[1596224149] unbound[8:0] debug: using localzone health.check.unbound. transparent
...snip...
[1597020669] unbound[8:0] error: error parsing local-data at 33 '69.0.254.10.in-addr.arpa. PTR .local': Empty label
[1597020669] unbound[8:0] error: Bad local-data RR 69.0.254.10.in-addr.arpa. PTR .local
[1597020669] unbound[8:0] fatal error: Could not set up local zones
```
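(`ncn-mw#`) A direct query against the Unbound load balancer is a quick functional check to pair with the logs. This is a sketch only; the service name `cray-dns-unbound-udp-nmn` and the test hostnames are assumptions and may differ on a given system:

```bash
# Find the LoadBalancer IP of the Unbound service, then query it directly.
# The service name is an assumption; confirm it with "kubectl get -n services svc".
UNBOUND_IP=$(kubectl get -n services svc cray-dns-unbound-udp-nmn \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
dig @"${UNBOUND_IP}" +short ncn-m001.nmn   # internal name; adjust as needed
dig @"${UNBOUND_IP}" +short example.com    # exercises the site/internet forwarder
```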
If there are any errors in the Unbound logs:

- The `using localzone health.check.unbound. transparent` log message is not an issue.
- Check the `customizations.yaml` file and look at the `system_to_site_lookups` values. Ensure that the external lookup values are valid and working.

Manager logs show the status of the latest "true up" of DNS with respect to actual DHCP leases and SLS/SMD status.
(`ncn-mw#`) The following command shows the last four lines of the last manager run, and can be adjusted as needed.

```bash
kubectl logs -n services pod/$(kubectl get -n services pods | grep unbound | tail -n 1 | cut -f 1 -d ' ') -c manager | tail -n4
```
Example output:
```text
uid: bc1e8b7f-39e2-49e5-b586-2028953d2940
Comparing new and existing DNS records.
No differences found. Skipping DNS update
```
Any log entry with `ERROR` or `Exception` indicates that DNS is not healthy. The example above shows one of the two possible reports for a healthy manager run. As long as the write to the ConfigMap has not failed, the following messages indicate a healthy run:

- `No differences found. Skipping DNS update`
- `Differences found. Writing new DNS records to our configmap.`

The manager runs periodically, about once every minute. Check whether a failure is a one-time occurrence or a recurring issue.
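To judge whether an error is recurring, one option is to compare the final log line of several recent manager pods; a minimal sketch using the same name-based selection as the command above:

```bash
# Show the final log line of the five most recent unbound-manager pods.
# Manager pod names embed a timestamp, so name order tracks creation order.
for pod in $(kubectl get -n services pods --no-headers \
                 | grep unbound-manager | tail -n 5 | awk '{print $1}'); do
    echo "== ${pod}"
    kubectl logs -n services "${pod}" -c manager | tail -n 1
done
```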
If any errors discovered in the sections above have been deemed transient or have not been resolved, then restart the Unbound pods.
(`ncn-mw#`) Use the following command to restart the pods:

```bash
kubectl -n services rollout restart deployment cray-dns-unbound
```
A rolling restart of the Unbound pods will occur; old pods will not be terminated and new pods will not be added to the load balancer until the new pods have successfully loaded the DNS records.
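If desired, watch the rollout until it completes:

```bash
# Block until the rolling restart has finished (or report failure).
kubectl -n services rollout status deployment cray-dns-unbound
```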
Unbound stores records it obtains from DHCP, SLS, and SMD via the manager job in a ConfigMap. It is possible to clear this ConfigMap and allow the next manager job to regenerate the content.
This is useful, for example, when the generated records appear to be stale or incorrect.
(`ncn-mw#`) The following command clears the manager-generated (DNS Helper) data in the ConfigMap. This is generally safe, as Unbound runtime data is held elsewhere.

```bash
kubectl -n services patch configmaps cray-dns-unbound --type merge -p '{"binaryData":{"records.json.gz":"H4sICLQ/Z2AAA3JlY29yZHMuanNvbgCLjuUCAETSaHADAAAA"}}'
```
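To inspect what the manager has generated (before or after clearing it), the compressed payload can be decoded directly from the ConfigMap; a minimal sketch:

```bash
# Decode the gzip-compressed, base64-encoded records held in the ConfigMap.
# The escaped dots in the jsonpath refer to the literal key "records.json.gz".
kubectl -n services get configmap cray-dns-unbound \
    -o jsonpath='{.binaryData.records\.json\.gz}' | base64 -d | gunzip | head -c 500
```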
Use the following procedure to change the site DNS server that Unbound forwards queries to. This may be necessary if the site DNS server is moved to a different IP address.
(`ncn-mw#`) Edit the `cray-dns-unbound` ConfigMap.

```bash
kubectl -n services edit configmap cray-dns-unbound
```
Update the `forward-zone` value in `unbound.conf`.

```text
forward-zone:
  name: .
  forward-addr: 172.30.84.40
```
Multiple DNS servers can be defined if required.

```text
forward-zone:
  name: .
  forward-addr: 172.30.84.40
  forward-addr: 192.168.0.1
```
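Before restarting Unbound to pick up the change, it may be worth confirming that the new forwarder answers queries from the management network; a minimal check using the example address above:

```bash
# Confirm the candidate site DNS server resolves an external name.
dig @172.30.84.40 +short example.com
```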
(`ncn-mw#`) Restart `cray-dns-unbound` for this change to take effect.

```bash
kubectl -n services rollout restart deployment cray-dns-unbound
```
Example output:

```text
deployment.apps/cray-dns-unbound restarted
```
(`ncn-mw#`) Update `customizations.yaml`.

**IMPORTANT:** If this step is not performed, then the Unbound configuration will be overwritten with the previous value the next time CSM or Unbound is upgraded.
Extract `customizations.yaml` from the `site-init` secret in the `loftsman` namespace.

```bash
kubectl -n loftsman get secret site-init -o json | jq -r '.data."customizations.yaml"' | base64 -d > customizations.yaml
```
Update `system_to_site_lookups` with the value of the new DNS server.

```yaml
spec:
  network:
    netstaticips:
      system_to_site_lookups: 172.30.84.40
```
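If `yq` is available (v4 syntax is assumed here), the same edit can be scripted instead of made by hand; a sketch:

```bash
# In-place update of the site DNS forwarder in the extracted customizations.yaml.
yq -i '.spec.network.netstaticips.system_to_site_lookups = "172.30.84.40"' customizations.yaml
```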
If multiple DNS servers are required, add the additional servers to the `cray-dns-unbound` service configuration.

```yaml
spec:
  kubernetes:
    services:
      cray-dns-unbound:
        forwardZones:
          - name: "."
            forwardIps:
              - "{{ network.netstaticips.system_to_site_lookups }}"
              - "192.168.0.1"
        domain_name: '{{ network.dns.external }}'
```
Update the `site-init` secret in the `loftsman` namespace.

```bash
kubectl delete secret -n loftsman site-init
kubectl create secret -n loftsman generic site-init --from-file=customizations.yaml
```
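To confirm that the new secret round-trips correctly, extract it again and compare it against the local file; a quick sketch:

```bash
# Should produce no output if the stored copy matches the edited file.
kubectl -n loftsman get secret site-init -o jsonpath='{.data.customizations\.yaml}' \
    | base64 -d | diff - customizations.yaml
```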
On large systems, it may be necessary to increase the number of Unbound pods because of the increased DNS query load. See *Scale `cray-dns-unbound` service* for more information.
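As a temporary measure only (a direct scale is not recorded in `customizations.yaml` and may be reverted by a later upgrade or by the referenced procedure), the deployment can be scaled with `kubectl`; the replica count here is purely illustrative:

```bash
# Temporarily run four Unbound replicas; see the referenced scaling
# procedure for making the change persistent.
kubectl -n services scale deployment cray-dns-unbound --replicas=4
```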