There is a known issue where the Keycloak configuration obtained from LDAP is incomplete, causing the `keycloak-users-localize` job to fail to complete. This, in turn, causes `403 Forbidden` errors when using the `cray` CLI. It can also cause a Keycloak test to fail during CSM health validation.

To recover from this situation, perform the following procedure.
1. Log in to the Keycloak admin console. See *Access the Keycloak User Management UI*.
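
   The password for the Keycloak `admin` user, which is needed to log in, can be read from the `keycloak-master-admin-auth` Kubernetes secret (the same secret is used again later in this procedure):

   ```bash
   kubectl get secret -n services keycloak-master-admin-auth \
       --template={{.data.password}} | base64 -d
   ```
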
1. Delete the `shasta-user-federation-ldap` entry from the **User Federation** page.
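
   If the admin console UI is not reachable, the same entry can likely be removed through the Keycloak admin REST API. The following is an untested sketch; the lookup of the federation component by name is an assumption, so verify the returned ID before deleting:

   ```bash
   # Get the Keycloak service IP and an admin token (same setup as later in this procedure)
   IP=$(kubectl get service/keycloak -n services -o json | jq -r '.spec.clusterIP')
   ADMIN_SECRET=$(kubectl get secret -n services keycloak-master-admin-auth --template={{.data.password}} | base64 -d)
   TOKEN=$(curl -s http://$IP:8080/keycloak/realms/master/protocol/openid-connect/token \
       -d grant_type=password -d client_id=admin-cli -d username=admin \
       --data-urlencode password=$ADMIN_SECRET | jq -r '.access_token')
   # Look up the ID of the shasta-user-federation-ldap user storage component
   COMPONENT_ID=$(curl -s -H "Authorization: Bearer $TOKEN" \
       "http://$IP:8080/keycloak/admin/realms/shasta/components?type=org.keycloak.storage.UserStorageProvider&name=shasta-user-federation-ldap" \
       | jq -r '.[0].id')
   # Delete the federation entry
   curl -s -X DELETE -H "Authorization: Bearer $TOKEN" \
       "http://$IP:8080/keycloak/admin/realms/shasta/components/$COMPONENT_ID"
   ```
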
1. Wait three minutes for the configuration to re-sync.
1. Re-run the Keycloak localize job.

   ```bash
   kubectl get job -n services -l app.kubernetes.io/name=cray-keycloak-users-localize \
       -ojson | jq '.items[0]' > keycloak-users-localize-job.json
   kubectl delete job -n services -l app.kubernetes.io/name=cray-keycloak-users-localize
   cat keycloak-users-localize-job.json | jq 'del(.spec.selector)' | \
       jq 'del(.spec.template.metadata.labels)' | kubectl apply -f -
   ```

   The `.spec.selector` and template labels are removed from the saved manifest because Kubernetes generates them, and a job cannot be re-created with the old values.
   Expected output looks similar to:

   ```text
   job.batch "keycloak-users-localize-1" deleted
   job.batch/keycloak-users-localize-1 created
   ```
1. Check whether the `keycloak-users-localize` job has completed.

   ```bash
   kubectl -n services wait --for=condition=complete --timeout=10s \
       job/$(kubectl -n services get jobs | grep users-localize | awk '{print $1}')
   ```
   - If the output contains `condition met`, then the issue is resolved and the rest of this procedure can be skipped.
   - If the output contains `error: timed out waiting for the condition`, then check the logs of the `keycloak-users-localize` pod and continue with the next steps:
   ```bash
   kubectl -n services logs $(kubectl -n services get pods | grep users-localize | awk '{print $1}') keycloak-localize
   ```
1. If the logs show an error indicating that there is a duplicate group, then go to the **Groups** page in the Keycloak admin console and delete the duplicate groups.
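
   To help locate duplicates, the top-level group names in the `shasta` realm can be listed through the admin REST API and filtered for repeated names. This sketch assumes the `$IP` and `$TOKEN` variables from the earlier sketch (or from the next step) are set, and it will not surface duplicates nested in subgroups:

   ```bash
   # List top-level group names and print any name that appears more than once
   curl -s -H "Authorization: Bearer $TOKEN" \
       "http://$IP:8080/keycloak/admin/realms/shasta/groups?briefRepresentation=true&max=1000" \
       | jq -r '.[].name' | sort | uniq -d
   ```
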
1. If the logs show a `KeyError: 'gidNumber'` or `KeyError: 'cn'` error, then complete the remaining steps.

1. Get the groups that are missing attributes (`ncn-m#`):
   ```bash
   # Get the cluster IP of the Keycloak service
   IP=$(kubectl get service/keycloak -n services -o json | jq -r '.spec.clusterIP')
   # Get the Keycloak master admin password
   ADMIN_SECRET=$(kubectl get secret -n services keycloak-master-admin-auth --template={{.data.password}} | base64 -d)
   # Request an admin access token
   TOKEN=$(curl -s http://$IP:8080/keycloak/realms/master/protocol/openid-connect/token \
       -d grant_type=password \
       -d client_id=admin-cli \
       -d username=admin \
       --data-urlencode password=$ADMIN_SECRET \
       | jq -r '.access_token')
   # List the groups in the shasta realm that are missing the cn or gidNumber attribute
   curl -s -H "Authorization: Bearer $TOKEN" http://$IP:8080/keycloak/admin/realms/shasta/groups?briefRepresentation=false \
       | jq '.[] | select(.attributes.cn[0] == null or .attributes.gidNumber[0] == null)'
   ```
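
   Alternatively, the missing attributes can be patched through the admin REST API instead of the UI steps below. The following is an untested sketch that reuses the `$IP` and `$TOKEN` variables from the previous command; the `<group-id>` placeholder and the example `gidNumber` value must be replaced:

   ```bash
   # Placeholder: the "id" field of a group from the previous command's output
   GROUP_ID=<group-id>
   # Fetch the full group representation, add the cn and gidNumber attributes,
   # and PUT the updated representation back
   curl -s -H "Authorization: Bearer $TOKEN" \
       "http://$IP:8080/keycloak/admin/realms/shasta/groups/$GROUP_ID" \
       | jq '.attributes.cn = [.name] | .attributes.gidNumber = ["1234567890"]' \
       | curl -s -X PUT -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
           -d @- "http://$IP:8080/keycloak/admin/realms/shasta/groups/$GROUP_ID"
   ```
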
1. Add the missing attributes in the Keycloak admin console:

   1. Go to the **Groups** page in the Keycloak admin console.
   1. Search for and select a group that is missing an attribute.
   1. Click on the **Attributes** tab.
   1. Add a new attribute named `cn` with a value of the group name.
   1. Add a second new attribute named `gidNumber` with a random number over 1000000001 and under 4000000000; see the sketch after this list for one way to generate such a value.
   1. Repeat for all groups that are missing attributes.
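
   One way to generate a suitable random `gidNumber` value (the use of `shuf` here is an assumption; any unique number in the stated range works):

   ```bash
   # Print one random number strictly between 1000000001 and 4000000000
   shuf -i 1000000002-3999999999 -n 1
   ```
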
1. Wait three minutes for the configuration to re-sync.

1. Re-run the Keycloak localize job.

   ```bash
   kubectl get job -n services -l app.kubernetes.io/name=cray-keycloak-users-localize \
       -ojson | jq '.items[0]' > keycloak-users-localize-job.json
   kubectl delete job -n services -l app.kubernetes.io/name=cray-keycloak-users-localize
   cat keycloak-users-localize-job.json | jq 'del(.spec.selector)' | \
       jq 'del(.spec.template.metadata.labels)' | kubectl apply -f -
   ```
   Expected output looks similar to:

   ```text
   job.batch "keycloak-users-localize-1" deleted
   job.batch/keycloak-users-localize-1 created
   ```
1. Check again to make sure that the job has now completed.

   ```bash
   kubectl -n services wait --for=condition=complete --timeout=10s \
       job/$(kubectl -n services get jobs | grep users-localize | awk '{print $1}')
   ```

   The output should contain `condition met`.