Some images require building and installing kernel drivers with the Dynamic Kernel Module Support (DKMS) tool, which builds kernel modules for the specific kernel used in the image. The DKMS tool requires access to the running kernel that is not normally allowed by the Image Management Service (IMS). To safely allow this expanded access, the IMS configuration must be modified to enable the feature.
Many DKMS build and install scripts require access to the system `/proc`, `/dev`, and `/sys` directories, which expose running processes and system services. IMS jobs run as an administrator user because preparing images requires root access to work properly. Granting root access to the running system would create an unacceptable security vulnerability on the Kubernetes worker node where the job runs.
To address the security concerns while still allowing the DKMS tool to install kernel modules during image customization, a Kata Virtual Machine (VM) is used. When DKMS is enabled in IMS, the jobs are modified to run inside a Kata VM. The DKMS tool then has enhanced access to the running Kata VM kernel, but cannot interact directly with the Kubernetes worker node the job is running on.
Kubernetes must be configured with Kata. That should be part of the standard NCN worker configuration, so documentation on how to do that is outside the scope of the IMS documentation.

**NOTE:** Since the IMS job runs inside a VM, there will be a performance impact on the runtime of the job, but this is required to provide a secure environment.
The following steps enable DKMS operation for all IMS jobs, including those controlled by the Configuration Management Service (CFS). DKMS remains enabled until the setting is manually reverted.
(`ncn-mw#`) Check which Kata runtime class is installed.

```bash
kubectl get runtimeclass
```

Expected output is something like:

```text
NAME        HANDLER     AGE
kata-qemu   kata-qemu   64d
```
Make note of the Kata runtime class to use for the IMS jobs.

**NOTE:** If no Kata runtime classes are returned by the above step, then Kata must be configured on the system. Instructions for that are beyond the scope of the IMS documentation.
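If the runtime class name is needed later in a script, it can be pulled out of the `kubectl` output rather than copied by hand. A minimal sketch, assuming a single Kata runtime class is installed (`SAMPLE` stands in for the live command output):

```shell
# SAMPLE mirrors the expected output above; on a live system use:
#   kubectl get runtimeclass --no-headers
SAMPLE='kata-qemu kata-qemu 64d'

# The first column is the runtime class name.
KATA_RUNTIME=$(printf '%s\n' "$SAMPLE" | awk '{print $1; exit}')
echo "$KATA_RUNTIME"   # prints "kata-qemu"
```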
(`ncn-mw#`) Edit the `ims-config` Kubernetes ConfigMap to enable DKMS.

```bash
kubectl -n services edit cm ims-config
```

Look for the lines:

```text
JOB_ENABLE_DKMS: "False"
JOB_KATA_RUNTIME: kata-qemu
```
Change the value of `JOB_ENABLE_DKMS` to `True`. If the Kata runtime class on the system is not `kata-qemu`, then change `JOB_KATA_RUNTIME` to the desired runtime class:

```text
JOB_ENABLE_DKMS: "True"
JOB_KATA_RUNTIME: kata-qemu
```

Save the new values and exit the editor.
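As a sketch of a non-interactive alternative to `kubectl edit` (an assumption, not the documented procedure; verify the keys against the actual `ims-config` ConfigMap before using it), the same change can be applied as a merge patch:

```shell
# Build the merge patch payload for the ims-config ConfigMap.
# The keys here are assumed to match the ConfigMap data shown above.
PATCH='{"data":{"JOB_ENABLE_DKMS":"True","JOB_KATA_RUNTIME":"kata-qemu"}}'

# On a live system, apply it with:
#   kubectl -n services patch configmap ims-config --type merge -p "$PATCH"

# Local sanity check that the payload is well-formed JSON:
printf '%s' "$PATCH" | python3 -m json.tool >/dev/null && echo "patch payload OK"
```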
(`ncn-mw#`) Restart the IMS pod to pick up the new ConfigMap values.

Find the current `cray-ims` pod:

```bash
kubectl -n services get pods | grep ims
```

Expected output will look something like:

```text
cray-ims-bc875d949-fffk6   2/2   Running     0   4h29m
ims-post-upgrade-gkf4t     0/2   Completed   0   2d3h
```

Delete the running pod:

```bash
kubectl -n services delete pod cray-ims-bc875d949-fffk6
```

Then wait until the new pod is in `2/2 Running` status. New IMS jobs will now be created in Kata VMs with enhanced kernel access.
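The running pod name can also be extracted in a script instead of being copied by hand. A minimal sketch, assuming exactly one `Running` `cray-ims` pod (`SAMPLE` stands in for the live command output):

```shell
# SAMPLE mirrors the expected output above; on a live system use:
#   kubectl -n services get pods --no-headers
SAMPLE='cray-ims-bc875d949-fffk6 2/2 Running 0 4h29m
ims-post-upgrade-gkf4t 0/2 Completed 0 2d3h'

# The third column is the pod status; keep only the running cray-ims pod.
POD=$(printf '%s\n' "$SAMPLE" | awk '/^cray-ims-/ && $3 == "Running" {print $1}')
echo "$POD"   # prints "cray-ims-bc875d949-fffk6"

# On a live system, delete it with:
#   kubectl -n services delete pod "$POD"
```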
To revert the settings so that IMS jobs no longer run inside a Kata VM with enhanced kernel access, change the `ims-config` setting back to `False` and restart the `cray-ims` pod again.
(`ncn-mw#`) Edit the `ims-config` Kubernetes ConfigMap to disable DKMS.

```bash
kubectl -n services edit cm ims-config
```

Look for the lines:

```text
JOB_ENABLE_DKMS: "True"
JOB_KATA_RUNTIME: kata-qemu
```
Change the value of `JOB_ENABLE_DKMS` to `False`. The `JOB_KATA_RUNTIME` variable is not used in this scenario, so its value does not matter.

```text
JOB_ENABLE_DKMS: "False"
JOB_KATA_RUNTIME: kata-qemu
```

Save the new values and exit the editor.
(`ncn-mw#`) Restart the IMS pod to pick up the new ConfigMap values.

Find the current `cray-ims` pod:

```bash
kubectl -n services get pods | grep ims
```

Expected output will look something like:

```text
cray-ims-bc875d949-64fc1   2/2   Running     0   4h29m
ims-post-upgrade-gkf4t     0/2   Completed   0   2d3h
```

Delete the running pod:

```bash
kubectl -n services delete pod cray-ims-bc875d949-64fc1
```

Then wait until the new pod is in `2/2 Running` status. New IMS jobs will now run directly on the Kubernetes node, without the enhanced kernel access.
Each recipe stored in IMS has a data field that indicates whether that particular recipe requires DKMS to be enabled in order to build successfully. If this field is set to `true`, it overrides the global DKMS setting described above.

To set the `require_dkms` field for a particular recipe:
(`ncn-mw#`) Set a variable with the IMS recipe ID in the environment:

```bash
IMS_RECIPE_ID=2233c82a-5081-4f67-bec4-4b59a60017a6
```

(`ncn-mw#`) Look at the current recipe record:

```bash
cray ims recipes describe $IMS_RECIPE_ID
```
Expected output:

```json
{
    "arch": "x86_64",
    "created": "2023-06-20T08:01:22.819146+00:00",
    "id": "c66f130c-c7c6-46b4-bb58-3fc17f08929f",
    "link": {
        "etag": "",
        "path": "s3://ims/recipes/c66f130c-c7c6-46b4-bb58-3fc17f08929f/myrecipe20June2023.tgz",
        "type": "s3"
    },
    "linux_distribution": "sles15",
    "name": "myrecipe20June2023",
    "recipe_type": "kiwi-ng",
    "require_dkms": false,
    "template_dictionary": []
}
```
(`ncn-mw#`) Change the value of `require_dkms` for the recipe:

```bash
cray ims recipes update --require-dkms true $IMS_RECIPE_ID
```
Expected output:

```json
{
    "arch": "x86_64",
    "created": "2023-06-20T08:01:22.819146+00:00",
    "id": "c66f130c-c7c6-46b4-bb58-3fc17f08929f",
    "link": {
        "etag": "",
        "path": "s3://ims/recipes/c66f130c-c7c6-46b4-bb58-3fc17f08929f/myrecipe20June2023.tgz",
        "type": "s3"
    },
    "linux_distribution": "sles15",
    "name": "myrecipe20June2023",
    "recipe_type": "kiwi-ng",
    "require_dkms": true,
    "template_dictionary": []
}
```
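To confirm the updated flag in a script rather than by eye, the field can be filtered out of the describe output. A sketch assuming the `cray` CLI can emit JSON via `--format json` (`RECORD` stands in for that output):

```shell
# RECORD stands in for: cray ims recipes describe $IMS_RECIPE_ID --format json
RECORD='{"name":"myrecipe20June2023","require_dkms":true}'

# Pull out the require_dkms field with the system Python.
FLAG=$(printf '%s' "$RECORD" | python3 -c 'import json,sys; print(json.load(sys.stdin)["require_dkms"])')
echo "$FLAG"   # prints "True"
```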
The call to create a new job in IMS has a `require-dkms` option that overrides both the global and recipe settings. If a value is passed in directly, it always takes precedence when the job is created.
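The precedence described above can be sketched as a small shell function (a hypothetical helper, not part of IMS; it only models the documented behavior: an explicit job value wins, then a recipe set to `True`, then the global `JOB_ENABLE_DKMS` value):

```shell
effective_dkms() {
    job="$1"; recipe="$2"; global="$3"
    if [ -n "$job" ]; then
        echo "$job"       # explicit job flag always takes precedence
    elif [ "$recipe" = "True" ]; then
        echo "True"       # a recipe requiring DKMS overrides the global setting
    else
        echo "$global"    # otherwise fall back to the ims-config value
    fi
}

effective_dkms "" "True" "False"      # prints "True"
effective_dkms "False" "True" "True"  # prints "False"
```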
(`ncn-mw#`) Use the `require-dkms` option when creating a recipe build job:

```bash
cray ims jobs create \
    --job-type create \
    --image-root-archive-name cray-sles15-barebones \
    --artifact-id $IMS_RECIPE_ID \
    --public-key-id $IMS_PUBLIC_KEY_ID \
    --enable-debug False \
    --require-dkms True
```

Example output:

```toml
status = "creating"
enable_debug = false
kernel_file_name = "vmlinuz"
artifact_id = "2233c82a-5081-4f67-bec4-4b59a60017a6"
build_env_size = 10
job_type = "create"
kubernetes_service = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-service"
kubernetes_job = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-create"
id = "ad5163d2-398d-4e93-94f0-2f439f114fe7"
image_root_archive_name = "cray-sles15-barebones"
initrd_file_name = "initrd"
arch = "x86_64"
require_dkms = true
created = "2018-11-21T18:22:53.409405+00:00"
public_key_id = "a252ff6f-c087-4093-a305-122b41824a3e"
kubernetes_configmap = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-configmap"
```
(`ncn-mw#`) Use the `require-dkms` option when creating an image customization job:

```bash
cray ims jobs create \
    --job-type customize \
    --image-root-archive-name cray-sles15-barebones \
    --artifact-id $IMS_IMAGE_ID \
    --public-key-id $IMS_PUBLIC_KEY_ID \
    --enable-debug False \
    --require-dkms True
```

Example output:

```toml
status = "creating"
enable_debug = false
kernel_file_name = "vmlinuz"
artifact_id = "2233c82a-5081-4f67-bec4-4b59a60017a6"
build_env_size = 10
job_type = "customize"
kubernetes_service = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-service"
kubernetes_job = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-create"
id = "ad5163d2-398d-4e93-94f0-2f439f114fe7"
image_root_archive_name = "cray-sles15-barebones"
initrd_file_name = "initrd"
arch = "x86_64"
require_dkms = true
created = "2018-11-21T18:22:53.409405+00:00"
public_key_id = "a252ff6f-c087-4093-a305-122b41824a3e"
kubernetes_configmap = "cray-ims-ad5163d2-398d-4e93-94f0-2f439f114fe7-configmap"
```
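When creating jobs from a script, the `require_dkms` line in the output can be checked automatically. A minimal sketch (`OUTPUT` stands in for the example output above):

```shell
# OUTPUT stands in for the `cray ims jobs create` result shown above.
OUTPUT='status = "creating"
require_dkms = true'

# Split each "key = value" line and keep the require_dkms value.
DKMS=$(printf '%s\n' "$OUTPUT" | awk -F' = ' '$1 == "require_dkms" {print $2}')
echo "$DKMS"   # prints "true"
```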