Troubleshoot S3FS Cache Cleanup

This procedure describes how to manually clean up the S3FS cache when automatic pruning is insufficient and the cache keeps growing, consuming excessive disk space on worker nodes.

Background

(ncn-w#) CSM includes an automatic S3FS cache pruning mechanism in the form of a daily cron job:

cat /etc/cron.d/prune-s3fs-boot-images-cache

Output:

0 0 * * * root /usr/bin/prune-s3fs-cache.sh boot-images /var/lib/s3fs_cache 161061273600 -silent
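
For a sense of scale, the numeric argument in the cron entry can be converted to GiB. This is only a sketch: it assumes the third argument is a cache size limit in bytes, which the cron entry itself does not state, and `bytes_to_gib` is a hypothetical helper name.

```shell
# Hypothetical helper: convert a byte count to whole GiB (1 GiB = 1024^3 bytes).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Value copied from the cron entry above; assumed to be a size limit in bytes.
bytes_to_gib 161061273600   # prints 150
```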

However, some files may not be properly pruned by the automated process. Over time, this may lead to increasing disk usage, which requires manual intervention.

Symptoms

  • High disk usage on worker nodes
  • S3FS cache directories growing continuously despite automatic pruning
  • Files accumulating in /var/lib/s3fs_cache/ subdirectories

Example of problematic cache growth:

ncn-w001:~ # cd /var/lib/s3fs_cache/
ncn-w001:/var/lib/s3fs_cache # ls -la
total 36
drwxr-xr-x 6 root root 4096 Nov 29 08:02 .
drwxr-xr-x 1 root root 4096 Nov 29 08:16 ..
drwxr-xr-x 39 root root 4096 Mar 5 11:40 boot-images
drwxr-xr-x 2 root root 4096 Mar 12 11:56 .boot-images.mirror
drwxr-xr-x 39 root root 4096 Mar 5 11:40 .boot-images.stat
drwx------ 2 root root 16384 Nov 29 08:43 lost+found

ncn-w001:/var/lib/s3fs_cache # du -sh boot-images/ .boot-images.mirror/
67G boot-images/
42G .boot-images.mirror/
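
Two of the cache directories are hidden (dot-prefixed), so a plain shell glob such as `*` does not list them. The following sketch reports every top-level directory under the cache root, largest first. The helper name `cache_usage` is hypothetical, and the `-b` option assumes GNU `du`.

```shell
# Hypothetical helper: print the apparent size in bytes of each top-level
# directory under the cache root, including hidden ones, largest first.
cache_usage() {
    local root="${1:-/var/lib/s3fs_cache}"
    # find (unlike the * glob) also returns dot-prefixed directories.
    find "$root" -mindepth 1 -maxdepth 1 -type d -exec du -sb {} + | sort -rn
}

# Example: cache_usage /var/lib/s3fs_cache
```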

Prerequisites

  • Root access to the affected worker nodes
  • Understanding that S3FS cache can be safely deleted as it will be rebuilt on demand

Procedure

Manual cache cleanup

Perform the following steps on each affected worker node:

  1. (ncn-w#) Log in to the worker node and navigate to the S3FS cache directory:

    cd /var/lib/s3fs_cache/
    
  2. (ncn-w#) Check current disk usage:

    df -h /var/lib/s3fs_cache/
    du -sh boot-images/ .boot-images.mirror/ .boot-images.stat/
    
  3. (ncn-w#) Remove files in the main cache directory that have not been accessed in more than 30 days:

    cd /var/lib/s3fs_cache/boot-images/
    find . -type f -atime +30 -print0 | xargs -0 -r rm -vf
    
  4. (ncn-w#) Remove files in the mirror directory that have not been accessed in more than 30 days:

    cd /var/lib/s3fs_cache/.boot-images.mirror/
    find . -type f -atime +30 -print0 | xargs -0 -r rm -vf
    
  5. (ncn-w#) Remove files in the stat directory that have not been accessed in more than 30 days:

    cd /var/lib/s3fs_cache/.boot-images.stat/
    find . -type f -atime +30 -print0 | xargs -0 -r rm -vf
    
  6. (ncn-w#) Remove empty directories:

    cd /var/lib/s3fs_cache/
    find boot-images .boot-images.mirror .boot-images.stat -mindepth 1 -type d -empty -delete
    
  7. (ncn-w#) Check the disk usage after cleanup:

    df -h /var/lib/s3fs_cache/
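
Steps 3 through 6 above can be combined into a single helper that is run once per cache directory. This is a minimal sketch, not part of CSM: the function name `prune_cache_dir` is hypothetical, and the options used assume GNU `find` and `xargs`.

```shell
# Hypothetical helper: remove regular files under DIR that were last accessed
# more than DAYS days ago, then remove any subdirectories left empty.
prune_cache_dir() {
    local dir="$1" days="${2:-30}"
    # Refuse to run on a missing directory so nothing outside the cache is touched.
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # -print0 / -0 keep file names containing spaces or newlines intact;
    # -r skips rm entirely when no file matches.
    find "$dir" -type f -atime +"$days" -print0 | xargs -0 -r rm -f
    # -mindepth 1 keeps DIR itself even if it ends up empty.
    find "$dir" -mindepth 1 -type d -empty -delete
}

# Example, using the directories from the procedure above:
# for d in boot-images .boot-images.mirror .boot-images.stat; do
#     prune_cache_dir "/var/lib/s3fs_cache/$d" 30
# done
```

Passing the directory as an argument avoids relying on the current working directory, so a failed `cd` cannot cause files to be removed from the wrong location.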
    

Alternative cleanup methods

More aggressive cleanup

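To remove files using a different age threshold, adjust the -atime parameter. Because find discards fractional days when comparing, -atime +N matches files last accessed at least N+1 days ago.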

(ncn-w#) For example:

# Clean up files older than 7 days
find . -type f -atime +7 -print0 | xargs -0 -r rm -vf

# Clean up files older than 1 day
find . -type f -atime +1 -print0 | xargs -0 -r rm -vf
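
Before choosing a shorter threshold, it can help to preview how much data a given -atime value would remove. The sketch below only counts matching files and deletes nothing. The function name `preview_prune` is hypothetical, and `-printf` assumes GNU `find`.

```shell
# Hypothetical helper: report how many files under DIR match the age
# threshold and their total size in bytes. Nothing is deleted.
preview_prune() {
    local dir="$1" days="${2:-30}"
    # %s prints each matching file's size in bytes, one per line.
    find "$dir" -type f -atime +"$days" -printf '%s\n' |
        awk '{ bytes += $1; files++ } END { printf "%d files, %d bytes\n", files, bytes }'
}

# Example: preview_prune /var/lib/s3fs_cache/boot-images 7
```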

Complete cache reset

(ncn-w#) If the cache is severely corrupted, or if a fresh start is preferred, perform a complete cache reset.

Warning: This will remove all cached data, which may cause a temporary performance impact as the cache rebuilds.

cd /var/lib/s3fs_cache/ && rm -rf boot-images/ .boot-images.mirror/ .boot-images.stat/

Important notes

  • Safe to delete: S3FS cache files can be safely deleted at any time as they are rebuilt on demand
  • Performance impact: Deleting cache may cause temporary performance degradation as data is re-cached
  • Regular maintenance: Consider implementing regular manual cleanup if automatic pruning proves insufficient
  • Monitoring: Set up alerts for disk usage on worker nodes to catch cache growth early
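
As a starting point for the monitoring note above, the following sketch checks the usage of the file system that holds the cache against a threshold. The function names and the 80% default are hypothetical, and hooking the result into an alerting system is left to the site.

```shell
# Hypothetical helper: print the usage percentage (digits only) of the
# file system containing PATH, using the POSIX df output format.
fs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: return 1 and print a warning when usage reaches LIMIT.
check_cache_fs() {
    local path="${1:-/var/lib/s3fs_cache}" limit="${2:-80}" used
    used=$(fs_usage_percent "$path")
    # An empty result means df could not report on the path.
    [ -n "$used" ] || { echo "cannot read usage for $path" >&2; return 2; }
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}

# Example: check_cache_fs /var/lib/s3fs_cache 80
```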