This procedure describes how to collect heap profiling data from a Ceph daemon. The collected information can be provided to HPE Cray to assist with tuning and troubleshooting.
NOTE: For this example, a ceph-mon process on ncn-s001 is used.
Identify the process and location of the daemon to profile.
ncn-s00(1/2/3)# ceph orch ps --daemon_type mon
Example output:
NAME          HOST      STATUS        REFRESHED  AGE  VERSION  IMAGE NAME                        IMAGE ID      CONTAINER ID
mon.ncn-s001  ncn-s001  running (1h)  60s ago    1h   15.2.8   registry.local/ceph/ceph:v15.2.8  5553b0cb212c  bcca26f69191
mon.ncn-s002  ncn-s002  running (1h)  61s ago    1h   15.2.8   registry.local/ceph/ceph:v15.2.8  5553b0cb212c  43c8472465b2
mon.ncn-s003  ncn-s003  running (1h)  61s ago    1h   15.2.8   registry.local/ceph/ceph:v15.2.8  5553b0cb212c  7aa1b1f19a00
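If the host lookup needs to be scripted, the following is a minimal sketch that pulls the HOST column for one daemon out of the listing. The function name daemon_host is hypothetical, and the sketch assumes NAME and HOST are the first two whitespace-separated columns, as in the example output above. The sample rows are copied from that output so the snippet runs without a cluster.

```shell
#!/usr/bin/env bash
# Print the HOST column for the daemon named in $1, reading the
# `ceph orch ps` listing from stdin.
# Assumption: NAME is column 1 and HOST is column 2.
daemon_host() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Sample rows taken from the example output (trailing columns omitted).
sample='NAME HOST STATUS REFRESHED AGE VERSION
mon.ncn-s001 ncn-s001 running (1h) 60s ago 1h 15.2.8
mon.ncn-s002 ncn-s002 running (1h) 61s ago 1h 15.2.8
mon.ncn-s003 ncn-s003 running (1h) 61s ago 1h 15.2.8'

printf '%s\n' "$sample" | daemon_host mon.ncn-s002   # prints: ncn-s002
```

On a live node, the sample would be replaced by piping the output of ceph orch ps --daemon_type mon into the function. The result can then be compared with the output of hostname to decide whether an SSH hop is needed.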
SSH to the node where the process is running if it is different from the current node.
Start the profiler.
ncn-s001# ceph tell mon.ncn-s001 heap start_profiler
A message stating “mon.ncn-s001 started profiler” will be returned.
Dump stats. This does NOT require the profiler to be running.
ncn-s001# ceph tell mon.ncn-s001 heap stats
Example output:
mon.ncn-s001 tcmalloc heap stats:------------------------------------------------
MALLOC: 972461744 ( 927.4 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 8804424 ( 8.4 MiB) Bytes in central cache freelist
MALLOC: + 3706880 ( 3.5 MiB) Bytes in transfer cache freelist
MALLOC: + 25649416 ( 24.5 MiB) Bytes in thread cache freelists
MALLOC: + 5636096 ( 5.4 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 1016258560 ( 969.2 MiB) Actual memory used (physical + swap)
MALLOC: + 189841408 ( 181.0 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 1206099968 ( 1150.2 MiB) Virtual address space used
MALLOC:
MALLOC: 14833 Spans in use
MALLOC: 25 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
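The stats output is additive: the in-use bytes, the four freelist lines, and the malloc metadata sum to the "Actual memory used" line (972461744 + 0 + 8804424 + 3706880 + 25649416 + 5636096 = 1016258560). The sketch below totals the freelist lines, which is the memory the output describes as releasable to the OS. The function name heap_summary is hypothetical, and the field positions are assumed from the example output above.

```shell
#!/usr/bin/env bash
# Summarize tcmalloc heap stats read from stdin.
# Assumption: lines look like "MALLOC: [+|=] <bytes> ( <MiB> MiB) <label>".
heap_summary() {
    awk '
        $1 == "MALLOC:" && /Bytes in use by application/ { in_use = $2 }
        $1 == "MALLOC:" && /freelist/                    { free += $3 }
        $1 == "MALLOC:" && /Actual memory used/          { actual = $3 }
        END { printf "in_use=%.0f freelist=%.0f actual=%.0f\n", in_use, free, actual }
    '
}

# Sample lines copied from the example output above.
sample='MALLOC: 972461744 ( 927.4 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 8804424 ( 8.4 MiB) Bytes in central cache freelist
MALLOC: + 3706880 ( 3.5 MiB) Bytes in transfer cache freelist
MALLOC: + 25649416 ( 24.5 MiB) Bytes in thread cache freelists
MALLOC: + 5636096 ( 5.4 MiB) Bytes in malloc metadata
MALLOC: = 1016258560 ( 969.2 MiB) Actual memory used (physical + swap)
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).'

printf '%s\n' "$sample" | heap_summary
# prints: in_use=972461744 freelist=38160720 actual=1016258560
```

The check on the first field keeps the trailing "Call ReleaseFreeMemory()" line, which also contains the word freelist, out of the total.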
Dump heap. This requires the profiler to be running.
ncn-s001# ceph tell mon.ncn-s001 heap dump
Example output:
mon.ncn-s001 dumping heap profile now.
------------------------------------------------
MALLOC: 976849264 ( 931.6 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 8819048 ( 8.4 MiB) Bytes in central cache freelist
MALLOC: + 3617280 ( 3.4 MiB) Bytes in transfer cache freelist
MALLOC: + 25531176 ( 24.3 MiB) Bytes in thread cache freelists
MALLOC: + 5636096 ( 5.4 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 1020452864 ( 973.2 MiB) Actual memory used (physical + swap)
MALLOC: + 185647104 ( 177.0 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 1206099968 ( 1150.2 MiB) Virtual address space used
MALLOC:
MALLOC: 14834 Spans in use
MALLOC: 25 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
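Comparing the same counter across two snapshots shows how the heap changed between them. The sketch below extracts a counter by its label and computes the difference, using the "Bytes in use by application" lines from the two example outputs in this procedure. The function name heap_field is hypothetical, and the line layout is assumed from those examples.

```shell
#!/usr/bin/env bash
# Print the byte count from the MALLOC line whose text contains the label in $1.
# Assumption: the byte count is the first all-digit field on that line.
heap_field() {
    awk -v label="$1" '$1 == "MALLOC:" && index($0, label) {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) { print $i; exit }
    }'
}

label='Bytes in use by application'

# Values copied from the heap stats and heap dump example outputs.
before=$(printf '%s\n' 'MALLOC: 972461744 ( 927.4 MiB) Bytes in use by application' | heap_field "$label")
after=$(printf '%s\n' 'MALLOC: 976849264 ( 931.6 MiB) Bytes in use by application' | heap_field "$label")

echo "in-use growth: $((after - before)) bytes"
# prints: in-use growth: 4387520 bytes
```

In practice each snapshot would be saved to a file and fed to the function on stdin. The 4387520-byte difference here is roughly 4.2 MiB, which matches the change from 927.4 MiB to 931.6 MiB in the examples.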
Release memory.
ncn-s001# ceph tell mon.ncn-s001 heap release
A message stating “mon.ncn-s001 releasing free RAM back to system” will be returned.
Stop the profiler.
ncn-s001# ceph tell mon.ncn-s001 heap stop_profiler
A message stating “mon.ncn-s001 stopped profiler” will be returned.
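The steps above can be chained into one function, sketched below under stated assumptions. The function name profile_heap and the CEPH override variable are hypothetical; the override exists only so the sequence can be previewed by substituting echo for the real client. The ceph tell subcommands are the ones used in this procedure.

```shell
#!/usr/bin/env bash
# Run the heap profiling steps in order for the daemon named in $1.
# CEPH (hypothetical) overrides the client command; it defaults to "ceph".
profile_heap() {
    local daemon="$1"
    local ceph="${CEPH:-ceph}"

    # Stop here if the profiler cannot be started; heap dump depends on it.
    $ceph tell "$daemon" heap start_profiler || return 1
    $ceph tell "$daemon" heap stats
    $ceph tell "$daemon" heap dump
    $ceph tell "$daemon" heap release
    # Always reached once the profiler has started, even if a middle step failed.
    $ceph tell "$daemon" heap stop_profiler
}

# Preview only: prints the five commands without contacting the cluster.
CEPH="echo ceph" profile_heap mon.ncn-s001
```

To run against a live daemon, call profile_heap mon.ncn-s001 without the CEPH override on the node where the daemon is running.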