The Ceph orchestrator provides a centralized interface for the management of the Ceph cluster. It orchestrates ceph-mgr modules that interface with external orchestration services.
Refer to the external Ceph documentation for more information.
The backend used by the orchestrator to manage the Ceph cluster is cephadm.
Log Messages
Watching log messages is useful when making changes with the orchestrator, such as adding, removing, or scaling services, or performing upgrades.
To watch log messages:
ceph -W cephadm
To watch log messages with debug:
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug
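When finished debugging, the log level can be set back to its default. A minimal example, assuming the default cluster log level of info:
ceph config set mgr mgr/cephadm/log_to_cluster_level info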
NOTE
For orchestration tasks, these commands can typically be run from a node running the ceph mon process; in most cases, this is ncn-s001, ncn-s002, or ncn-s003. There may be cases where cephadm is run locally on a host, in which case it is more efficient to tail /var/log/ceph/cephadm.log.
This section provides in-depth usage examples of the more commonly used ceph orch subcommands.
ceph orch ls
Example output:
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
alertmanager 1/1 6m ago 4h count:1 registry.local/prometheus/alertmanager:v0.20.0 0881eb8f169f
crash 3/3 6m ago 4h * registry.local/ceph/ceph:v15.2.8 5553b0cb212c
grafana 1/1 6m ago 4h count:1 registry.local/ceph/ceph-grafana:6.6.2 a0dce381714a
mds.cephfs 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
mgr 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
mon 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
node-exporter 3/3 6m ago 4h * registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf
osd.all-available-devices 9/9 6m ago 4h * registry.local/ceph/ceph:v15.2.8 5553b0cb212c
prometheus 1/1 6m ago 4h count:1 docker.io/prom/prometheus:v2.18.1 de242295e225
rgw.site1 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
FILTERS:
Apply filters by adding --service_type <service_type> or --service_name <service_name>.
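For example, to list only the monitor service, or only the rgw.site1 service shown above:
ceph orch ls --service_type mon
ceph orch ls --service_name rgw.site1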
Reference Key:
NAME - the service name
RUNNING - the number of daemons running versus the number expected
REFRESHED - how long since the orchestrator last refreshed its information about the service
AGE - how long the service has been deployed
PLACEMENT - the placement specification for the service
IMAGE NAME / IMAGE ID - the container image (and image ID) the service runs
ceph orch ps
Example output:
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
alertmanager.ncn-s001 ncn-s001 running (5h) 5m ago 5h 0.20.0 registry.local/prometheus/alertmanager:v0.20.0 0881eb8f169f 0e6a24469465
crash.ncn-s001 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c b6a582ed7573
crash.ncn-s002 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 3778e29099eb
crash.ncn-s003 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c fe085e310cbd
grafana.ncn-s001 ncn-s001 running (5h) 5m ago 5h 6.6.2 registry.local/ceph/ceph-grafana:6.6.2 a0dce381714a 2fabb486928c
mds.cephfs.ncn-s001.qrxkih ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 03a3a1ce682e
mds.cephfs.ncn-s002.qhferv ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 56dca5cca407
mds.cephfs.ncn-s003.ihwkop ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 38ab6a6c8bc6
mgr.ncn-s001.vkfdue ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 456587705eab
mgr.ncn-s002.wjaxkl ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 48222c38dd7e
mgr.ncn-s003.inwpij ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 76ff8e485504
mon.ncn-s001 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c bcca26f69191
mon.ncn-s002 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 43c8472465b2
mon.ncn-s003 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 7aa1b1f19a00
node-exporter.ncn-s001 ncn-s001 running (5h) 5m ago 5h 0.18.1 registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 0be431766c8e
node-exporter.ncn-s002 ncn-s002 running (5h) 5m ago 5h 0.18.1 registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 6ae81d01d963
node-exporter.ncn-s003 ncn-s003 running (5h) 5m ago 5h 0.18.1 registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 330dc09d0845
osd.0 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c a8c7314b484b
osd.1 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 8f9941887053
osd.2 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 49cf2c532efb
osd.3 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 69e89cf18216
osd.4 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 72d7f51a3690
osd.5 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 76d598c40824
osd.6 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c d2372e45c8eb
osd.7 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 5bd22f1d4cad
osd.8 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 7c5282f2e107
prometheus.ncn-s001 ncn-s001 running (5h) 5m ago 5h 2.18.1 docker.io/prom/prometheus:v2.18.1 de242295e225 bf941a1306e9
rgw.site1.ncn-s001.qegfux ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c e833fc05acfe
rgw.site1.ncn-s002.wqrzoa ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 83a131a7022c
rgw.site1.ncn-s003.tzkxya ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c c67d75adc620
FILTERS:
Apply filters by adding any or all of --hostname <hostname>, --service_name <service_name>, --daemon_type <daemon_type>, or --daemon_id <daemon_id>.
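For example, to show only the OSD daemons on a single node:
ceph orch ps --hostname ncn-s001 --daemon_type osd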
NOTE
The name used here is from ceph orch ps, NOT ceph orch ls.
ceph orch daemon restart alertmanager.ncn-s001
A message stating "Scheduled to restart alertmanager.ncn-s001 on host 'ncn-s001'" will be returned.
Monitor the restart using the ceph orch ps command; the time shown in the STATUS column should reset and show "running (time since started)".
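For example, to narrow the ceph orch ps output to the restarted daemon from the command above:
ceph orch ps --daemon_type alertmanager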
NOTE
The service name used here is from ceph orch ls, NOT ceph orch ps.
ceph orch apply alertmanager --placement="2 ncn-s001 ncn-s002"
A message stating “Scheduled alertmanager update…” will be returned.
Reference Key:
--placement="2 ncn-s001 ncn-s002" - the number of daemons to run, followed by the hosts to run them on. If only a count is given, such as --placement="2", then the orchestrator will automatically pick where to put the daemons.
IMPORTANT: There are several combinations available when working with the placement. For example, a placement count of 1 can be specified together with a list of a subset of nodes. This is a good way to contain the process to those nodes.
IMPORTANT: This is not available for any deployments with a PLACEMENT of *.
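For example, a count-only placement such as the following lets the orchestrator pick the host automatically:
ceph orch apply alertmanager --placement="1"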
ceph orch host ls
Example output:
HOST ADDR LABELS STATUS
ncn-s001 ncn-s001
ncn-s002 ncn-s002
ncn-s003 ncn-s003
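The host list can also be printed in a machine-readable format, for example:
ceph orch host ls --format json-pretty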
ceph orch device ls
Example output:
Hostname Path Type Serial Size Health Ident Fault Available
ncn-s001 /dev/vdb hdd fb794832-f402-4f4f-a 107G Unknown N/A N/A No
ncn-s001 /dev/vdc hdd 9bdef369-6bac-40ca-a 107G Unknown N/A N/A No
ncn-s001 /dev/vdd hdd 3cda8ba2-ccaf-4515-b 107G Unknown N/A N/A No
ncn-s002 /dev/vdb hdd 775639a6-092e-4f3a-9 107G Unknown N/A N/A No
ncn-s002 /dev/vdc hdd 261e8a40-2349-484e-8 107G Unknown N/A N/A No
ncn-s002 /dev/vdd hdd 8f01f9c6-2c6c-449c-a 107G Unknown N/A N/A No
ncn-s003 /dev/vdb hdd 46467f02-1d11-44b2-b 107G Unknown N/A N/A No
ncn-s003 /dev/vdc hdd 4797e919-667e-4376-b 107G Unknown N/A N/A No
ncn-s003 /dev/vdd hdd 3b2c090d-37a0-403b-a 107G Unknown N/A N/A No
IMPORTANT: If --wide is used, it will give the reasons a drive is not Available. This DOES NOT mean something is wrong. If Ceph already has the drive provisioned, reasons like these are expected.
Update the size or placement for a service or apply a large YAML spec:
ceph orch apply [mon|mgr|rbd-mirror|crash|alertmanager|grafana|node-exporter|prometheus] [<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
Scale an iSCSI service:
ceph orch apply iscsi <pool> <api_user> <api_password> [<trusted_ip_list>][<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
Update the number of MDS instances for the given fs_name:
ceph orch apply mds <fs_name> [<placement>] [--dry-run] [--unmanaged] [plain|json|json-pretty|yaml]
Scale an NFS service:
ceph orch apply nfs <svc_id> <pool> [<namespace>] [<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
Create OSD daemon(s) using a drive group spec:
ceph orch apply osd [--all-available-devices] [--dry-run] [--unmanaged] [plain|json|json-pretty|yaml]
Update the number of RGW instances for the given zone:
ceph orch apply rgw <realm_name> <zone_name> [<subcluster>] [<port:int>] [--ssl] [<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
Cancel ongoing operations:
ceph orch cancel
Add daemon(s):
ceph orch daemon add [mon|mgr|rbd-mirror|crash|alertmanager|grafana|node-exporter|prometheus] [<placement>]
Start iscsi daemon(s):
ceph orch daemon add iscsi <pool> <api_user> <api_password> [<trusted_ip_list>] [<placement>]
Start MDS daemon(s):
ceph orch daemon add mds <fs_name> [<placement>]
Start NFS daemon(s):
ceph orch daemon add nfs <svc_id> <pool> [<namespace>] [<placement>]
Create an OSD service (--svc_arg takes the form host:drives):
ceph orch daemon add osd [<svc_arg>]
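For example, to create an OSD on a specific drive (the hostname and device path here are illustrative):
ceph orch daemon add osd ncn-s001:/dev/vdd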
Start RGW daemon(s):
ceph orch daemon add rgw <realm_name> <zone_name> [<subcluster>] [<port:int>] [--ssl] [<placement>]
Redeploy a daemon (with a specific image):
ceph orch daemon redeploy <name> [<image>]
Remove specific daemon(s):
ceph orch daemon rm <names>... [--force]
Start, stop, restart, or reconfig a specific daemon:
ceph orch daemon start|stop|restart|reconfig <name>
List devices on a host:
ceph orch device ls [<hostname>...] [plain|json|json-pretty|yaml] [--refresh] [--wide]
Zap (erase!) a device so it can be re-used:
ceph orch device zap <hostname> <path> [--force]
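For example, to wipe a drive from the device listing above so it can be re-provisioned (destructive; the device path is illustrative):
ceph orch device zap ncn-s001 /dev/vdb --force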
Add a host:
ceph orch host add <hostname> [<addr>] [<labels>...]
Add a host label:
ceph orch host label add <hostname> <label>
Remove a host label:
ceph orch host label rm <hostname> <label>
List hosts:
ceph orch host ls [plain|json|json-pretty|yaml]
Check if the specified host can be safely stopped without reducing availability:
ceph orch host ok-to-stop <hostname>
Remove a host:
ceph orch host rm <hostname>
Update a host address:
ceph orch host set-addr <hostname> <addr>
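For example, to add a new host and apply a label to it (the hostname and label are illustrative):
ceph orch host add ncn-s004
ceph orch host label add ncn-s004 storage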
List services known to orchestrator:
ceph orch ls [<service_type>] [<service_name>] [--export] [plain|json|json-pretty|yaml] [--refresh]
Remove OSD services:
ceph orch osd rm <svc_id>... [--replace] [--force]
Status of OSD removal operation:
ceph orch osd rm status [plain|json|json-pretty|yaml]
Stop the removal of OSD services:
ceph orch osd rm stop <svc_id>...
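For example, to remove OSD 7 and mark it for replacement, then watch the removal progress (the OSD ID is illustrative):
ceph orch osd rm 7 --replace
ceph orch osd rm status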
Pause orchestrator background work:
ceph orch pause
List daemons known to orchestrator:
ceph orch ps [<hostname>] [<service_name>] [<daemon_type>] [<daemon_id>] [plain|json|json-pretty|yaml] [--refresh]
Resume orchestrator background work (if paused):
ceph orch resume
Remove a service:
ceph orch rm <service_name> [--force]
Select orchestrator module backend:
ceph orch set backend <module_name>
Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons):
ceph orch start|stop|restart|redeploy|reconfig <service_name>
Report configured backend and its status:
ceph orch status [plain|json|json-pretty|yaml]
Check service versions vs available and target containers:
ceph orch upgrade check [<image>] [<ceph_version>]
Pause an in-progress upgrade:
ceph orch upgrade pause
Resume paused upgrade:
ceph orch upgrade resume
Initiate upgrade:
ceph orch upgrade start [<image>] [<ceph_version>]
Check the status of an in-progress upgrade:
ceph orch upgrade status
Stop an in-progress upgrade:
ceph orch upgrade stop
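For example, to start an upgrade to a specific image from the local registry and then check its progress (the image path is illustrative):
ceph orch upgrade start --image registry.local/ceph/ceph:v15.2.8
ceph orch upgrade status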