The Ceph orchestrator “is a module that provides a command line interface (CLI) to orchestrator modules (ceph-mgr modules which interface with external orchestration services).” (Source: https://docs.ceph.com/en/latest/mgr/orchestrator/)
This provides a centralized interface for managing the Ceph cluster. Orchestrator backends include:
cephadm
Log Messages
Watching the cephadm log messages is useful when making changes via the orchestrator, such as adding, removing, or scaling services, or performing upgrades.
ncn-s# ceph -W cephadm
With Debug
ncn-s# ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ncn-s# ceph -W cephadm --watch-debug
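When finished debugging, the override can be removed so the log level returns to its default (a suggested follow-up; the command below removes only the setting applied above):
ncn-s# ceph config rm mgr mgr/cephadm/log_to_cluster_level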
Note:
For orchestration tasks, these commands are typically run from a node running the ceph mon process (usually ncn-s001, ncn-s002, or ncn-s003). In cases where cephadm is being run locally on a host, it may be more efficient to tail /var/log/ceph/cephadm.log.
This section provides in-depth usage, with examples, of the more commonly used ceph orch subcommands.
List Service Deployments
ncn-s# ceph orch ls
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
alertmanager 1/1 6m ago 4h count:1 registry.local/prometheus/alertmanager:v0.20.0 0881eb8f169f
crash 3/3 6m ago 4h * registry.local/ceph/ceph:v15.2.8 5553b0cb212c
grafana 1/1 6m ago 4h count:1 registry.local/ceph/ceph-grafana:6.6.2 a0dce381714a
mds.cephfs 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
mgr 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
mon 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
node-exporter 3/3 6m ago 4h * registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf
osd.all-available-devices 9/9 6m ago 4h * registry.local/ceph/ceph:v15.2.8 5553b0cb212c
prometheus 1/1 6m ago 4h count:1 docker.io/prom/prometheus:v2.18.1 de242295e225
rgw.site1.zone1 3/3 6m ago 4h ncn-s001;ncn-s002;ncn-s003;count:3 registry.local/ceph/ceph:v15.2.8 5553b0cb212c
FILTERS:
You can apply filters by adding --service_type <service type> or --service_name <service name>.
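For example, to list only the mon service deployment (illustrative, using the filters above):
ncn-s# ceph orch ls --service_type mon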
List Deployed Daemons
ncn-s# ceph orch ps
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
alertmanager.ncn-s001 ncn-s001 running (5h) 5m ago 5h 0.20.0 registry.local/prometheus/alertmanager:v0.20.0 0881eb8f169f 0e6a24469465
crash.ncn-s001 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c b6a582ed7573
crash.ncn-s002 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 3778e29099eb
crash.ncn-s003 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c fe085e310cbd
grafana.ncn-s001 ncn-s001 running (5h) 5m ago 5h 6.6.2 registry.local/ceph/ceph-grafana:6.6.2 a0dce381714a 2fabb486928c
mds.cephfs.ncn-s001.qrxkih ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 03a3a1ce682e
mds.cephfs.ncn-s002.qhferv ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 56dca5cca407
mds.cephfs.ncn-s003.ihwkop ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 38ab6a6c8bc6
mgr.ncn-s001.vkfdue ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 456587705eab
mgr.ncn-s002.wjaxkl ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 48222c38dd7e
mgr.ncn-s003.inwpij ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 76ff8e485504
mon.ncn-s001 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c bcca26f69191
mon.ncn-s002 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 43c8472465b2
mon.ncn-s003 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 7aa1b1f19a00
node-exporter.ncn-s001 ncn-s001 running (5h) 5m ago 5h 0.18.1 registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 0be431766c8e
node-exporter.ncn-s002 ncn-s002 running (5h) 5m ago 5h 0.18.1 registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 6ae81d01d963
node-exporter.ncn-s003 ncn-s003 running (5h) 5m ago 5h 0.18.1 registry.local/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 330dc09d0845
osd.0 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c a8c7314b484b
osd.1 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 8f9941887053
osd.2 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 49cf2c532efb
osd.3 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 69e89cf18216
osd.4 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 72d7f51a3690
osd.5 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 76d598c40824
osd.6 ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c d2372e45c8eb
osd.7 ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 5bd22f1d4cad
osd.8 ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 7c5282f2e107
prometheus.ncn-s001 ncn-s001 running (5h) 5m ago 5h 2.18.1 docker.io/prom/prometheus:v2.18.1 de242295e225 bf941a1306e9
rgw.site1.zone1.ncn-s001.qegfux ncn-s001 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c e833fc05acfe
rgw.site1.zone1.ncn-s002.wqrzoa ncn-s002 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c 83a131a7022c
rgw.site1.zone1.ncn-s003.tzkxya ncn-s003 running (5h) 5m ago 5h 15.2.8 registry.local/ceph/ceph:v15.2.8 5553b0cb212c c67d75adc620
FILTERS:
You can apply filters by adding any or all of --hostname <hostname>, --service_name <service name>, --daemon_type <daemon type>, or --daemon_id <daemon id>.
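For example, to list only the OSD daemons on ncn-s001 (illustrative, using the filter names listed above):
ncn-s# ceph orch ps --hostname ncn-s001 --daemon_type osd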
Ceph daemon start|stop|restart|reconfig
NOTE:
The daemon name used here comes from ceph orch ps, NOT ceph orch ls.
ncn-s# ceph orch daemon restart alertmanager.ncn-s001
Scheduled to restart alertmanager.ncn-s001 on host 'ncn-s001'
Additional Task:
Monitor the restart using the ceph orch ps command; the time associated with the STATUS should reset and show “running (time since started).”
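For example, to check only the alertmanager daemon restarted above (illustrative, using the filters described earlier):
ncn-s# ceph orch ps --daemon_type alertmanager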
Deploy or Scale Services
NOTE:
The service name used here comes from ceph orch ls, NOT ceph orch ps.
ncn-s# ceph orch apply alertmanager --placement="2 ncn-s001 ncn-s002"
Scheduled alertmanager update...
Reference Key
If only a count is given, for example --placement="2", then the orchestrator will automatically pick which hosts to put the daemons on.
IMPORTANT:
When working with the placement, several combinations are possible. For example, a placement count of 1 can be specified along with a subset of the nodes; this is a good way to contain the process to those nodes.
IMPORTANT:
This is not available for any deployments with a PLACEMENT of *.
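For example, the following (illustrative) placement keeps a single prometheus daemon confined to two specific nodes:
ncn-s# ceph orch apply prometheus --placement="1 ncn-s002 ncn-s003"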
List Hosts Known to Ceph Orchestrator
ncn-s# ceph orch host ls
HOST ADDR LABELS STATUS
ncn-s001 ncn-s001
ncn-s002 ncn-s002
ncn-s003 ncn-s003
List Drives on Hosts Known to Ceph Orchestrator
ncn-s# ceph orch device ls
Hostname Path Type Serial Size Health Ident Fault Available
ncn-s001 /dev/vdb hdd fb794832-f402-4f4f-a 107G Unknown N/A N/A No
ncn-s001 /dev/vdc hdd 9bdef369-6bac-40ca-a 107G Unknown N/A N/A No
ncn-s001 /dev/vdd hdd 3cda8ba2-ccaf-4515-b 107G Unknown N/A N/A No
ncn-s002 /dev/vdb hdd 775639a6-092e-4f3a-9 107G Unknown N/A N/A No
ncn-s002 /dev/vdc hdd 261e8a40-2349-484e-8 107G Unknown N/A N/A No
ncn-s002 /dev/vdd hdd 8f01f9c6-2c6c-449c-a 107G Unknown N/A N/A No
ncn-s003 /dev/vdb hdd 46467f02-1d11-44b2-b 107G Unknown N/A N/A No
ncn-s003 /dev/vdc hdd 4797e919-667e-4376-b 107G Unknown N/A N/A No
ncn-s003 /dev/vdd hdd 3b2c090d-37a0-403b-a 107G Unknown N/A N/A No
IMPORTANT:
If you add the --wide flag, the output will also give the reasons a drive is not Available. This DOES NOT necessarily mean something is wrong; if Ceph already has the drive provisioned, then you may see such reasons.
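For example:
ncn-s# ceph orch device ls --wide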
Update the size or placement for a service or apply a large yaml spec
ncn-s# ceph orch apply [mon|mgr|rbd-mirror|crash|alertmanager|grafana|node-exporter|prometheus] [<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
Scale an iSCSI service
ncn-s# ceph orch apply iscsi <pool> <api_user> <api_password> [<trusted_ip_list>][<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
Update the number of MDS instances for the given fs_name
ncn-s# ceph orch apply mds <fs_name> [<placement>] [--dry-run] [--unmanaged] [plain|json|json-pretty|yaml]
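For example, to keep three MDS daemons for the cephfs file system pinned to the first three storage nodes (illustrative, matching the placement shown in the earlier listings):
ncn-s# ceph orch apply mds cephfs --placement="3 ncn-s001 ncn-s002 ncn-s003"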
Scale an NFS service
ncn-s# ceph orch apply nfs <svc_id> <pool> [<namespace>] [<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
Create OSD daemon(s) using a drive group spec
ncn-s# ceph orch apply osd [--all-available-devices] [--dry-run] [--unmanaged] [plain|json|json-pretty|yaml]
Update the number of RGW instances for the given zone
ncn-s# ceph orch apply rgw <realm_name> <zone_name> [<subcluster>] [<port:int>] [--ssl] [<placement>] [--dry-run] [plain|json|json-pretty|yaml] [--unmanaged]
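For example (illustrative, using the realm and zone from the rgw.site1.zone1 service shown above):
ncn-s# ceph orch apply rgw site1 zone1 --placement="3 ncn-s001 ncn-s002 ncn-s003"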
Cancel ongoing operations
ncn-s# ceph orch cancel
Add daemon(s)
ncn-s# ceph orch daemon add [mon|mgr|rbd-mirror|crash|alertmanager|grafana|node-exporter|prometheus] [<placement>]
Start iscsi daemon(s)
ncn-s# ceph orch daemon add iscsi <pool> <api_user> <api_password> [<trusted_ip_list>] [<placement>]
Start MDS daemon(s)
ncn-s# ceph orch daemon add mds <fs_name> [<placement>]
Start NFS daemon(s)
ncn-s# ceph orch daemon add nfs <svc_id> <pool> [<namespace>] [<placement>]
Create an OSD service. Either --svc_arg=host:drives
ncn-s# ceph orch daemon add osd [<svc_arg>]
Start RGW daemon(s)
ncn-s# ceph orch daemon add rgw <realm_name> <zone_name> [<subcluster>] [<port:int>] [--ssl] [<placement>]
Redeploy a daemon (with a specific image)
ncn-s# ceph orch daemon redeploy <name> [<image>]
Remove specific daemon(s)
ncn-s# ceph orch daemon rm <names>... [--force]
Start, stop, restart, (redeploy,) or reconfig a specific daemon
ncn-s# ceph orch daemon start|stop|restart|reconfig <name>
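For example, to reconfigure a single mon daemon (illustrative; the daemon name is taken from ceph orch ps):
ncn-s# ceph orch daemon reconfig mon.ncn-s003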
List devices on a host
ncn-s# ceph orch device ls [<hostname>...] [plain|json|json-pretty|yaml] [--refresh] [--wide]
Zap (erase!) a device so it can be re-used
ncn-s# ceph orch device zap <hostname> <path> [--force]
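For example (illustrative; the host and device are from the earlier device listing, and zap erases all data on the device):
ncn-s# ceph orch device zap ncn-s003 /dev/vdc --force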
Add a host
ncn-s# ceph orch host add <hostname> [<addr>] [<labels>...]
Add a host label
ncn-s# ceph orch host label add <hostname> <label>
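For example (the label name shown is arbitrary and illustrative):
ncn-s# ceph orch host label add ncn-s001 mon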
Remove a host label
ncn-s# ceph orch host label rm <hostname> <label>
List hosts
ncn-s# ceph orch host ls [plain|json|json-pretty|yaml]
Check if the specified host can be safely stopped without reducing availability
ncn-s# ceph orch host ok-to-stop <hostname>
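For example:
ncn-s# ceph orch host ok-to-stop ncn-s002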
Remove a host
ncn-s# ceph orch host rm <hostname>
Update a host address
ncn-s# ceph orch host set-addr <hostname> <addr>
List services known to orchestrator
ncn-s# ceph orch ls [<service_type>] [<service_name>] [--export] [plain|json|json-pretty|yaml] [--refresh]
Remove OSD services
ncn-s# ceph orch osd rm <svc_id>... [--replace] [--force]
Status of OSD removal operation
ncn-s# ceph orch osd rm status [plain|json|json-pretty|yaml]
Stop an ongoing OSD removal
ncn-s# ceph orch osd rm stop <svc_id>...
Pause orchestrator background work
ncn-s# ceph orch pause
List daemons known to orchestrator
ncn-s# ceph orch ps [<hostname>] [<service_name>] [<daemon_type>] [<daemon_id>] [plain|json|json-pretty|yaml] [--refresh]
Resume orchestrator background work (if paused)
ncn-s# ceph orch resume
Remove a service
ncn-s# ceph orch rm <service_name> [--force]
Select orchestrator module backend
ncn-s# ceph orch set backend <module_name>
Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons)
ncn-s# ceph orch start|stop|restart|redeploy|reconfig <service_name>
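For example, to restart every daemon in the rgw.site1.zone1 service (illustrative; the service name is taken from ceph orch ls):
ncn-s# ceph orch restart rgw.site1.zone1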
Report configured backend and its status
ncn-s# ceph orch status [plain|json|json-pretty|yaml]
Check service versions vs available and target containers
ncn-s# ceph orch upgrade check [<image>] [<ceph_version>]
Pause an in-progress upgrade
ncn-s# ceph orch upgrade pause
Resume paused upgrade
ncn-s# ceph orch upgrade resume
Initiate upgrade
ncn-s# ceph orch upgrade start [<image>] [<ceph_version>]
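For example, to start an upgrade to a specific container image (the image shown is illustrative, taken from the listings above):
ncn-s# ceph orch upgrade start --image registry.local/ceph/ceph:v15.2.8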
Check the status of an ongoing upgrade
ncn-s# ceph orch upgrade status
Stop an in-progress upgrade
ncn-s# ceph orch upgrade stop