
OSD prepare fails for plain partitions #14503

Closed
ThorbenJ opened this issue Jul 26, 2024 · 5 comments

@ThorbenJ

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

When using a plain partition (e.g. /dev/nvme0n1p2), OSD prepare will fail at the "lvm batch" command (see logs below).
The error states that LVs or raw block devices must be passed.

Alternatively: if one makes the partition an empty/unused PV with pvcreate, the OSD prepare script will incorrectly skip/ignore the partition, believing it to already be in use (see logs below).

Expected behavior:

Be able to configure an OSD to use a plain partition

How to reproduce it (minimal and precise):
I will document my setup, which is not minimal. In my case, in addition to the NVMe I have a USB SSD, to provide tiered storage: one fast tier and one slower but higher-capacity tier for long-term data.

The NVMe has two partitions: the first is a PV for another volume group unrelated to Rook/Ceph, and the second is the one assigned to Rook/Ceph. The USB SSD has one partition, all of it assigned to Rook/Ceph.
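
For illustration only, a layout like this could be created with sgdisk; the sizes are placeholders and the partition labels simply match the naming used below (this is a sketch, not my actual provisioning script):

# hypothetical sketch of the disk layout described above; sizes are placeholders
sgdisk --new=1:0:+931G --change-name=1:dpLVM-NVME /dev/nvme0n1   # PV for the unrelated volume group
sgdisk --new=2:0:0     --change-name=2:dpData-NVME /dev/nvme0n1  # partition handed to Rook/Ceph
sgdisk --new=1:0:0     --change-name=1:dpData-USB  /dev/sda      # whole USB SSD for Rook/Ceph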

Relevant config snippet from Helm values file for the cluster:

...
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: false
    
    config:
      osdsPerDevice: "0"  # This indicates the number of OSDs per device
    devices:
      - name: /dev/disk/by-partlabel/dpData-NVME
        config:
          deviceClass: fast
          osdsPerDevice: "3"
      - name: /dev/disk/by-partlabel/dpData-USB
        config:
          deviceClass: slow
          osdsPerDevice: "1"
...

For reference:

ls -l /dev/disk/by-partlabel/
total 0
lrwxrwxrwx 1 root root 15 Jul 21 08:09 bootfs -> ../../mmcblk0p1
lrwxrwxrwx 1 root root 15 Jul 21 08:09 dpData-NVME -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 10 Jul 21 08:09 dpData-USB -> ../../sda1
lrwxrwxrwx 1 root root 15 Jul 21 08:09 dpLVM-NVME -> ../../nvme0n1p1

Logs to submit:

When "nvme0n1p2" is empty (using wipefs -a)

2024-07-21 08:03:29.031853 I | cephosd: device "nvme0n1p2" is available.
2024-07-21 08:03:29.031934 I | cephosd: "/dev/disk/by-partlabel/dpData-NVME" found in the desired devices (matched by link: "/dev/disk/by-partlabel/dpData-NVME")
2024-07-21 08:03:29.031962 I | cephosd: device "nvme0n1p2" is selected by the device filter/name "/dev/disk/by-partlabel/dpData-NVME"
2024-07-21 08:03:29.046587 I | cephosd: configuring osd devices: {"Entries":{"nvme0n1p2":{"Data":-1,"Metadata":null,"Config":{"Name":"/dev/disk/by-partlabel/dpData-NVME","OSDsPerDevice":3,"MetadataDevice":"","DatabaseSizeMB":0,"DeviceClass":"fast","InitialWeight":"","IsFilter":false,"IsDevicePathFilter":false},"PersistentDevicePaths":["/dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166_1-part2","/dev/disk/by-partuuid/788e1a9b-9b29-9e4c-a64d-4e29af4d4f2c","/dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166-part2","/dev/disk/by-partlabel/dpData-NVME","/dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4ca31ff7-part2","/dev/block/by-name/dpData-NVME","/dev/disk/by-path/platform-a40000000.pcie-pci-0000:01:00.0-nvme-1-part2"],"DeviceInfo":{"name":"nvme0n1p2","parent":"nvme0n1","hasChildren":false,"devLinks":"/dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166_1-part2 /dev/disk/by-partuuid/788e1a9b-9b29-9e4c-a64d-4e29af4d4f2c /dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166-part2 /dev/disk/by-partlabel/dpData-NVME /dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4ca31ff7-part2 /dev/block/by-name/dpData-NVME /dev/disk/by-path/platform-a40000000.pcie-pci-0000:01:00.0-nvme-1-part2","size":1000197849088,"uuid":"","serial":"SanDisk_SSD_Plus_2TB_A3N_23454X802166_1","type":"part","rotational":false,"readOnly":false,"Partitions":null,"filesystem":"","mountpoint":"","vendor":"","model":"SanDisk SSD Plus 2TB A3N","wwn":"eui.e8238fa6bf530001001b448b4ca31ff7","wwnVendorExtension":"","empty":false,"real-path":"/dev/nvme0n1p2","kernel-name":"nvme0n1p2"},"RestoreOSD":false}}}
2024-07-21 08:03:29.046665 I | cephclient: getting or creating ceph auth key "client.bootstrap-osd"
2024-07-21 08:03:29.046688 D | exec: Running command: ceph auth get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rookio-system --conf=/var/lib/rook/rookio-system/rookio-system.config --name=client.admin --keyring=/var/lib/rook/rookio-system/client.admin.keyring --format json
2024-07-21 08:03:29.764737 D | cephosd: won't use raw mode for disk "/dev/disk/by-partlabel/dpData-NVME" since osd per device is 3
2024-07-21 08:03:29.764919 I | cephosd: configuring new LVM device nvme0n1p2
2024-07-21 08:03:29.764933 I | cephosd: Base command - stdbuf
2024-07-21 08:03:29.764953 I | cephosd: immediateExecuteArgs - [-oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 3 /dev/nvme0n1p2 --crush-device-class fast]
2024-07-21 08:03:29.764966 I | cephosd: immediateReportArgs - [-oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 3 /dev/nvme0n1p2 --crush-device-class fast --report]
2024-07-21 08:03:29.764977 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 3 /dev/nvme0n1p2 --crush-device-class fast --report
2024-07-21 08:03:30.475656 D | exec: usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
2024-07-21 08:03:30.475704 D | exec:                              [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
2024-07-21 08:03:30.475716 D | exec:                              [--auto] [--no-auto] [--bluestore] [--report]
2024-07-21 08:03:30.475726 D | exec:                              [--yes] [--format {json,json-pretty,pretty}]
2024-07-21 08:03:30.475736 D | exec:                              [--dmcrypt]
2024-07-21 08:03:30.475746 D | exec:                              [--crush-device-class CRUSH_DEVICE_CLASS]
2024-07-21 08:03:30.475755 D | exec:                              [--no-systemd]
2024-07-21 08:03:30.475765 D | exec:                              [--osds-per-device OSDS_PER_DEVICE]
2024-07-21 08:03:30.475775 D | exec:                              [--data-slots DATA_SLOTS]
2024-07-21 08:03:30.475784 D | exec:                              [--data-allocate-fraction DATA_ALLOCATE_FRACTION]
2024-07-21 08:03:30.475794 D | exec:                              [--block-db-size BLOCK_DB_SIZE]
2024-07-21 08:03:30.475803 D | exec:                              [--block-db-slots BLOCK_DB_SLOTS]
2024-07-21 08:03:30.475814 D | exec:                              [--block-wal-size BLOCK_WAL_SIZE]
2024-07-21 08:03:30.475823 D | exec:                              [--block-wal-slots BLOCK_WAL_SLOTS] [--prepare]
2024-07-21 08:03:30.475833 D | exec:                              [--osd-ids [OSD_IDS [OSD_IDS ...]]]
2024-07-21 08:03:30.475858 D | exec:                              [DEVICES [DEVICES ...]]
2024-07-21 08:03:30.475870 D | exec: ceph-volume lvm batch: error: /dev/nvme0n1p2 is a partition, please pass LVs or raw block devices
2024-07-21 08:03:30.545172 C | rookcmd: failed to configure devices: failed to initialize osd: failed ceph-volume report: exit status 2

A fix for this LVM error would be to run pvcreate before the lvm batch command.
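
For example (illustrative only; these are the manual steps I mean, not something Rook currently runs):

wipefs -a /dev/nvme0n1p2    # ensure no stale signatures remain on the partition
pvcreate /dev/nvme0n1p2     # turn the partition into an empty, unused LVM PV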

However, if I run pvcreate in advance of installing the cluster CR, I get:

2024-07-21 07:45:15.014647 D | exec: Running command: udevadm info --query=property /dev/nvme0n1p1
2024-07-21 07:45:15.033444 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166-part1 /dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4ca31ff7-part1 /dev/disk/by-path/platform-a40000000.pcie-pci-0000:01:00.0-nvme-1-part1 /dev/disk/by-partlabel/dpLVM-NVME /dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166_1-part1 /dev/disk/by-id/lvm-pv-uuid-C4Ndek-3qPc-mSQA-7XAM-lpWZ-xOvP-xGnG6G /dev/disk/by-partuuid/9181da3a-b330-5b47-b330-83951917d950 /dev/block/by-name/dpLVM-NVME\nDEVNAME=/dev/nvme0n1p1\nDEVPATH=/devices/platform/a40000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p1\nDEVTYPE=partition\nDISKSEQ=30\nID_FS_TYPE=LVM2_member\nID_FS_USAGE=raid\nID_FS_UUID=C4Ndek-3qPc-mSQA-7XAM-lpWZ-xOvP-xGnG6G\nID_FS_UUID_ENC=C4Ndek-3qPc-mSQA-7XAM-lpWZ-xOvP-xGnG6G\nID_FS_VERSION=LVM2 001\nID_MODEL=SanDisk SSD Plus 2TB A3N\nID_NSID=1\nID_PART_ENTRY_DISK=259:0\nID_PART_ENTRY_NAME=dpLVM-NVME\nID_PART_ENTRY_NUMBER=1\nID_PART_ENTRY_OFFSET=2048\nID_PART_ENTRY_SCHEME=gpt\nID_PART_ENTRY_SIZE=1953514584\nID_PART_ENTRY_TYPE=e6d6d379-f507-44c2-a23c-238f2a3df928\nID_PART_ENTRY_UUID=9181da3a-b330-5b47-b330-83951917d950\nID_PART_TABLE_TYPE=gpt\nID_PART_TABLE_UUID=59896760-bc38-b042-b962-aaec1415aeed\nID_PATH=platform-a40000000.pcie-pci-0000:01:00.0-nvme-1\nID_PATH_TAG=platform-a40000000_pcie-pci-0000_01_00_0-nvme-1\nID_REVISION=33006000\nID_SERIAL=SanDisk_SSD_Plus_2TB_A3N_23454X802166_1\nID_SERIAL_SHORT=23454X802166\nID_WWN=eui.e8238fa6bf530001001b448b4ca31ff7\nMAJOR=259\nMINOR=3\nPARTN=1\nPARTNAME=dpLVM-NVME\nSUBSYSTEM=block\nSYSTEMD_READY=1\nTAGS=:systemd:\nUSEC_INITIALIZED=35659183584"
2024-07-21 07:45:15.033535 D | exec: Running command: lsblk /dev/nvme0n1p1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2024-07-21 07:45:15.043525 D | sys: lsblk output: "SIZE=\"1000199467008\" ROTA=\"0\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/nvme0n1\" NAME=\"/dev/nvme0n1p1\" KNAME=\"/dev/nvme0n1p1\" MOUNTPOINT=\"\" FSTYPE=\"LVM2_member\""
2024-07-21 07:45:15.043584 D | exec: Running command: ceph-volume inventory --format json /dev/nvme0n1p1
2024-07-21 07:45:15.900421 I | cephosd: skipping device "nvme0n1p1": ["Has a FileSystem", "LVM detected"].
2024-07-21 07:45:15.900483 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
2024-07-21 07:45:15.901106 D | exec: Running command: udevadm info --query=property /dev/nvme0n1p2
2024-07-21 07:45:15.919148 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-partuuid/788e1a9b-9b29-9e4c-a64d-4e29af4d4f2c /dev/block/by-name/dpData-NVME /dev/disk/by-partlabel/dpData-NVME /dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166-part2 /dev/disk/by-id/nvme-SanDisk_SSD_Plus_2TB_A3N_23454X802166_1-part2 /dev/disk/by-path/platform-a40000000.pcie-pci-0000:01:00.0-nvme-1-part2 /dev/disk/by-id/nvme-eui.e8238fa6bf530001001b448b4ca31ff7-part2 /dev/disk/by-id/lvm-pv-uuid-mN2Ij2-361S-fQNd-PtIO-X1SI-MBOV-HjHj8i\nDEVNAME=/dev/nvme0n1p2\nDEVPATH=/devices/platform/a40000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/nvme/nvme0/nvme0n1/nvme0n1p2\nDEVTYPE=partition\nDISKSEQ=30\nID_FS_TYPE=LVM2_member\nID_FS_USAGE=raid\nID_FS_UUID=mN2Ij2-361S-fQNd-PtIO-X1SI-MBOV-HjHj8i\nID_FS_UUID_ENC=mN2Ij2-361S-fQNd-PtIO-X1SI-MBOV-HjHj8i\nID_FS_VERSION=LVM2 001\nID_MODEL=SanDisk SSD Plus 2TB A3N\nID_NSID=1\nID_PART_ENTRY_DISK=259:0\nID_PART_ENTRY_NAME=dpData-NVME\nID_PART_ENTRY_NUMBER=2\nID_PART_ENTRY_OFFSET=1953517568\nID_PART_ENTRY_SCHEME=gpt\nID_PART_ENTRY_SIZE=1953511424\nID_PART_ENTRY_TYPE=8da63339-0007-60c0-c436-083ac8230908\nID_PART_ENTRY_UUID=788e1a9b-9b29-9e4c-a64d-4e29af4d4f2c\nID_PART_TABLE_TYPE=gpt\nID_PART_TABLE_UUID=59896760-bc38-b042-b962-aaec1415aeed\nID_PATH=platform-a40000000.pcie-pci-0000:01:00.0-nvme-1\nID_PATH_TAG=platform-a40000000_pcie-pci-0000_01_00_0-nvme-1\nID_REVISION=33006000\nID_SERIAL=SanDisk_SSD_Plus_2TB_A3N_23454X802166_1\nID_SERIAL_SHORT=23454X802166\nID_WWN=eui.e8238fa6bf530001001b448b4ca31ff7\nMAJOR=259\nMINOR=4\nPARTN=2\nPARTNAME=dpData-NVME\nSUBSYSTEM=block\nSYSTEMD_READY=1\nTAGS=:systemd:\nUSEC_INITIALIZED=35659193066"
2024-07-21 07:45:15.919228 D | exec: Running command: lsblk /dev/nvme0n1p2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2024-07-21 07:45:15.928619 D | sys: lsblk output: "SIZE=\"1000197849088\" ROTA=\"0\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/nvme0n1\" NAME=\"/dev/nvme0n1p2\" KNAME=\"/dev/nvme0n1p2\" MOUNTPOINT=\"\" FSTYPE=\"LVM2_member\""
2024-07-21 07:45:15.928682 D | exec: Running command: ceph-volume inventory --format json /dev/nvme0n1p2
2024-07-21 07:45:16.774149 I | cephosd: skipping device "nvme0n1p2": ["Has a FileSystem"].
2024-07-21 07:45:16.788348 I | cephosd: configuring osd devices: {"Entries":{}}
2024-07-21 07:45:16.788397 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2024-07-21 07:45:16.789093 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2024-07-21 07:45:17.397558 D | cephosd: {}
2024-07-21 07:45:17.397642 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2024-07-21 07:45:17.397751 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
2024-07-21 07:45:18.580262 D | cephosd: {}
2024-07-21 07:45:18.580345 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2024-07-21 07:45:18.580371 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "worker01"

i.e. the PV "nvme0n1p2" is skipped/ignored.
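
The rejection can be reproduced on the node with the same commands the prepare job runs (both appear in the log above); the LVM2_member signature left by pvcreate appears to be what triggers the "Has a FileSystem" skip:

lsblk /dev/nvme0n1p2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
ceph-volume inventory --format json /dev/nvme0n1p2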

Cluster Status to submit:

The OSD prepare jobs all fail:

kc -n rookio-system get all 
NAME                                                     READY   STATUS             RESTARTS         AGE
pod/csi-cephfsplugin-fkhz5                               2/2     Running            1 (5d14h ago)    5d14h
pod/csi-cephfsplugin-h5k85                               2/2     Running            1 (5d14h ago)    5d14h
pod/csi-cephfsplugin-jlkrq                               2/2     Running            1 (5d14h ago)    5d14h
pod/csi-cephfsplugin-kfhsf                               2/2     Running            1 (5d14h ago)    5d14h
pod/csi-cephfsplugin-mrw8z                               2/2     Running            0                5d14h
pod/csi-cephfsplugin-provisioner-68d5f89cbf-cpjsm        5/5     Running            0                5d14h
pod/csi-cephfsplugin-provisioner-68d5f89cbf-pqf7h        5/5     Running            0                5d14h
pod/csi-cephfsplugin-rgxvf                               2/2     Running            0                5d14h
pod/csi-rbdplugin-5jszs                                  2/2     Running            1 (5d14h ago)    5d14h
pod/csi-rbdplugin-cm47x                                  2/2     Running            0                5d14h
pod/csi-rbdplugin-cm5ts                                  2/2     Running            1 (5d14h ago)    5d14h
pod/csi-rbdplugin-hzd8t                                  2/2     Running            0                5d14h
pod/csi-rbdplugin-nm2pn                                  2/2     Running            1 (5d14h ago)    5d14h
pod/csi-rbdplugin-provisioner-586c7f47f6-gv8w2           5/5     Running            0                5d14h
pod/csi-rbdplugin-provisioner-586c7f47f6-zc962           5/5     Running            0                5d14h
pod/csi-rbdplugin-zmmrv                                  2/2     Running            1 (5d14h ago)    5d14h
pod/rook-ceph-crashcollector-worker01-778b7dc86d-kms98   1/1     Running            0                5d14h
pod/rook-ceph-crashcollector-worker02-6c8fbdf455-jkrfh   1/1     Running            0                5d14h
pod/rook-ceph-crashcollector-worker05-6f58dc477b-kqcfg   1/1     Running            0                5d14h
pod/rook-ceph-crashcollector-worker06-7b4c899bcf-gqkwg   1/1     Running            0                5d14h
pod/rook-ceph-exporter-worker01-68f67f84f6-cng4w         1/1     Running            0                5d14h
pod/rook-ceph-exporter-worker02-5f89c44787-496jc         0/1     CrashLoopBackOff   1184 (79s ago)   5d14h
pod/rook-ceph-exporter-worker05-6559746788-76npb         1/1     Running            0                5d14h
pod/rook-ceph-exporter-worker06-6656d6599-jk2l5          1/1     Running            0                5d14h
pod/rook-ceph-mds-ceph-filesystem-a-7cd7844c84-m8v55     2/2     Running            0                5d14h
pod/rook-ceph-mds-ceph-filesystem-b-d8f47566-q7twg       2/2     Running            0                5d14h
pod/rook-ceph-mgr-a-775ff44866-v8bhp                     4/4     Running            0                5d14h
pod/rook-ceph-mgr-b-6c854bd45f-295pc                     4/4     Running            0                5d14h
pod/rook-ceph-mon-a-df669b757-6qwwh                      2/2     Running            0                5d14h
pod/rook-ceph-mon-b-5b5dccb99c-ktj6p                     2/2     Running            0                5d14h
pod/rook-ceph-mon-c-74644676d6-slnwg                     2/2     Running            0                5d14h
pod/rook-ceph-operator-6b87d7bf79-f66pz                  1/1     Running            0                5d5h
pod/rook-ceph-tools-d95d5ff7d-tf4tl                      1/1     Running            0                5d13h

NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/rook-ceph-exporter               ClusterIP   240.42.191.102   <none>        9926/TCP            5d14h
service/rook-ceph-mgr                    ClusterIP   240.42.155.248   <none>        9283/TCP            5d14h
service/rook-ceph-mgr-dashboard          ClusterIP   240.42.239.129   <none>        8443/TCP            5d14h
service/rook-ceph-mon-a                  ClusterIP   240.42.247.106   <none>        6789/TCP,3300/TCP   5d14h
service/rook-ceph-mon-b                  ClusterIP   240.42.13.224    <none>        6789/TCP,3300/TCP   5d14h
service/rook-ceph-mon-c                  ClusterIP   240.42.113.120   <none>        6789/TCP,3300/TCP   5d14h
service/rook-ceph-rgw-ceph-objectstore   ClusterIP   240.42.88.45     <none>        80/TCP              5d14h

NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-cephfsplugin   6         6         6       6            6           <none>          5d14h
daemonset.apps/csi-rbdplugin      6         6         6       6            6           <none>          5d14h

NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-cephfsplugin-provisioner        2/2     2            2           5d14h
deployment.apps/csi-rbdplugin-provisioner           2/2     2            2           5d14h
deployment.apps/rook-ceph-crashcollector-worker01   1/1     1            1           5d14h
deployment.apps/rook-ceph-crashcollector-worker02   1/1     1            1           5d14h
deployment.apps/rook-ceph-crashcollector-worker05   1/1     1            1           5d14h
deployment.apps/rook-ceph-crashcollector-worker06   1/1     1            1           5d14h
deployment.apps/rook-ceph-exporter-worker01         1/1     1            1           5d14h
deployment.apps/rook-ceph-exporter-worker02         0/1     1            0           5d14h
deployment.apps/rook-ceph-exporter-worker05         1/1     1            1           5d14h
deployment.apps/rook-ceph-exporter-worker06         1/1     1            1           5d14h
deployment.apps/rook-ceph-mds-ceph-filesystem-a     1/1     1            1           5d14h
deployment.apps/rook-ceph-mds-ceph-filesystem-b     1/1     1            1           5d14h
deployment.apps/rook-ceph-mgr-a                     1/1     1            1           5d14h
deployment.apps/rook-ceph-mgr-b                     1/1     1            1           5d14h
deployment.apps/rook-ceph-mon-a                     1/1     1            1           5d14h
deployment.apps/rook-ceph-mon-b                     1/1     1            1           5d14h
deployment.apps/rook-ceph-mon-c                     1/1     1            1           5d14h
deployment.apps/rook-ceph-operator                  1/1     1            1           5d15h
deployment.apps/rook-ceph-tools                     1/1     1            1           5d13h

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-cephfsplugin-provisioner-68d5f89cbf        2         2         2       5d14h
replicaset.apps/csi-rbdplugin-provisioner-586c7f47f6           2         2         2       5d14h
replicaset.apps/rook-ceph-crashcollector-worker01-58f7887574   0         0         0       5d14h
replicaset.apps/rook-ceph-crashcollector-worker01-778b7dc86d   1         1         1       5d14h
replicaset.apps/rook-ceph-crashcollector-worker02-6c8fbdf455   1         1         1       5d14h
replicaset.apps/rook-ceph-crashcollector-worker05-6f58dc477b   1         1         1       5d14h
replicaset.apps/rook-ceph-crashcollector-worker05-87b464cbb    0         0         0       5d14h
replicaset.apps/rook-ceph-crashcollector-worker06-75cb5ccb97   0         0         0       5d14h
replicaset.apps/rook-ceph-crashcollector-worker06-7b4c899bcf   1         1         1       5d14h
replicaset.apps/rook-ceph-exporter-worker01-66d798dd46         0         0         0       5d14h
replicaset.apps/rook-ceph-exporter-worker01-68f67f84f6         1         1         1       5d14h
replicaset.apps/rook-ceph-exporter-worker02-5f89c44787         1         1         0       5d14h
replicaset.apps/rook-ceph-exporter-worker05-6559746788         1         1         1       5d14h
replicaset.apps/rook-ceph-exporter-worker05-75b7c57858         0         0         0       5d14h
replicaset.apps/rook-ceph-exporter-worker06-6656d6599          1         1         1       5d14h
replicaset.apps/rook-ceph-exporter-worker06-6ccc7d4ddf         0         0         0       5d14h
replicaset.apps/rook-ceph-mds-ceph-filesystem-a-7cd7844c84     1         1         1       5d14h
replicaset.apps/rook-ceph-mds-ceph-filesystem-b-d8f47566       1         1         1       5d14h
replicaset.apps/rook-ceph-mgr-a-775ff44866                     1         1         1       5d14h
replicaset.apps/rook-ceph-mgr-b-6c854bd45f                     1         1         1       5d14h
replicaset.apps/rook-ceph-mon-a-df669b757                      1         1         1       5d14h
replicaset.apps/rook-ceph-mon-b-5b5dccb99c                     1         1         1       5d14h
replicaset.apps/rook-ceph-mon-c-74644676d6                     1         1         1       5d14h
replicaset.apps/rook-ceph-operator-6b87d7bf79                  1         1         1       5d15h
replicaset.apps/rook-ceph-tools-d95d5ff7d                      1         1         1       5d13h

NAME                                       STATUS   COMPLETIONS   DURATION   AGE
job.batch/rook-ceph-osd-prepare-worker01   Failed   0/1           5d5h       5d5h
job.batch/rook-ceph-osd-prepare-worker02   Failed   0/1           5d5h       5d5h
job.batch/rook-ceph-osd-prepare-worker03   Failed   0/1           5d5h       5d5h
job.batch/rook-ceph-osd-prepare-worker04   Failed   0/1           5d5h       5d5h
job.batch/rook-ceph-osd-prepare-worker05   Failed   0/1           5d5h       5d5h
job.batch/rook-ceph-osd-prepare-worker06   Failed   0/1           5d5h       5d5h

Environment:

  • HW: Orange PI 5 Plus, with one NVME disk and one USB attached
    • 6x Workers with 32GB RAM
    • 3x Mgmt with 16GB RAM
  • OS: Armbian (Debian) 12
  • K8s: K3S v1.30.2 k3s1
  • Kernel: Linux worker01 6.10.0-rc7-edge-rockchip-rk3588 #1 SMP PREEMPT Sun Jul 7 21:23:46 UTC 2024 aarch64 GNU/Linux
  • Rook version: rook: v1.14.8 go: go1.21.11
  • Ceph version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)

**Further info**

I did try to seek help with this in Slack about a week ago (https://rook-io.slack.com/archives/CK9CF5H2R/p1721549460098239), in both #ceph and #general, but got no response, so I decided to file this bug report.

ThorbenJ added the bug label Jul 26, 2024
@travisn
Member

travisn commented Jul 26, 2024

When setting osdsPerDevice: "3", ceph-volume will be called in batch mode. Without that setting, ceph-volume will be called in raw mode. You should be able to create one OSD per NVMe with that change. Running multiple OSDs per NVMe is not needed like it was in the past anyway; we just still need to update our docs around that.
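
For example, a sketch of the devices section from the values file above with that change (assuming the rest of the snippet stays as posted; only osdsPerDevice changes):

    devices:
      - name: /dev/disk/by-partlabel/dpData-NVME
        config:
          deviceClass: fast
          osdsPerDevice: "1"   # or omit it; ceph-volume is then called in raw mode, which accepts partitions
      - name: /dev/disk/by-partlabel/dpData-USB
        config:
          deviceClass: slow
          osdsPerDevice: "1"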

@ThorbenJ
Author

Thanks for the tip. Yes, setting it to 1 allowed the cluster to be created, and it is now healthy.

@BlaineEXE
Member

Alternatively: if one makes the partition an empty/unused PV with pvcreate, the OSD prepare script will incorrectly skip/ignore the partition, believing it to already be in use (see logs below).

For other readers: this is intended behavior to ensure Rook does not overwrite user data.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.


github-actions bot commented Oct 7, 2024

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions bot closed this as not planned Oct 7, 2024