Improve error message when stage 3 fails due to insufficiently zapped partition table ("ghosts of filesystems past") #304
Here is the current output from stage 3 in this scenario: https://bugzilla.novell.com/attachment.cgi?id=727150
Note that […]
Also note that I'm not sure if "Error: No cluster conf found in /etc/ceph with fsid 4dccfeb4-dde1-3dad-85c9-ef7723878f63" is the only failure mode indicative of "ghosts of filesystems past" :-( Still, if we grep for […]
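If that error string is at least a reliable marker, detection could be as small as a grep over the captured stage output. A minimal sketch, assuming the output has been saved somewhere; the log path is a made-up example, not something DeepSea writes today:

```sh
# Illustrative only: the path to the captured stage 3 output is hypothetical.
STAGE3_LOG=/var/log/deepsea/stage3.log

if grep -q "No cluster conf found in /etc/ceph with fsid" "$STAGE3_LOG"; then
    echo "Stage 3 probably failed because of leftover partitions from an old cluster." >&2
fi
```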
I think this falls into #259 for creating a utility to help zap disks on the initial installation. The one extra step that DeepSea does during the rescind (for both Stage 5 and ceph.purge) is wiping the beginning of each partition prior to zapping the disk. That seems to guarantee that a filesystem won't magically come back just because the new partitions landed on exactly the same offsets. I experienced this problem differently, with journal symlinks that point to missing devices. Otherwise, rescind does the same steps as ceph-disk (i.e. removing the backup partition tables, etc.). Once a cluster is up and running, the normal Stage 5 process for removing a disk should be sufficient. This issue arises in the larger cycle where the same hardware is reused for testing deployments. I do not have an ETA for such a utility, though.
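For illustration, the "wipe the beginning of each partition, then zap" sequence described above looks roughly like this with standard tools. The device name is an example and this is not the actual rescind code:

```sh
DEV=/dev/sdb   # example OSD data disk

# Overwrite the start of each existing partition so an old filesystem superblock
# cannot reappear if new partitions later land on exactly the same offsets.
for part in "$DEV"[0-9]*; do
    dd if=/dev/zero of="$part" bs=1M count=10 oflag=direct
done

# Then destroy the partition table itself, including the backup GPT at the end of the disk.
sgdisk --zap-all "$DEV"
```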
I agree it's related, but there is no guarantee that the user will have run DeepSea's rescind/purge functionality if and when it arrives, or indeed that they zapped the disk at all, so isn't there still a need for improved error reporting that justifies a separate fix?
Also, as Denis P. noted, it's a Catch-22 situation if I'm starting out with a virgin OS installation on hardware that contains unzapped (or insufficiently zapped) disks from a different (perhaps long-gone) cluster. I'm supposed to use "ceph-disk zap" on the disks, but the package containing the ceph-disk binary doesn't even get installed until stage 3. By that time it's too late: the weird, misleading error message has already been displayed.
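For the record, the old GPT can be cleared without ceph-disk, using tools that are normally present even on a fresh OS installation (util-linux and gdisk). A sketch, with an example device name:

```sh
DEV=/dev/sdc   # example: a disk left over from a previous cluster

wipefs --all "$DEV"                                    # clear filesystem/RAID signatures util-linux knows about
sgdisk --zap-all "$DEV"                                # destroy the primary and backup GPT and the protective MBR
dd if=/dev/zero of="$DEV" bs=1M count=10 oflag=direct  # overwrite the first 10 MiB for good measure
```

That still leaves the real problem this issue is about: the user has to know to do it before stage 3 complains.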
@smithfarm Oh, there's a need for something... it's just a bit hard to define the requirements when we have dependency issues such as running ceph-disk when it isn't there. My paranoia really wants Stage 2 to be complete, in the sense that the admin has definitely said "these disks will be used". At that point, though, is only a one-shot utility needed? If this functionality gets added to Stage 3 permanently, I'm concerned about the impact on a 600-disk cluster of checking every fsid and OSD id to make absolutely sure that each OSD is "right" and should not be destroyed. After the virgin setup, I am speculating that this is less of an issue.

Does this workflow make sense?

- Run Stages 0-2 as normal
- […]

It's outside the normal process, but I believe it's only necessary in this particular scenario. The other issue is making sure cephdisks.zap doesn't do anything to good OSDs when somebody decides to run it a few months after the cluster has been running.
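As an illustration of that last concern, a zap helper could refuse to touch anything that is currently in use by the running cluster. A minimal sketch of such a guard; cephdisks.zap itself is only being proposed in this thread, so this is not its real code:

```sh
#!/bin/sh
# Sketch of a safety guard for a zap utility: refuse to wipe a disk that hosts a
# mounted OSD of the running cluster. Purely illustrative, not DeepSea code.
DEV="$1"

if lsblk -no MOUNTPOINT "$DEV" | grep -q '^/var/lib/ceph/osd/'; then
    echo "Refusing to zap $DEV: it hosts an active OSD." >&2
    exit 1
fi
echo "$DEV looks safe to zap."
```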
That works for me, assuming users know to do it. Can stage 2 detect the presence of disks from other Ceph installations and gently suggest to the user what they need to do? Then stage 3 could run the same check again, and if the unzapped disks are still there, it could fail with a proper error message.
Ah, now I hear you. Stage 2 could run the expensive check and remember (store) which disks were possibly unzapped. Then stage 3 would only need to recheck those particular disks.
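To make the two-phase idea concrete, here is a hedged sketch of what the stage 2 "expensive check" might record. The flag-file path is made up, "ceph data" is the partition label ceph-disk assigns to OSD data partitions, and at Stage 2 the intended fsid would more likely come from the pillar than from /etc/ceph/ceph.conf; none of this is existing DeepSea behaviour:

```sh
# Illustrative sketch, not existing DeepSea code.
CLUSTER_FSID=$(awk -F' *= *' '/^[[:space:]]*fsid/ {print $2}' /etc/ceph/ceph.conf)
FLAGGED=/var/tmp/possibly-unzapped-disks   # hypothetical flag file

# Find partitions carrying ceph-disk's "ceph data" label and compare their
# recorded fsid with the one this cluster is supposed to have.
blkid -o device -t PARTLABEL="ceph data" | while read -r part; do
    mnt=$(mktemp -d)
    mount -o ro "$part" "$mnt" 2>/dev/null || { rmdir "$mnt"; continue; }
    disk_fsid=$(cat "$mnt/ceph_fsid" 2>/dev/null)
    umount "$mnt"
    rmdir "$mnt"
    if [ -n "$disk_fsid" ] && [ "$disk_fsid" != "$CLUSTER_FSID" ]; then
        echo "$part" >> "$FLAGGED"    # remembered so stage 3 only rechecks these
    fi
done
```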
When an OSD disk is re-used, there is a hopefully low but significant probability that the user will have insufficiently zapped the GPT structures, causing ceph-disk to get confused about what the cluster fsid is.
See https://bugzilla.novell.com/show_bug.cgi?id=1041987 for the gory details.
This could happen when a disk is migrated from one cluster to another, for example, or when a cluster/node is reprovisioned for any reason.
The proposed error message is: "This disk looks like it belongs to another cluster. Please use deepsea zap to clear it if you're really sure"
There could be a --really-sure option to forcibly do this as part of an install, but it is probably too risky.
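A matching sketch of the stage 3 re-check and the proposed error text follows. The flag file is the one assumed in the stage 2 sketch above, and "deepsea zap" is only the command name proposed in this issue, not something that exists yet:

```sh
# Illustrative sketch of the stage 3 re-check; nothing here is existing DeepSea code.
FLAGGED=/var/tmp/possibly-unzapped-disks

if [ -s "$FLAGGED" ]; then
    while read -r part; do
        echo "Error: $part looks like it belongs to another cluster." >&2
        echo "Please use 'deepsea zap' to clear it if you're really sure." >&2
    done < "$FLAGGED"
    exit 1
fi
```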