Improve error message when stage 3 fails due to insufficiently zapped partition table ("ghosts of filesystems past") #304

Open
smithfarm opened this issue May 31, 2017 · 9 comments

@smithfarm (Contributor) commented May 31, 2017

When an OSD disk is re-used, there is a (hopefully low, but non-negligible) chance that the user has insufficiently zapped the old GPT structures, causing ceph-disk to get confused about which cluster fsid the disk belongs to.

See https://bugzilla.novell.com/show_bug.cgi?id=1041987 for the gory details.

This could happen when a disk is migrated from one cluster to another, for example, or when a cluster/node is reprovisioned for any reason.

The proposed error message is: "This disk looks like it belongs to another cluster. Please use deepsea zap to clear it if you're really sure."

There could be a --really-sure option to forcibly do this as part of an install, but it is probably too risky.
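As a rough illustration of the kind of check that could produce that message, the sketch below mounts a candidate OSD data partition read-only, reads the ceph_fsid marker file that ceph-disk leaves there, and compares it with the fsid in the local /etc/ceph/ceph.conf. All function names are illustrative and this is not existing DeepSea code; it also assumes ceph.conf parses as plain INI and that the process runs as root.

```python
import configparser
import subprocess
import tempfile

def local_cluster_fsid(conf="/etc/ceph/ceph.conf"):
    """Read the fsid of the cluster this node is supposed to join."""
    parser = configparser.ConfigParser()
    parser.read(conf)
    return parser.get("global", "fsid", fallback=None)

def partition_fsid(partition):
    """Mount an OSD data partition read-only and read its ceph_fsid marker.

    Returns None if the partition does not look like a Ceph OSD at all.
    """
    with tempfile.TemporaryDirectory() as mnt:
        try:
            subprocess.run(["mount", "-o", "ro", partition, mnt], check=True)
        except subprocess.CalledProcessError:
            return None
        try:
            with open(f"{mnt}/ceph_fsid") as marker:
                return marker.read().strip()
        except FileNotFoundError:
            return None
        finally:
            subprocess.run(["umount", mnt], check=False)

def warn_if_foreign(partition):
    """Print the friendlier message proposed in this issue, if applicable."""
    ours, theirs = local_cluster_fsid(), partition_fsid(partition)
    if theirs and theirs != ours:
        print(f"{partition}: this disk looks like it belongs to another cluster "
              f"(fsid {theirs}). Please use deepsea zap to clear it if you're "
              "really sure.")
```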

@smithfarm (Contributor Author)

Here is the current output from stage 3 in this scenario: https://bugzilla.novell.com/attachment.cgi?id=727150

@smithfarm (Contributor Author)

Note that sgdisk --zap-all is insufficient to guarantee that the old cluster's data won't pollute the new cluster's OSDs. The correct command to use is ceph-disk zap.
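For context, sgdisk --zap-all only destroys the partition tables themselves; ceph-disk zap additionally overwrites the start of each partition so that old filesystem superblocks cannot resurface if new partitions happen to land on the same boundaries. A minimal sketch of that stronger behaviour (sizes and the partition glob are illustrative, not the exact ceph-disk implementation):

```python
import glob
import subprocess

def thorough_zap(disk):
    """Wipe the start of every partition, then destroy the GPT/MBR structures.

    This approximates 'ceph-disk zap'; a bare 'sgdisk --zap-all' would skip
    the first step and could leave old filesystem signatures behind.
    """
    # e.g. /dev/sdb -> /dev/sdb1, /dev/sdb2, ... (NVMe naming would differ)
    for part in sorted(glob.glob(disk + "[0-9]*")):
        subprocess.run(["dd", "if=/dev/zero", f"of={part}", "bs=1M", "count=10"],
                       check=False)
    subprocess.run(["sgdisk", "--zap-all", "--clear", disk], check=True)
```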

@smithfarm smithfarm changed the title Improve error message when stage 3 fails due to insufficiently zapped partition table Improve error message when stage 3 fails due to insufficiently zapped partition table ("ghosts of filesystems past") May 31, 2017
@smithfarm (Contributor Author) commented May 31, 2017

Also note that I'm not sure if "Error: No cluster conf found in /etc/ceph with fsid 4dccfeb4-dde1-3dad-85c9-ef7723878f63" is the only failure mode indicative of "ghosts of filesystems past" :-(

Still, if we grep for "No cluster conf found in /etc/ceph with fsid" and display a more helpful error message, that would at least be a start.
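Such a start could be as small as the sketch below, which filters the captured stage 3 output and appends a hint whenever that string appears (MARKER matches the message quoted above; the hint wording and function name are illustrative):

```python
MARKER = "No cluster conf found in /etc/ceph with fsid"

HINT = ("this disk may belong to another cluster (ghosts of filesystems past); "
        "zap it, e.g. with 'ceph-disk zap', if you're really sure, then re-run "
        "stage 3.")

def translate_stage3_output(lines):
    """Yield stage 3 output lines, annotating the cryptic fsid complaint."""
    for line in lines:
        yield line
        if MARKER in line:
            yield f"  Hint: {HINT}"
```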

@swiftgist (Contributor)

I think this falls into #259 for creating a utility to help zap disks on the initial installation. The one extra step that DeepSea does during the rescind (for both Stage 5 and ceph.purge) is wiping the beginning of each partition prior to zapping the disk. That seems to guarantee that a filesystem won't magically come back just because the new partitions landed on exactly the same boundaries. I experienced this problem differently, with journal symlinks that point to missing devices. Otherwise, rescind does the same steps as ceph-disk (e.g. removing the backup partitions, etc.).

Once a cluster is up and running, the normal Stage 5 process for removing a disk should be sufficient. This issue mostly arises in the larger cycle where the same hardware is re-used for testing deployments. I do not have an ETA for such a utility, though.
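For the journal-symlink variant mentioned above, a detection step could look like this sketch, which walks the standard ceph-disk OSD directories and reports journal symlinks whose target device no longer exists (the function name is illustrative):

```python
import glob
import os

def dangling_journals(osd_root="/var/lib/ceph/osd"):
    """Return (symlink, target) pairs whose target device is missing."""
    broken = []
    for journal in glob.glob(os.path.join(osd_root, "*", "journal")):
        # os.path.exists() follows symlinks, so islink() and not exists()
        # together mean the link points at a device that is gone.
        if os.path.islink(journal) and not os.path.exists(journal):
            broken.append((journal, os.readlink(journal)))
    return broken
```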

@smithfarm (Contributor Author)

I agree it's related, but there is no guarantee that the user will have run DeepSea's rescind/purge functionality if and when it arrives, or indeed that they zapped the disk at all. So isn't there still a need for improved error reporting, which would justify a separate fix?

@smithfarm (Contributor Author) commented Jun 1, 2017

Also, as Denis P. noted, it's a Catch-22 situation if I'm starting out with a virgin OS installation on hardware that contains unzapped (or insufficiently zapped) disks from a different (perhaps long-gone) cluster. I'm supposed to use "ceph-disk zap" on the disks, but the package containing the ceph-disk binary doesn't even get installed until stage 3. By that time it's too late: the weird, misleading error message has already been displayed.
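One way around that chicken-and-egg, sketched below, would be for a zap helper to fall back to tools that are typically already present on a fresh OS installation (wipefs, sgdisk) whenever the ceph-disk binary is not installed yet. This is purely an illustration of the idea, not existing DeepSea code:

```python
import shutil
import subprocess

def zap(disk):
    """Zap a disk with ceph-disk if available, otherwise with base tools."""
    if shutil.which("ceph-disk"):
        subprocess.run(["ceph-disk", "zap", disk], check=True)
    else:
        # ceph-disk only arrives with the ceph package in stage 3, so fall
        # back to wiping filesystem signatures and the partition tables.
        subprocess.run(["wipefs", "--all", disk], check=True)
        subprocess.run(["sgdisk", "--zap-all", disk], check=True)
```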

@swiftgist (Contributor)

@smithfarm Oh, there's a need for something... it's just a bit hard to define the requirements when we have dependency issues such as needing to run ceph-disk before it is installed.

My paranoia really wants Stage 2 to be the point where everything is settled, in the sense that the admin has definitely said "these disks will be used". At that point, though, is a one-shot utility all that is needed?

If this functionality gets added to Stage 3 permanently, I'm concerned about the impact on a 600-disk cluster of checking every fsid and OSD id to make absolutely sure that each OSD is "right" and should not be destroyed. After the initial (virgin) setup, I speculate that this is less of an issue.

Does this workflow make sense?

1. Run Stages 0-2 as normal.
2. If you have been using this hardware for other Ceph installations, run cephdisks.zap (or similar).
3. Run the remaining Stages.

It's outside the normal process, but I believe it's only necessary in this particular scenario. The other issue is making sure cephdisks.zap doesn't do anything to good OSDs when somebody decides to run it a few months after the cluster has been running.
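For that last concern, a cephdisks.zap-style helper could refuse to touch anything that is currently mounted or that carries the running cluster's own fsid. A rough sketch of such a guard (how the fsids are obtained is left open; the function names are illustrative):

```python
def is_mounted(partition):
    """True if the partition appears in /proc/mounts, i.e. likely a live OSD."""
    with open("/proc/mounts") as mounts:
        return any(line.split()[0] == partition for line in mounts)

def safe_to_zap(partitions, disk_fsid, local_fsid):
    """Only allow zapping if nothing is mounted and the disk is not ours."""
    if any(is_mounted(p) for p in partitions):
        return False
    if disk_fsid is not None and disk_fsid == local_fsid:
        return False
    return True
```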

@smithfarm (Contributor Author)

> 1. Run Stages 0-2 as normal.
> 2. If you have been using this hardware for other Ceph installations, run cephdisks.zap (or similar).
> 3. Run the remaining Stages.

That works for me, assuming users know to do it. Can stage 2 detect the presence of disks from other Ceph installations and gently suggest to the user what they need to do? Then stage 3 could run the same check again, and if the unzapped disks are still there, it could fail with a proper error message.

@smithfarm (Contributor Author)

> If this functionality gets added to Stage 3 permanently, I'm concerned about the impact on a 600-disk cluster of checking every fsid and OSD id to make absolutely sure that each OSD is "right" and should not be destroyed. After the initial (virgin) setup, I speculate that this is less of an issue.

Ah, now I hear you. Stage 2 could run the expensive check and remember (store) which disks were possibly unzapped. Then stage 3 would only need to recheck those particular disks.
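A sketch of that split, assuming stage 2 writes its findings to a small cache file on the minion (a Salt grain or pillar entry would work just as well; the path, function names, and probe_fsid callback are all illustrative):

```python
import json
import os

CACHE = "/var/cache/deepsea/suspect_disks.json"  # illustrative location

def stage2_scan(all_disks, probe_fsid, local_fsid):
    """Run the expensive per-disk fsid probe once and remember the suspects."""
    suspects = [disk for disk in all_disks
                if (fsid := probe_fsid(disk)) and fsid != local_fsid]
    os.makedirs(os.path.dirname(CACHE), exist_ok=True)
    with open(CACHE, "w") as cache:
        json.dump(suspects, cache)
    if suspects:
        print("Warning: these disks look like they belong to another cluster; "
              "consider zapping them before stage 3:", suspects)
    return suspects

def stage3_recheck(probe_fsid, local_fsid):
    """Re-probe only the disks that stage 2 flagged, not every disk."""
    try:
        with open(CACHE) as cache:
            suspects = json.load(cache)
    except FileNotFoundError:
        return
    still_bad = [disk for disk in suspects
                 if (fsid := probe_fsid(disk)) and fsid != local_fsid]
    if still_bad:
        raise RuntimeError(f"Disks {still_bad} still look like they belong to "
                           "another cluster; please zap them and re-run stage 3.")
```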
