Shutdown at specific epoch when applying new consensus rules #948
Comments
Peers can signal a desire to shutdown via a |
Shut down and then get restarted automatically by systemd? I don't think the shutdown changes anything here. If anything, fedimintd should just refuse to generate any new epochs and wait (for the operator, or for something else to unblock it). I guess we either invest in writing proper "migration" logic, where fedimintd can support both the previous and new consensus rules, and switch at a given (probably I wonder if we could even have an easy hybrid. We could have two sets of consensus ports. Before the upgrade, the guardian would run fedimintd like this:
After reaching the upgrade epoch, fedimintd-0.14 would restart itself. On start it would detect that it is already at the upgrade epoch and start operating in read-only mode on an alternative set of ports, it would also start Each fedimintd would know that if it is connecting to a peer that reports a higher version than its own, it should try the alternative set of ports. This way the old version can stick around indefinitely and allow any peers that were down etc. to finish syncing to the upgrade epoch when possible, while at no point do we need a binary that can support both the old and the new protocol. After guardians confirm that all peers are up to date, they can turn off the previous version and run just
. It gives a very similar user (guardian) experience, while as fedimintd devs we only ever need to test all peers running the same version at a time. For the end users this will probably not work, but if we have an alternative set of ports for user RPCs as well, then users who detect an unsupported peer version can still try to connect to the old version and at least get some read-only responses and information about the upgrade. Though I still think clients will have to support a window of at least two versions at the same time to allow smooth upgrades. |
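The port-selection rule proposed in this comment can be sketched roughly as follows. This is only an illustration of the idea, not fedimintd code: the `PeerPorts` type, the `port_to_dial` function, the integer version numbers, and the port values are all hypothetical.

```rust
/// Hypothetical pair of consensus ports for one peer: `primary` serves the
/// version the peer currently runs, `alternative` serves the read-only
/// instance of the previous version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PeerPorts {
    primary: u16,
    alternative: u16,
}

/// Pick the port to dial. If the peer reports a higher version than ours,
/// we are the one lagging behind, so we try its alternative port where the
/// old version is assumed to still answer.
fn port_to_dial(own_version: u32, peer_version: u32, ports: PeerPorts) -> u16 {
    if peer_version > own_version {
        ports.alternative
    } else {
        ports.primary
    }
}

fn main() {
    // Port numbers are made up for the example.
    let ports = PeerPorts { primary: 8173, alternative: 9173 };
    // A lagging peer (version 14) talking to an upgraded peer (version 15).
    println!("lagging peer dials {}", port_to_dial(14, 15, ports));
    // Two peers on the same version keep using the primary port.
    println!("same version dials {}", port_to_dial(15, 15, ports));
}
```

A peer that is itself newer than the remote keeps using the primary port here; whether that case needs special handling is left open in the comment.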
Yeah, I was imagining something like that would be good to minimize downtime, since the guardians might not be able to manually coordinate the upgrade within a short time window. |
I don't think we should focus on uncoordinated upgrades too much yet. Being able to upgrade at all would be a huge improvement and is far lower complexity. Adding automatic switchover or even running two federations in parallel can build on top of that and would require far more work right now imo. |
@elsirion What do you think of this for the MVP upgrade:
For version upgrades, we could eventually automate the steps with an upgrade script. |
@jkitman First modifying config and then stopping consensus at a certain epoch doesn't work imo: you need the old cfg to keep running till that epoch. We don't need to automate this for now, we can just have an admin command that tells the fedimintd to shut down after epoch x:
There will be downtime, but there's so much less that can go wrong. Specifying and testing a watertight version of your on-the-fly upgrade approach will be much more involved imo and could introduce quite some instability before we even have an MVP. |
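A minimal sketch of what "shut down after epoch x" could look like inside a consensus loop, under the assumption that the node checks a stored target after finishing each epoch. The `Node` type, `request_shutdown_after`, `after_epoch`, and `run_until_shutdown` are hypothetical names for illustration, not the actual fedimintd admin API.

```rust
/// What the node should do after finishing an epoch.
#[derive(Debug, PartialEq, Eq)]
enum Next {
    Continue,
    ShutDown,
}

/// Hypothetical node state: the epoch after which it should stop, if any.
struct Node {
    shutdown_after: Option<u64>,
}

impl Node {
    /// Stand-in for the admin command: remember the target epoch.
    fn request_shutdown_after(&mut self, epoch: u64) {
        self.shutdown_after = Some(epoch);
    }

    /// Called once an epoch has been fully processed.
    fn after_epoch(&self, finished: u64) -> Next {
        match self.shutdown_after {
            Some(target) if finished >= target => Next::ShutDown,
            _ => Next::Continue,
        }
    }
}

/// Process epochs starting at `first_epoch` and return the last epoch that
/// was processed. `limit` only bounds the loop so the example terminates.
fn run_until_shutdown(node: &Node, first_epoch: u64, limit: u64) -> u64 {
    let mut epoch = first_epoch;
    loop {
        // Real epoch processing would happen here.
        if node.after_epoch(epoch) == Next::ShutDown || epoch >= limit {
            return epoch;
        }
        epoch += 1;
    }
}

fn main() {
    let mut node = Node { shutdown_after: None };
    node.request_shutdown_after(5);
    println!("stopped after epoch {}", run_until_shutdown(&node, 0, 100));
}
```

Because the check runs only after an epoch completes, every peer given the same target stops on the same epoch boundary, which is the property the issue asks for.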
@elsirion I meant modify a copy of the consensus config, rather than modify the existing config in-place. A script could replace the existing config with the modified one once enough guardians agree. |
@elsirion Was working on this and realized we don't really have an authenticated way for a guardian to send messages to their |
Ah, so we did post something on GH! I just didn't search in PRs … I opened #1841 to track it. |
Dev call: @joschisan says we might close this issue as "not planned". This would mean we can't make consensus changes in a backwards-compatible way without changing the minor consensus version (#3571) |
When making an upgrade, we need all peers to shut down at the same epoch. Otherwise, different peers might end up in different states.
Then everyone switches out fedimintd and applies the new rules at the same time.
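To illustrate why the switch has to happen at the same epoch, here is a toy simulation. The state-transition rules are invented purely for the example (old rules add the item, new rules add twice the item); nothing here reflects real fedimint consensus logic.

```rust
/// Toy state transition. The two rule sets are invented for illustration.
fn apply(state: u64, item: u64, new_rules: bool) -> u64 {
    if new_rules {
        state + 2 * item
    } else {
        state + item
    }
}

/// Replay one item per epoch, switching to the new rules at `switch_epoch`.
fn run_peer(items: &[u64], switch_epoch: usize) -> u64 {
    items
        .iter()
        .enumerate()
        .fold(0, |state, (epoch, &item)| apply(state, item, epoch >= switch_epoch))
}

fn main() {
    let items = [1, 2, 3, 4];
    // Two peers switching at the same epoch agree on the final state.
    println!("{} {}", run_peer(&items, 2), run_peer(&items, 2));
    // A peer that switches one epoch later ends up in a different state.
    println!("{} {}", run_peer(&items, 2), run_peer(&items, 3));
}
```

Both peers see the same items in the same order; the only difference is the epoch at which they start applying the new rules, and that alone is enough to make their states diverge.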