-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(consensus): consensus version votes #5568
Conversation
@@ -425,9 404,18 @@ impl Fedimintd { | |||
std::process::exit(-1); | |||
} | |||
|
|||
#[allow(unused, clippy::unused_self)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used only for fedimintd dev ...
commmands and needs an extra fix, so left it broken for now.
@@ -508,6 499,27 @@ async fn run( | |||
Default::default(), | |||
); | |||
|
|||
// TODO: fixme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@joschisan So this would be the damage. :D |
1c406a1
to
a4c9f07
Compare
from call: can we combine this with updating/voting on new consensus config? |
I thought about it, and in my opinion these two can and should be done as a separate changes, where consensus version votes actually make rolling out config changes feature better (just like any other consensus-related changes), by making it possible to automatically agree on a consensus logic changes in a backward-compat way. |
a4c9f07
to
58ffd59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What isn't clear to me right now: how are modules told about which consensus version to use? It looks like the ConsensusVersionCache
isn't being read yet? I see two options:
- Give the consensus version to the
ServerModuleInit::init
function. This would likely require a restart on version change. - Supply the current version on every call to
process_{input,output,consensus_item}
// Minor module consensus version is irrelevant to the client, and | ||
// can change at runtime but since it's already here, we just | ||
// hardcode it to 0. In theory we could read it at runtime for | ||
// information purposes. | ||
minor: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because any new behavior would also result in a new API version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I being to think that we should just not expose consensus version to client altogether. If consensus version change changes the API, it can be incremented in lockstep in the implementation.
#[derive(Copy, Clone, Debug, Encodable, Decodable, PartialEq, Eq)] | ||
pub struct ConsensusVersionVoteValue { | ||
pub major: u32, | ||
pub minor: u32, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this type become the basis for CoreConsensusVersion
and ModuleConsensusVersion
?
/// In memory copy of all the current consensus versions that are maintained in | ||
/// the DB. | ||
/// | ||
/// To avoid reading from the database all the time, invalidation is supported. | ||
/// Any time consensus item that could have changed it is processed, the whole | ||
/// thing will get invalidated and reloaded at the beginning for processing new | ||
/// consensus item. | ||
/// | ||
/// Create by using [`ConsensusEngine::load_consensus_version_cache`] | ||
#[derive(Debug)] | ||
struct ConsensusVersionCache { | ||
// Currently nothing is using it, but in the future we can pass these | ||
// to modules. | ||
#[allow(unused)] | ||
versions: BTreeMap<Option<ModuleInstanceId>, ConsensusVersionValue>, | ||
invalidated: bool, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalidation logic sounds overly complicated. Why not make it a write-through cache that is used for all writes of consensus versions to the DB? One way or another we can mess it up, either by forgetting to invalidate the cache or by writing to the DB directly. But not doing direct writes sounds most natural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that invalidatioon here is a simplification. There's only one way and place that can change a version - when the consensus version vote is processed. A write-through cache sounds roughly equivalent to me, except it would probably be a field on a struct, and not a variable.
fn get_desired_consensus_version_from_votes_static( | ||
num_peers: NumPeers, | ||
votes: impl IntoIterator<Item = ConsensusVersionVoteValue>, | ||
) -> Option<ConsensusVersionValue> { | ||
let mut votes_by_major: BTreeMap<u32, BTreeMap<u32, usize>> = Default::default(); | ||
for ConsensusVersionVoteValue { major, minor } in votes { | ||
*votes_by_major | ||
.entry(major) | ||
.or_default() | ||
.entry(minor) | ||
.or_default() = 1; | ||
} | ||
|
||
let threshold_needed = num_peers.threshold(); | ||
|
||
for (major, minor_counts) in votes_by_major { | ||
let mut threshold_accumulated = 0; | ||
for (minor, count) in minor_counts.into_iter().rev() { | ||
threshold_accumulated = count; | ||
|
||
if threshold_needed <= threshold_accumulated { | ||
return Some(ConsensusVersionValue { major, minor }); | ||
} | ||
} | ||
} | ||
|
||
None | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version unification seems overly complex. Ideally everyone runs the same software version, if a federation suddenly runs 3 different versions (where this "least common denominator" logic would kick in) something is horribly wrong imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally everyone runs the same software version
I do not understand. This code needs to support switching between any versions, including major version switches if we ever have many, for the purpose of rewinding consensus from any past versions.
20 LoC is not much, IMO, and I moved it to a separate function so testing it thoroughly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... Or should we just use major version to mean v2
in lnv2
, and not support ever upgrading major consensus version in the running federations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that you should never have a situation where you run versions 1.0
, 1.1
, 1.2
and 1.1
together. If an upgrade happens that should work as follows:
At first everyone is at 1.0
, then everyone upgrades to e.g. 1.1
and once enough updated they vote on 1.1
. In that case it would be sufficient to check which version, if any, has enough votes to be activated, otherwise stick with the previous one.
Maybe that's too restrictive, but I doubt we'll really support running arbitrary code versions with each other, it's just too much testing effort. It's less about this code and more about "do we want to support this at all?".
Now that I see #5766 this PR should likely be marked as draft … |
My intention with this PR was just to show joshisan more or less how easy/hard would it be to implement the whole thing and get some feedback.
This one. Plus possibly we could have a function on a module that we would call when version changes, to free the module from checking it on every |
Makes sense!
That would be kinda redundant. If such a separate upgrade fn was called I'd want to track that in the module and reject any calls to |
One more thing that came to my mind: AlephBFT upgrades are generally major core consensus version bumps, if we want to allow smooth upgrades we'll have to ship two versions of AlephBFT for one Fedimint version. |
#5766 is/will be better. |
Summary
Track core modules effective consensus versions, starting with the one from config ("negotiated" aka. hardcoded during DKG), and allow upgrading it via new consensus items.
On start peers submit their votes when the existing ones are different then what the current core/module supports. Peers track these votes and upgrade when certain conditions are met (details still WIP).
The benefit here is that module authors can relatively easily change details of consensus logic, make the new logic conditional on the consensus level, and everything works seamlessly - existing history can be reprocessed from scratch using old logic, while new code will trigger after consensus version was upgraded. This avoid hacks and other shenanigans that we had to employ e.g. to fix RBF peg-out bug.
When current consensus version is upgraded by guardian consensus is outside of what is a given non-upgraded can support, peer will just crash, forcing leftover peers to upgrade. In the future we could make it more elegant.
Currently the goal is to just enable this functionality and start tracking versions, without using it for anything, just in case we hit any new bugs that force us to do upgrades.
Status
Just wrote it down and pushed, just to show it around.
threshold
votes is required to switch the version, but seems like we would actually prefer automatic upgrade when all peers are ready (for best availability), with a possibility of a manual trigger to downgrade it tothreshold
- in cases where the Federation knows and is OK with certain peers going down (e.g. urgent security fix).