Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(consensus): consensus version votes #5568

Closed
wants to merge 1 commit into from

Conversation

dpc
Copy link
Contributor

@dpc dpc commented Jul 2, 2024

Summary

Track core modules effective consensus versions, starting with the one from config ("negotiated" aka. hardcoded during DKG), and allow upgrading it via new consensus items.

On start peers submit their votes when the existing ones are different then what the current core/module supports. Peers track these votes and upgrade when certain conditions are met (details still WIP).

The benefit here is that module authors can relatively easily change details of consensus logic, make the new logic conditional on the consensus level, and everything works seamlessly - existing history can be reprocessed from scratch using old logic, while new code will trigger after consensus version was upgraded. This avoid hacks and other shenanigans that we had to employ e.g. to fix RBF peg-out bug.

When current consensus version is upgraded by guardian consensus is outside of what is a given non-upgraded can support, peer will just crash, forcing leftover peers to upgrade. In the future we could make it more elegant.

Currently the goal is to just enable this functionality and start tracking versions, without using it for anything, just in case we hit any new bugs that force us to do upgrades.

Status

Just wrote it down and pushed, just to show it around.

  • Needs a round of my own review, and then lots of other reviews.
  • Needs tests, which can be a bit tricky because we need to trigger code changes that leads to version updates.
  • Currently threshold votes is required to switch the version, but seems like we would actually prefer automatic upgrade when all peers are ready (for best availability), with a possibility of a manual trigger to downgrade it to threshold - in cases where the Federation knows and is OK with certain peers going down (e.g. urgent security fix).

@dpc dpc requested review from a team as code owners July 2, 2024 22:57
@@ -425,9 404,18 @@ impl Fedimintd {
std::process::exit(-1);
}

#[allow(unused, clippy::unused_self)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used only for fedimintd dev ... commmands and needs an extra fix, so left it broken for now.

@@ -508,6 499,27 @@ async fn run(
Default::default(),
);

// TODO: fixme
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@dpc
Copy link
Contributor Author

dpc commented Jul 2, 2024

@joschisan So this would be the damage. :D

@dpc dpc force-pushed the 24-06-29-consensus-versioning branch from 1c406a1 to a4c9f07 Compare July 2, 2024 23:08
@Kodylow
Copy link
Member

Kodylow commented Jul 8, 2024

from call: can we combine this with updating/voting on new consensus config?

@dpc
Copy link
Contributor Author

dpc commented Jul 8, 2024

from call: can we combine this with updating/voting on new consensus config?

I thought about it, and in my opinion these two can and should be done as a separate changes, where consensus version votes actually make rolling out config changes feature better (just like any other consensus-related changes), by making it possible to automatically agree on a consensus logic changes in a backward-compat way.

Copy link
Contributor

@elsirion elsirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What isn't clear to me right now: how are modules told about which consensus version to use? It looks like the ConsensusVersionCache isn't being read yet? I see two options:

  • Give the consensus version to the ServerModuleInit::init function. This would likely require a restart on version change.
  • Supply the current version on every call to process_{input,output,consensus_item}

Comment on lines 148 to 152
// Minor module consensus version is irrelevant to the client, and
// can change at runtime but since it's already here, we just
// hardcode it to 0. In theory we could read it at runtime for
// information purposes.
minor: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because any new behavior would also result in a new API version?

Copy link
Contributor Author

@dpc dpc Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I being to think that we should just not expose consensus version to client altogether. If consensus version change changes the API, it can be incremented in lockstep in the implementation.

Comment on lines 120 to 124
#[derive(Copy, Clone, Debug, Encodable, Decodable, PartialEq, Eq)]
pub struct ConsensusVersionVoteValue {
pub major: u32,
pub minor: u32,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this type become the basis for CoreConsensusVersion and ModuleConsensusVersion?

Comment on lines 1084 to 1100
/// In memory copy of all the current consensus versions that are maintained in
/// the DB.
///
/// To avoid reading from the database all the time, invalidation is supported.
/// Any time consensus item that could have changed it is processed, the whole
/// thing will get invalidated and reloaded at the beginning for processing new
/// consensus item.
///
/// Create by using [`ConsensusEngine::load_consensus_version_cache`]
#[derive(Debug)]
struct ConsensusVersionCache {
// Currently nothing is using it, but in the future we can pass these
// to modules.
#[allow(unused)]
versions: BTreeMap<Option<ModuleInstanceId>, ConsensusVersionValue>,
invalidated: bool,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalidation logic sounds overly complicated. Why not make it a write-through cache that is used for all writes of consensus versions to the DB? One way or another we can mess it up, either by forgetting to invalidate the cache or by writing to the DB directly. But not doing direct writes sounds most natural.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that invalidatioon here is a simplification. There's only one way and place that can change a version - when the consensus version vote is processed. A write-through cache sounds roughly equivalent to me, except it would probably be a field on a struct, and not a variable.

Comment on lines 646 to 673
fn get_desired_consensus_version_from_votes_static(
num_peers: NumPeers,
votes: impl IntoIterator<Item = ConsensusVersionVoteValue>,
) -> Option<ConsensusVersionValue> {
let mut votes_by_major: BTreeMap<u32, BTreeMap<u32, usize>> = Default::default();
for ConsensusVersionVoteValue { major, minor } in votes {
*votes_by_major
.entry(major)
.or_default()
.entry(minor)
.or_default() = 1;
}

let threshold_needed = num_peers.threshold();

for (major, minor_counts) in votes_by_major {
let mut threshold_accumulated = 0;
for (minor, count) in minor_counts.into_iter().rev() {
threshold_accumulated = count;

if threshold_needed <= threshold_accumulated {
return Some(ConsensusVersionValue { major, minor });
}
}
}

None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version unification seems overly complex. Ideally everyone runs the same software version, if a federation suddenly runs 3 different versions (where this "least common denominator" logic would kick in) something is horribly wrong imo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally everyone runs the same software version

I do not understand. This code needs to support switching between any versions, including major version switches if we ever have many, for the purpose of rewinding consensus from any past versions.

20 LoC is not much, IMO, and I moved it to a separate function so testing it thoroughly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... Or should we just use major version to mean v2 in lnv2, and not support ever upgrading major consensus version in the running federations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that you should never have a situation where you run versions 1.0, 1.1, 1.2 and 1.1 together. If an upgrade happens that should work as follows:

At first everyone is at 1.0, then everyone upgrades to e.g. 1.1 and once enough updated they vote on 1.1. In that case it would be sufficient to check which version, if any, has enough votes to be activated, otherwise stick with the previous one.

Maybe that's too restrictive, but I doubt we'll really support running arbitrary code versions with each other, it's just too much testing effort. It's less about this code and more about "do we want to support this at all?".

@elsirion
Copy link
Contributor

Now that I see #5766 this PR should likely be marked as draft …

@elsirion elsirion marked this pull request as draft July 29, 2024 15:48
@dpc
Copy link
Contributor Author

dpc commented Aug 5, 2024

@elsirion

What isn't clear to me right now: how are modules told about which consensus version to use?

My intention with this PR was just to show joshisan more or less how easy/hard would it be to implement the whole thing and get some feedback.

Supply the current version on every call to process_{input,output,consensus_item}

This one.

Plus possibly we could have a function on a module that we would call when version changes, to free the module from checking it on every process_x call if it wants to run some one-off upgrade logic.

@elsirion
Copy link
Contributor

My intention with this PR was just to show joshisan more or less how easy/hard would it be to implement the whole thing and get some feedback.

Makes sense!

Supply the current version on every call to process_{input,output,consensus_item}

This one.

Plus possibly we could have a function on a module that we would call when version changes, to free the module from checking it on every process_x call if it wants to run some one-off upgrade logic.

That would be kinda redundant. If such a separate upgrade fn was called I'd want to track that in the module and reject any calls to process_* that use older versions. At that point we could just use the version tracked by the module struct.

@elsirion
Copy link
Contributor

One more thing that came to my mind: AlephBFT upgrades are generally major core consensus version bumps, if we want to allow smooth upgrades we'll have to ship two versions of AlephBFT for one Fedimint version.

@dpc
Copy link
Contributor Author

dpc commented Aug 28, 2024

#5766 is/will be better.

@dpc dpc closed this Aug 28, 2024
@dpc dpc deleted the 24-06-29-consensus-versioning branch November 12, 2024 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consensus Version Upgrade Handling
3 participants