-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Allow forcing processing an epoch #2318
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #2318 /- ##
==========================================
- Coverage 59.87% 58.74% -1.14%
==========================================
Files 158 159 1
Lines 33438 34224 786
==========================================
Hits 20022 20105 83
- Misses 13416 14119 703
... and 37 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
LGTM How will the flow look from the cli? Do we need another PR with |
async fn force_process_epoch(&mut self, outcome: EpochOutcome) { | ||
let convert = ConsensusOutcomeConversion::from(outcome).0; | ||
match self.process_outcome(convert).await { | ||
Ok(_) => {} | ||
Err(err) => warn!("Unable to force process epoch {:?}", err), | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we also need to tell HBBFT that it can accept the new epoch now and do we need to ask peers to trigger more epochs to rejoin? Or is it assumed that after that API call the guardian node has to be restarted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized we already handle that here https://github.com/fedimint/fedimint/blob/master/fedimint-server/src/consensus/server.rs#L334-L335
If an epoch is in the past, it won't be processed, if it's in the future we'll download missing epochs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this fixes the problem, this comment wasn't about downloading old epochs.
Let's suppose there f
servers are permanently lost and t
(A, B, C) are trying to restart but one of them (A) is at a wrong epoch:
- A, B, C start HBBFT at their respective last epochs
- A, B, C send rejoin requests and generate HBBFT messages
- A sends message for epoch
n
- B, C send messages for epoch
n 1
- No consensus is found on the epoch, A stays on epoch
n
- A sends message for epoch
- No progress is made since only 2 of the required 3 are sending contributions for the correct epoch
- A forces the processing of epoch
n 1
, but does not tell HBBFT about it, so no new messages are generated for epochn 1
and HBBFT remains stuck
I think this is solved by restarting fedimintd
afterwards, but ideally there'd be a more elegant solution.
integrationtests/tests/tests.rs
Outdated
bitcoin.mine_blocks(100).await; | ||
fed.run_consensus_epochs(1).await; | ||
|
||
// We cannot process a past each and reverse the block height |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean epoch
?
async fn force_process_epoch(&mut self, outcome: EpochOutcome) { | ||
let convert = ConsensusOutcomeConversion::from(outcome).0; | ||
match self.process_outcome(convert).await { | ||
Ok(_) => {} | ||
Err(err) => warn!("Unable to force process epoch {:?}", err), | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this fixes the problem, this comment wasn't about downloading old epochs.
Let's suppose there f
servers are permanently lost and t
(A, B, C) are trying to restart but one of them (A) is at a wrong epoch:
- A, B, C start HBBFT at their respective last epochs
- A, B, C send rejoin requests and generate HBBFT messages
- A sends message for epoch
n
- B, C send messages for epoch
n 1
- No consensus is found on the epoch, A stays on epoch
n
- A sends message for epoch
- No progress is made since only 2 of the required 3 are sending contributions for the correct epoch
- A forces the processing of epoch
n 1
, but does not tell HBBFT about it, so no new messages are generated for epochn 1
and HBBFT remains stuck
I think this is solved by restarting fedimintd
afterwards, but ideally there'd be a more elegant solution.
@elsirion Right, forgot about that point. See updated code. I need a way to simulate a server restart more easily I think. |
I tried running on this branch:
Was this supposed to work? |
@douglaz I've switch to using env vars instead of params to make it easier to call the admin API. Try setting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now :)
Fixes #2309
Note we upload the entire serialized epoch (which can be downloaded from the API), because it's both easier to get from the API and doesn't require additional P2P calls (we don't really know who to query).