Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Remove active gateway #4427

Merged
merged 5 commits into from
Mar 12, 2024

Conversation

m1sterc001guy
Copy link
Contributor

@m1sterc001guy m1sterc001guy commented Feb 27, 2024

Removes the "active gateway" concept from the client. Makes our tests more deterministic since every lightning operation explicitly accepts a gateway as a parameter. If a gateway is not supplied, it is assumed to be an internal payment.

  • Adds a gateway cache
  • Higher level apps like the CLI select a gateway from the cache to use when paying or creating an invoice.
  • Cache can be updated using update_gateway_cache. Higher level apps can call this periodically to stay in sync with gateway registrations.
  • Adds UpdateGatewayCache CLI command
  • Changes devimint tests to be backwards compatible.
  • Adds DB migration to remove active gateway entry

Closes #3844

@m1sterc001guy m1sterc001guy force-pushed the remove_active_gateway branch 2 times, most recently from b4aea41 to f5137e1 Compare February 29, 2024 16:50
@m1sterc001guy
Copy link
Contributor Author

Two problems make this a tad difficult:

  1. Devimint tests need to be made backwards compatible.
  2. Removing "active gateway" effectively removes caching of the Lightning Gateways, which regresses the latency tests. Perhaps a better approach is to change "active gateway" into a gateway cache, but still force the cli to pass a gateway id.

@elsirion
Copy link
Contributor

elsirion commented Mar 1, 2024

2. gateway cache

Great idea!

Devimint tests need to be made backwards compatible

Could you encapsulate the pay command in a function that, depending on the client version, calls set active gateway pay or just pay with a specific gateway as argument?

.into_iter()
.filter(|g| g.vetted)
.collect::<Vec<_>>();
if !vetted.is_empty() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question that arises is how we want to handle the "vetted" field. Previously we were filtering non-vetted gateways when selecting the active gateway. Perhaps we should just leave it up to the integrating client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think vetted gateways should be prioritized if provided and online, if not we should try another one. That would mean the "ping the GW before making a payment" logic would move into the selection function so we can skip offline vetted gateways.

This might lead to the "gateway cache" becoming its own service in the background that autonomously fetches available gateways and tries to connect to them to figure out which one to use when asked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take this another direction if you like, but I opted to keep the API at the fedimint-client level as simple as possible. Lightning operations always expect a gateway to be passed in and callers can acquire this gateway either using select_gateway or implementing their own logic in combination with list_gateways (they can select vetted gateways this way).

The goal of this PR was to remove non-determinism from our tests and I think implementing a fallback gateway causes some issues (it was the source of #4350 (comment)), since the client might unexpectedly choose a different gateway.

I think higher level apps can implement this fallback or select vetted gateways if they want, but the way it is now allows our tests to be more straightforward

@m1sterc001guy m1sterc001guy force-pushed the remove_active_gateway branch 9 times, most recently from 74f6c6f to 900598c Compare March 6, 2024 16:03
@m1sterc001guy m1sterc001guy changed the title [WIP]: Remove active gateway Remove active gateway Mar 6, 2024
@m1sterc001guy m1sterc001guy changed the title Remove active gateway feat: Remove active gateway Mar 6, 2024
@m1sterc001guy m1sterc001guy marked this pull request as ready for review March 6, 2024 16:35
@m1sterc001guy m1sterc001guy requested review from a team as code owners March 6, 2024 16:35
@m1sterc001guy m1sterc001guy force-pushed the remove_active_gateway branch from 900598c to 99d296a Compare March 9, 2024 22:25
@m1sterc001guy m1sterc001guy force-pushed the remove_active_gateway branch from 99d296a to 69a07d5 Compare March 10, 2024 17:06
Comment on lines 55 to 56
std::env::set_var("GWID_CLN", gw_cln.gateway_id().await?);
std::env::set_var("GWID_LND", gw_lnd.gateway_id().await?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was going to suggest prefixing with FM_ but then realized that the only reason we did this in the past was to be able to copy all relevant envirment variables into the mprocs shell IIRC. But devimint fixes this for us now. So it would probably make more sense to remove the FM_ prefixes!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to have project-specfic prefixes in long living env variables, as it makes chances of accidental conflicts lower. So we should keep LM_. @m1sterc001guy @justinmoon

justinmoon
justinmoon previously approved these changes Mar 12, 2024
Copy link
Contributor

@justinmoon justinmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed pretty thoroughly and it looks good. Only question is if we should delete the active gateway db key from code since we removed from the db in a migration.

@justinmoon justinmoon added this pull request to the merge queue Mar 12, 2024
Merged via the queue into fedimint:master with commit 043f6c3 Mar 12, 2024
20 checks passed
Copy link
Contributor

@elsirion elsirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently I never submitted these comments 🙈

modules/fedimint-ln-client/src/db.rs Show resolved Hide resolved
Comment on lines 16 to 17
// Lightning Gateway cache records are stored the same way as the server, using 0x45
// [`fedimint_ln_common::db::DbKeyPrefix::LightningGateway`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why?
  2. Can't we define the variant here too? This seems to break the invariant that every DB key used is documented using this enum (the comment helps, but might be overlooked if e.g. looking at it on docs.rs).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was mainly to re-use the existing code, since we already had a struct definition that we needed. I understand the concern though, we should probably move db.rs in fedimint-ln-common into the server and only use the db records from within the respective crates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: #4567

Comment on lines 851 to 895
/// Returns all gateways that are currently in the gateway cache.
pub async fn list_gateways(&self) -> Vec<LightningGatewayAnnouncement> {
let mut dbtx = self.client_ctx.module_db().begin_transaction_nc().await;
dbtx.find_by_prefix(&LightningGatewayKeyPrefix)
.await
.map(|(_, gw)| gw.unanchor())
.collect::<Vec<_>>()
.await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect some sort of cache invalidation here, e.g. re-fetch if any gateway timed out or if we don't have any gateways cached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the API as simple as possible, with no unexpected network activity. I think higher level apps can make this decision themselves if they see that the cache is empty, they can call update_gateway_cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove "active gateway" concept
4 participants