[Github Actions] Add new workflows for fullnode syncing. #6352

JoshLind · 2023-01-26T23:37:21Z

Description

This PR adds nightly GitHub Actions workflows to verify fullnode health. Specifically, we add the following workflows:

Start a node using the main branch, connect it to devnet and verify it can sync using execution syncing.
Start a node using the main branch, connect it to testnet and verify it can sync using fast syncing.
Start a node using the main branch, connect it to mainnet and verify it can sync using output syncing.
Start a node using the main branch, connect it to mainnet and verify it can sync using fast syncing.
Start a node using the main branch, connect it to devnet and verify it can sync using intelligent syncing mode.
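As a rough illustration of what the sync modes above could translate to in node configuration, the sketch below maps each mode onto a state-sync bootstrapping mode. The enum names (`ExecuteTransactionsFromGenesis`, `ApplyTransactionOutputsFromGenesis`, `DownloadLatestStates`, `ExecuteOrApplyFromGenesis`) and the `state_sync.state_sync_driver.bootstrapping_mode` path are assumptions based on aptos-core's state-sync driver config and should be checked against the branch under test. The mode keys and the `state_sync_override` helper are hypothetical, not taken from this PR, and any continuous syncing mode the workflows might also set is omitted.

```python
# Hypothetical mapping from the sync modes named in this PR to state-sync
# bootstrapping modes. ASSUMPTION: enum names follow aptos-core's
# state-sync driver config; verify against the node config being tested.
BOOTSTRAPPING_MODES = {
    "execution": "ExecuteTransactionsFromGenesis",
    "output": "ApplyTransactionOutputsFromGenesis",
    "fast": "DownloadLatestStates",
    "intelligent": "ExecuteOrApplyFromGenesis",
}


def state_sync_override(sync_mode: str) -> dict:
    """Return a config fragment that selects the bootstrapping mode for sync_mode."""
    try:
        mode = BOOTSTRAPPING_MODES[sync_mode]
    except KeyError:
        raise ValueError(f"unknown sync mode: {sync_mode!r}") from None
    # Nested dict mirroring the (assumed) YAML path in the fullnode config.
    return {"state_sync": {"state_sync_driver": {"bootstrapping_mode": mode}}}
```

The fragment could then be merged into the fullnode's YAML config before the node is started.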

Each workflow spawns the node, monitors it, and waits for it to sync to a target version (the latest version of the blockchain plus some delta). If the node fails to reach the target, the workflow fails.
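The pass/fail logic described above can be sketched as a polling loop with an overall deadline. This is a minimal sketch, not the PR's script: `compute_target_version`, `wait_for_sync` and the polling interval are hypothetical, and `get_local_version` stands in for whatever queries the local node's synced version. The clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def compute_target_version(latest_network_version: int, delta: int) -> int:
    """Target = latest version of the public network plus some delta."""
    return latest_network_version + delta


def wait_for_sync(
    get_local_version: Callable[[], int],
    target_version: int,
    max_wait_secs: float,
    poll_interval_secs: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the local node until it reaches target_version or time runs out.

    Returns True on success and False on timeout; the caller is expected
    to fail the workflow (e.g. exit non-zero) when this returns False.
    """
    deadline = clock() + max_wait_secs
    while clock() < deadline:
        if get_local_version() >= target_version:
            return True
        sleep(poll_interval_secs)
    return False
```

A workflow step could then call `sys.exit(0 if wait_for_sync(...) else 1)` so that a timeout fails the job.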

We'll likely add more of these tests in the future, but let's see how these do.

Test Plan

Manual verification.

rustielin

Awesome work! Generally LGTM, but I have some nits on the structure of the composite action, as well as a suggestion to split the main action out into a separate script.

rustielin · 2023-01-27T22:30:14Z

.github/actions/fullnode-sync/action.yaml

@@ -0,0 +1,85 @@
+# This action runs a public fullnode using a specified branch ($BRANCH),


nit: I would make the environment variables BRANCH, NETWORK, BOOTSTRAPPING_MODE, etc. into action inputs, which makes the action slightly more modular and makes it clearer to the user what's required. It also makes it easier to move all these actions to a separate repo later on! Here's an example (from when I moved out the replay-verify job): https://github.com/aptos-labs/actions/blob/0a510a8f270c9b5093c3e79043f4613dbb07aeb3/replay-verify/action.yml
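Once BRANCH, NETWORK and BOOTSTRAPPING_MODE become action inputs, the composite action's run step can pass them to the script explicitly, e.g. `--branch ${{ inputs.branch }}`. A minimal sketch of the receiving side using `argparse`; the flag names, the input name `branch`, and the network choices are assumptions for illustration.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse settings that were previously read from environment variables.

    Hypothetical flags: the composite action would forward its inputs here.
    """
    parser = argparse.ArgumentParser(description="Sync a fullnode and verify it catches up.")
    parser.add_argument("--branch", required=True, help="aptos-core branch to build and run")
    parser.add_argument("--network", required=True, choices=["devnet", "testnet", "mainnet"],
                        help="public network to connect to")
    parser.add_argument("--bootstrapping-mode", required=True,
                        help="state-sync bootstrapping mode to configure on the node")
    return parser.parse_args(argv)
```

Making the inputs required means a misconfigured workflow fails immediately with a usage error instead of silently falling back to an empty environment variable.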

rustielin · 2023-01-31T22:08:07Z

.github/workflows/fullnode-execute-devnet-main.yaml

+      GIT_SHA:
+        required: false
+        type: string
+        description: The git SHA1 to test. If not specified, Forge will check the latest commits on the current branch


this input is not actually used -- can you remove it?

Copy/pasta magic that I thought might be used. Removed from them all. 😄

JoshLind · 2023-01-31T22:10:50Z

Thanks @rustielin! Made a few tweaks:

Introduced a new Python script, which is a bit more readable and easier to manage.
Added some features, e.g., verifying the process is still running, handling timeouts if the node appears to be stuck, measuring syncing throughput, etc.
Added composite inputs and a README, like you did: https://github.com/aptos-labs/actions/pull/3/files
Hopefully, once your PR lands, it won't be too hard to migrate this across.
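The extra checks listed above (process liveness, stuck-node timeouts and throughput) could be structured roughly as follows. This is a hypothetical sketch rather than the code in `fullnode_sync.py`: `SyncMonitor`, `stall_timeout_secs` and `check_process_alive` are illustrative names, and throughput here is simply the average number of versions synced per second since monitoring began.

```python
import subprocess


class SyncMonitor:
    """Tracks a fullnode's synced version across polls (hypothetical helper)."""

    def __init__(self, stall_timeout_secs: float, start_time: float, start_version: int) -> None:
        self.stall_timeout_secs = stall_timeout_secs
        self.start_time = start_time
        self.start_version = start_version
        self.last_version = start_version
        self.last_progress_time = start_time

    def record(self, version: int, now: float) -> None:
        """Record a new observation; raise if the version has not advanced for too long."""
        if version > self.last_version:
            self.last_version = version
            self.last_progress_time = now
        elif now - self.last_progress_time > self.stall_timeout_secs:
            raise RuntimeError(
                f"node appears stuck at version {version} for "
                f"{now - self.last_progress_time:.0f}s"
            )

    def throughput(self, now: float) -> float:
        """Average versions synced per second since monitoring started."""
        elapsed = now - self.start_time
        return (self.last_version - self.start_version) / elapsed if elapsed > 0 else 0.0


def check_process_alive(process: subprocess.Popen) -> None:
    """Fail fast if the fullnode process has exited (poll() is None while it runs)."""
    return_code = process.poll()
    if return_code is not None:
        raise RuntimeError(f"fullnode process exited early with code {return_code}")
```

Separating the stall timeout from the overall deadline lets a stuck node fail the job quickly instead of consuming the full maximum run time.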

rustielin · 2023-01-31T22:12:22Z

.github/actions/fullnode-sync/fullnode_sync.py

+  # Clone the aptos-networks repository at the same place as aptos-core
+  current_working_directory = os.getcwd()
+  os.chdir("..")
+  subprocess.run(["git", "clone", "https://github.com/aptos-labs/aptos-networks"])


nit: instead of git clone, you can prob just get it from the raw paths here: e.g. https://raw.githubusercontent.com/aptos-labs/aptos-networks/main/mainnet/waypoint.txt
That way, you can get around having to do git magic and chdir ..

Nice! So much simpler 😄
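Following the suggestion above, the network files can be downloaded directly from raw.githubusercontent.com instead of cloning aptos-networks. A minimal sketch using only the standard library; the `<network>/<file>` layout is assumed from the mainnet waypoint URL in the comment, and the helper names are hypothetical.

```python
import urllib.request

# Base URL taken from the review comment. ASSUMPTION: each network's files
# live under "<network>/<file>", as in the mainnet/waypoint.txt example.
RAW_BASE_URL = "https://raw.githubusercontent.com/aptos-labs/aptos-networks/main"


def network_file_url(network: str, file_name: str) -> str:
    """Build the raw URL for a file in the aptos-networks repo."""
    return f"{RAW_BASE_URL}/{network}/{file_name}"


def download_network_file(network: str, file_name: str, dest_path: str,
                          timeout_secs: float = 30.0) -> None:
    """Download a single file (e.g. waypoint.txt) without cloning or chdir."""
    url = network_file_url(network, file_name)
    with urllib.request.urlopen(url, timeout=timeout_secs) as response:
        data = response.read()
    # Write bytes so binary files are copied unchanged as well.
    with open(dest_path, "wb") as f:
        f.write(data)
```

For example, `download_network_file("mainnet", "waypoint.txt", "waypoint.txt")` fetches the file referenced in the review.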

rustielin · 2023-01-31T22:16:11Z

.github/actions/fullnode-sync/fullnode_sync.py

+def ping_rest_api_index_page(rest_endpoint, exit_if_none):
+  """Pings and returns the index page from a REST API endpoint"""
+  # Ping the REST API index page
+  process = subprocess.Popen(["curl", rest_endpoint], stdout=subprocess.PIPE, stderr=subprocess.PIPE)


instead of curl, probably use https://pypi.org/project/requests/, or at least do curl -s

Looks like job output gets noisy with

b'' Found output on stderr for ping_rest_api_index_page: b' % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 221 100 221 0 0 215k 0 --:--:-- --:--:-- --:--:-- 215k\n'

Nice, thanks! Just gone for curl -s for now 😄
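A sketch of a quieter ping in the spirit of the `curl -s` fix: `-s` suppresses the progress meter that was showing up on stderr, `-S` still prints genuine errors, and `-f` turns HTTP errors into a non-zero exit. It assumes the index page is JSON with a string-valued `ledger_version` field, as the Aptos REST API index page is understood to return; `fetch_index_page` and `ledger_version` are illustrative names, not the PR's code. With `requests`, as suggested in the review, the fetch would be `requests.get(rest_endpoint, timeout=10).json()`.

```python
import json
import subprocess


def fetch_index_page(rest_endpoint: str) -> dict:
    """Fetch the REST API index page quietly and return it as parsed JSON."""
    # -s: no progress meter; -S: still report errors; -f: fail on HTTP errors.
    result = subprocess.run(
        ["curl", "-sSf", rest_endpoint],
        capture_output=True,
        check=True,  # raises CalledProcessError if curl exits non-zero
    )
    return json.loads(result.stdout)


def ledger_version(index_page: dict) -> int:
    """Convert the (assumed) string ledger_version field to an int."""
    return int(index_page["ledger_version"])
```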

rustielin

Works as intended and LGTM! Just a few nits, but stamping to unblock. Let's get this landed as soon as possible!

bchocho

This is incredible. I saw the max time is 5 hours. How long does the sync take today? In future could this be the basis of a regression test?

JoshLind · 2023-02-01T02:11:52Z

This is incredible. I saw the max time is 5 hours. How long does the sync take today? In future could this be the basis of a regression test?

Currently they vary, e.g., from 25 minutes to 2 hours. In the future, I think we could make the shortest test (e.g., devnet fast sync) land-blocking if we wanted to :) But I'd like these to stabilize first.

github-actions · 2023-02-01T03:00:49Z

✅ Forge suite `land_blocking` success on `6243dadcc84597e9d4581279025fd9b8b04b4d5d`

performance benchmark with full nodes : 5960 TPS, 6640 ms latency, 12300 ms p99 latency,(!) expired 720 out of 2545760 txns
Test Ok

github-actions · 2023-02-01T03:02:53Z

✅ Forge suite `compat` success on `testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b` ==> `6243dadcc84597e9d4581279025fd9b8b04b4d5d`

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 6243dadcc84597e9d4581279025fd9b8b04b4d5d (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7786 TPS, 4934 ms latency, 6600 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 6243dadcc84597e9d4581279025fd9b8b04b4d5d
compatibility::simple-validator-upgrade::single-validator-upgrade : 4850 TPS, 8344 ms latency, 11600 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 6243dadcc84597e9d4581279025fd9b8b04b4d5d
compatibility::simple-validator-upgrade::half-validator-upgrade : 4362 TPS, 9131 ms latency, 13000 ms p99 latency,no expired txns
4. upgrading second batch to new version: 6243dadcc84597e9d4581279025fd9b8b04b4d5d
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6504 TPS, 5762 ms latency, 9200 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 6243dadcc84597e9d4581279025fd9b8b04b4d5d passed
Test Ok

JoshLind requested a review from rustielin January 26, 2023 23:37

JoshLind requested a review from a team as a code owner January 26, 2023 23:37

JoshLind force-pushed the main_fullnode_sync_mainnet branch 12 times, most recently from 2437188 to 8916a3c, January 27, 2023 04:03

JoshLind changed the title ~~[Github Actions] Add a new workflow for fullnode syncing.~~ [Github Actions] Add new workflows for fullnode syncing. Jan 27, 2023

JoshLind force-pushed the main_fullnode_sync_mainnet branch 14 times, most recently from b2fa21d to 5bf673b, January 27, 2023 22:15

rustielin requested changes Jan 27, 2023

JoshLind force-pushed the main_fullnode_sync_mainnet branch 2 times, most recently from 15986bc to 158f45c, January 31, 2023 21:49

JoshLind requested a review from rustielin January 31, 2023 21:57

JoshLind force-pushed the main_fullnode_sync_mainnet branch from 158f45c to 052a4bc, January 31, 2023 21:58

rustielin reviewed Jan 31, 2023

rustielin approved these changes Jan 31, 2023

JoshLind force-pushed the main_fullnode_sync_mainnet branch 2 times, most recently from 94035ae to 58cbbb1, January 31, 2023 22:37

rustielin requested a review from a team January 31, 2023 23:13

JoshLind force-pushed the main_fullnode_sync_mainnet branch from 58cbbb1 to 3c3ee99, February 1, 2023 00:11

JoshLind requested review from geekflyer, igor-aptos, msmouse, perryjrandall, sitalkedia and grao1991 February 1, 2023 00:30

bchocho approved these changes Feb 1, 2023

[Github Actions] Add a new workflow for fullnode syncing.

6243dad

JoshLind force-pushed the main_fullnode_sync_mainnet branch from 3c3ee99 to 6243dad, February 1, 2023 02:09

JoshLind enabled auto-merge (rebase) February 1, 2023 02:09

JoshLind merged commit 215b629 into main Feb 1, 2023

JoshLind deleted the main_fullnode_sync_mainnet branch February 1, 2023 03:33
