
feat: rearchitect rebaser to support more scale #4638

Merged

merged 11 commits into main
Sep 22, 2024

Conversation

fnichol
Contributor

@fnichol fnichol commented Sep 19, 2024

A rather large change to how a Rebaser server consumes requests for
change sets and when it performs related "dependent values update"
(aka "DVU") runs.

General NATS Jetstream Architecture

A Rebaser server uses 2 NATS Jetstream streams to track its work:

  • REBASER_REQUESTS: as before, all requests are consumed from this
    stream, which can be thought of as a work queue. In this
    implementation the stream is a limits-based stream, as the Jetstream
    consumer model is different. As before, a request for workspace
    $wk_id and change set $cs_id is published on
    rebaser.requests.$wk_id.$cs_id.
  • REBASER_TASKS: this is a classic Jetstream work queue stream, but
    where there is only one message per subject and where each message
    represents an exclusive task for a single Rebaser to spin up and run.
    In this case, there are only "process" tasks in which a Rebaser will
    consume the enqueued requests for a change set and run DVUs in serial
    (that is, only one at a time and one after the other as needed). When
    a request message is sent to the REBASER_REQUESTS stream, another
    body-less message is sent to this tasks stream. These messages are
    published on rebaser.tasks.$wk_id.$cs_id.process.
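The two subject layouts above can be sketched as plain string builders (a minimal sketch; the helper names and shortened identifiers are illustrative, not from the codebase):

```rust
/// Illustrative helpers for the two subject layouts described above.
/// (Function names are hypothetical; the real code may differ.)
fn request_subject(wk_id: &str, cs_id: &str) -> String {
    format!("rebaser.requests.{wk_id}.{cs_id}")
}

fn process_task_subject(wk_id: &str, cs_id: &str) -> String {
    format!("rebaser.tasks.{wk_id}.{cs_id}.process")
}

fn main() {
    // Publishing one request produces a payload message on the requests
    // stream and a body-less companion message on the tasks stream.
    let (wk_id, cs_id) = ("01H9WK", "01H9CS");
    assert_eq!(
        request_subject(wk_id, cs_id),
        "rebaser.requests.01H9WK.01H9CS"
    );
    assert_eq!(
        process_task_subject(wk_id, cs_id),
        "rebaser.tasks.01H9WK.01H9CS.process"
    );
    println!("subjects ok");
}
```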

Tasks Stream

On the tasks stream, a single Jetstream consumer is set up to share work
across multiple Rebasers as clients. When a Rebaser starts to process a
task message, it continuously sends AckKind::Progress messages to keep
the message from being ack'd or redelivered. If the task encounters an
error, it can return a Result::Err(_) from its Naxum handler, which
will trigger an AckKind::Nack(None) message, causing an immediate
redelivery of the message to another Rebaser.

If a task needs to be interrupted due to a graceful shutdown of a Rebaser
server, the handler will return an Error::Interrupted(_) error which
ensures that the message is nack'd and will be redelivered. If a task
completes cleanly, the handler will return a Result::Ok(_),
which triggers an AckKind::Ack, causing the message to be deleted from
the tasks stream so that it will not be run again.
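The handler-result-to-ack mapping can be sketched as a small function (a simplified model; AckKind mirrors the async-nats variants, but the types and mapping here are illustrative, not the actual Naxum handler):

```rust
use std::time::Duration;

/// Simplified stand-in for the NATS ack kinds used by the tasks stream.
#[derive(Debug, PartialEq)]
enum AckKind {
    Ack,                     // task done; delete from the work queue stream
    Nack(Option<Duration>),  // redeliver (immediately when delay is None)
}

/// Illustrative task error type; `Interrupted` models a graceful shutdown.
#[derive(Debug)]
enum TaskError {
    Interrupted,
    Other(String),
}

fn ack_for(result: Result<(), TaskError>) -> AckKind {
    match result {
        // Clean completion: ack, so the message is deleted and never re-run.
        Ok(()) => AckKind::Ack,
        // Any error, including interruption, nacks for immediate
        // redelivery to another Rebaser.
        Err(_) => AckKind::Nack(None),
    }
}

fn main() {
    assert_eq!(ack_for(Ok(())), AckKind::Ack);
    assert_eq!(ack_for(Err(TaskError::Interrupted)), AckKind::Nack(None));
    println!("ack mapping ok");
}
```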

Requests Stream

When a "process" task is running, a dedicated NATS consumer is created
to exclusively process all requests for a change set in serial (that is,
one at a time). This consumer is known as an "ordered consumer" and is
push-based (rather than the default pull-based consumers). An ordered
consumer is much lighter weight and ephemeral as far as a NATS cluster
is concerned and thus should reduce the stress on NATS when many change
sets are created/active over a short period of time.

This ordered consumer is set up with a timeout that detects when no
message has entered the subject (or no message has been pulled into the
Naxum app) within a period. When this "quiescence" period is seen, this
triggers a specific graceful shutdown of the "process" task where its
exit state is to return Result::Ok(_) and ack the task message. In
this way, change sets which become inactive (that is, no Rebaser request
message) are spun down to conserve resources and allow the Rebaser to
focus only on "active" change sets.
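The quiescence shutdown can be sketched with a receive timeout standing in for the ordered consumer's idle detection (a minimal model using a std channel instead of NATS; all names are illustrative):

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Sketch of the quiescence check: if no request arrives within the
/// period, the "process" task shuts down cleanly (Ok) so its task
/// message gets ack'd. A channel stands in for the ordered consumer.
fn run_until_quiescent(
    rx: mpsc::Receiver<String>,
    quiescence: Duration,
) -> Result<usize, String> {
    let mut processed = 0;
    loop {
        match rx.recv_timeout(quiescence) {
            // A request arrived within the period; process and keep going.
            Ok(_request) => processed += 1,
            // No traffic for the whole period: graceful, ack-the-task exit.
            Err(mpsc::RecvTimeoutError::Timeout) => return Ok(processed),
            Err(mpsc::RecvTimeoutError::Disconnected) => return Ok(processed),
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send("req-1".to_string()).unwrap();
    tx.send("req-2".to_string()).unwrap();
    drop(tx); // no more traffic; the consumer spins down
    let processed = run_until_quiescent(rx, Duration::from_millis(50)).unwrap();
    assert_eq!(processed, 2);
    println!("quiescent after {processed} requests");
}
```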

Using an ordered consumer means that we can no longer use a message
ack to delete a processed message (a trick of a work queue
stream). Therefore, when a request has been successfully processed, the
message is deleted from the stream using its sequence number. A new
Naxum middleware called PostProcess provides a way to handle this
delete in an OnSuccess callback. In the OnFailure case we simply
don't delete the message, which means the next message is still the first
and only message to process.
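The success/failure behavior can be modeled with a map keyed by sequence number (an illustrative sketch; the Stream type below is a stand-in, not the Naxum or NATS API):

```rust
use std::collections::BTreeMap;

// Stand-in for the requests stream: sequence number -> payload.
struct Stream {
    messages: BTreeMap<u64, String>,
}

impl Stream {
    // Ordered consumers can't ack-to-delete, so on success the processed
    // request is deleted from the stream explicitly by sequence number.
    fn on_success(&mut self, sequence: u64) {
        self.messages.remove(&sequence);
    }

    // On failure the message is left in place, so it remains the first
    // (and only) message to process on the next attempt.
    fn on_failure(&mut self, _sequence: u64) {}
}

fn main() {
    let mut stream = Stream {
        messages: BTreeMap::from([(1, "req-a".to_string()), (2, "req-b".to_string())]),
    };
    stream.on_success(1);
    assert_eq!(stream.messages.keys().next(), Some(&2)); // head advanced
    stream.on_failure(2);
    assert!(stream.messages.contains_key(&2)); // failed message stays put
    println!("post-process ok");
}
```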

Another tracked Tokio task, called the SerialDvuTask, runs alongside
this request-consuming task. It waits for a tokio::sync::Notify to
fire, which is triggered by the request-consuming Tokio task. If,
during the run of a DVU, another request is processed that requires
another DVU run, then the Notify will be re-enabled. That way, when
the SerialDvuTask loop comes back to check, the Notify will be set,
triggering "yet one more" DVU run.
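The re-arming behavior can be sketched synchronously with an AtomicBool in place of tokio::sync::Notify (a minimal model; the function and names are illustrative, not the real SerialDvuTask):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Sketch of the SerialDvuTask loop: requests set a "DVU pending" flag,
/// and the loop keeps running DVUs until no new request re-armed it.
fn drain_dvu_runs(pending: &AtomicBool, mut run_dvu: impl FnMut()) -> usize {
    let mut runs = 0;
    // swap(false) atomically consumes the notification; if a request
    // re-arms the flag while a DVU is running, we loop one more time.
    while pending.swap(false, Ordering::SeqCst) {
        run_dvu();
        runs += 1;
    }
    runs
}

fn main() {
    let pending = AtomicBool::new(true); // a request already queued a DVU run
    let mut first = true;
    let runs = drain_dvu_runs(&pending, || {
        // Simulate a request landing mid-DVU that requires one more run.
        if first {
            pending.store(true, Ordering::SeqCst);
            first = false;
        }
    });
    // The first run plus the "yet one more" run triggered during it.
    assert_eq!(runs, 2);
    println!("dvu runs: {runs}");
}
```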

Other Structural Changes

Among other changes, some of note are:

  • The publisher of a Rebaser request sends it directly to the
    REBASER_REQUESTS stream. Due to its current RPC (i.e.
    request-then-await-response) communication pattern, setting up the
    Rebaser in this way makes it operate more like Pinga and Veritech.
    Future work is likely to bring back the concept of an "activities"
    flow for interested parties to follow; however, prior to this
    commit, the only users of the activity stream were callers to the
    Rebaser and the Rebaser server itself.
  • A first class client for the Rebaser is provided in
    lib/rebaser-client which abstracts the NATS communication and the
    on-the-wire request formatting details, ideally allowing us to evolve
    the Rebaser's messaging internally where possible.
  • The start of an API messaging framework is provided largely in the
    lib/rebaser-core crate in the api_types module. Several Rust
    traits are provided that, when implemented, give the user a versioned
    message that can be upgraded and that understands its serialization
    format independently of versioning. This should allow us to evolve
    the Rebaser's request and response messages with both the client and
    server able to detect unsupported message versions without having to
    deserialize the message payload itself. Big props to @jhelwig's work
    on version graph snapshots for the inspiration.
  • Rebaser request and response messages are transmitted with several
    metadata headers in the NATS message, including:
    • X-CONTENT-TYPE: describes the serialization format in a content
      type/MIME header compatible string. Current values are
      application/json and application/cbor with application/cbor
      the current default.
    • X-MESSAGE-TYPE: describes the message type, typically the primary
      Rust name. Current values are EnqueueUpdatesRequest and
      EnqueueUpdatesResponse.
    • X-MESSAGE-VERSION: a positive integer, where a greater value
      implies a newer version. For example, code that understands
      version 5 of a message may also be able to understand versions
      4, 3, 2, and 1, but not 6.
    • NATS-MSG-ID: this is a standard NATS header which can be used for
      message de-duplication and is populated with a random
      at-request-time Ulid value, managed via a RequestId type
      (required for all API data types).
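The header scheme above can be sketched as a plain map plus a version gate (an illustrative sketch; header names come from the text, but the map construction and compatibility check are assumptions, not the real api_types implementation):

```rust
use std::collections::HashMap;

/// Build the metadata headers described above for an outgoing message.
/// (Illustrative only; the real code attaches these to a NATS message.)
fn headers_for(message_type: &str, version: u32, msg_id: &str) -> HashMap<String, String> {
    HashMap::from([
        // application/cbor is the current default serialization format.
        ("X-CONTENT-TYPE".to_string(), "application/cbor".to_string()),
        ("X-MESSAGE-TYPE".to_string(), message_type.to_string()),
        ("X-MESSAGE-VERSION".to_string(), version.to_string()),
        // Populated with a random at-request-time Ulid in the real code.
        ("NATS-MSG-ID".to_string(), msg_id.to_string()),
    ])
}

/// A receiver that understands up to `max_supported` can reject newer
/// versions from the header alone, without deserializing the payload.
fn supports(max_supported: u32, message_version: u32) -> bool {
    message_version <= max_supported
}

fn main() {
    let headers = headers_for("EnqueueUpdatesRequest", 1, "01J8EXAMPLEULID");
    assert_eq!(headers["X-MESSAGE-TYPE"], "EnqueueUpdatesRequest");
    assert!(supports(5, 4));  // older versions remain readable
    assert!(!supports(5, 6)); // newer-than-supported is rejected up front
    println!("headers ok");
}
```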

@fnichol
Contributor Author

fnichol commented Sep 19, 2024

/try

@fnichol
Contributor Author

fnichol commented Sep 21, 2024

/try

Contributor

@nickgerace nickgerace left a comment

Looks great to me! I like the naxum-spawned-by-naxum work and it's easy to follow. Clever use of the tasks stream to act, in effect, as held locks on change sets.

Comment on lines 192 to 205
if has_roots {
    loop {
        let mut ctx_clone = ctx.clone();
        ctx_clone.update_snapshot_to_visibility().await?;
        if !ctx_clone
            .workspace_snapshot()?
            .has_dependent_value_roots()
            .await?
        {
            break;
        }
        tokio::time::sleep(Duration::from_millis(25)).await;
    }
}
Contributor

Normally I'd be wary about the potential infinite loop, but this is a test helper, so looks great to me!

lib/dal/src/context.rs (resolved)
Comment on lines 52 to 67
fn call(&mut self, req: jetstream::Message) -> Self::Future {
    let parts = req.into_parts();
    let head = Arc::new(parts.0.clone());
    let message = match <jetstream::Message as MessageHead>::from_parts(parts.0, parts.1) {
        Ok(message) => message,
        Err(err) => unreachable!(
            "NATS Jetstream message from parts should succeed, this is a bug!; error={:?}",
            err
        ),
    };

    let info = Arc::new(Info::from(
        // TODO(fnichol): the middleware here is infallible, but this call could, in theory
        // error. There's probably a better alternative here...
        message.info().expect("failed to parse message info"),
    ));
Contributor

These look like they'd be hit by logic errors caught by local dev and tests. I'm okay with panics here.

Contributor Author

I may be able to deal with this differently

lib/rebaser-client/src/lib.rs (resolved)
Comment on lines 195 to 199
// There is one more optional future here which is confirmation from the NATS server that
// our publish was acked. However, the task stream will drop new messages that are
// duplicates and this returns an error on the "ack future". Instead, we'll keep this as
// fire and forget.
self.context.publish(tasks_subject, vec![].into()).await?;
Contributor

Looks good to me.

Comment on lines 366 to 371
// TODO(fnichol): hrm, is this *really* true that we've written to the change set. I mean,
// yes but until a dvu has finished this is an incomplete view?
let mut event = WsEvent::change_set_written(&ctx, change_set_id.into()).await?;
event.set_workspace_pk(workspace_id.into());
event.set_change_set_id(Some(change_set_id.into()));
event.publish_immediately(&ctx).await?;
Contributor

Smells like a "fuck it we ball, reload that UI" trigger. I'm okay with it for now, but you're right... not necessarily going to be true. Good to call out!

Comment on lines 282 to 322
) -> push::OrderedConfig {
push::OrderedConfig {
Contributor

I saw this yesterday and am a bit surprised it comes from push and not pull... just an interesting note lol

Contributor Author

Yes, it was suggested to us that a push consumer would be okay to use here (maybe even slightly preferred for our use case). My understanding is that under the covers it uses a core NATS subject: the server fires all the messages you're subscribed to at that subject, and the client is set up to subscribe to that single core NATS subject (old-school NATS firehose style). Kind of neat!

lib/rebaser-server/src/lib.rs (outdated, resolved)
lib/rebaser-server/src/middleware.rs (outdated, resolved)
Comment on lines 81 to 84
debug!(
    to_rebase_workspace_snapshot_address = %to_rebase_workspace_snapshot_address,
    updates_address = %request.updates_address,
);
Contributor

Thank you! These fields will be easier to index in Jaeger.

@fnichol fnichol force-pushed the fnichol/rebaser-scale-moar branch 2 times, most recently from 582e2a7 to 17dd07e on September 21, 2024 at 22:43
This change resolves an issue where customizing the `OnSuccess` or
`OnFailure` callbacks would result in a compiler error. Whoops!

Signed-off-by: Fletcher Nichol <[email protected]>
This fix came up while trying to use `tokio-stream`'s `.timeout()`
combinator which is not `Unpin`.

Signed-off-by: Fletcher Nichol <[email protected]>
This new middleware works when processing Jetstream messages and is
useful in situations where `ack`'ing messages isn't possible or
appropriate. The callbacks for `OnSuccess` and `OnFailure` have access to
the message's `Head` metadata as well as its `Info` metadata.

Signed-off-by: Fletcher Nichol <[email protected]>
@fnichol fnichol force-pushed the fnichol/rebaser-scale-moar branch 2 times, most recently from 011afed to 8c1cd8b on September 22, 2024 at 01:08
@fnichol
Contributor Author

fnichol commented Sep 22, 2024

/try

fnichol and others added 2 commits September 21, 2024 19:15
@fnichol
Contributor Author

fnichol commented Sep 22, 2024

/try

@fnichol fnichol marked this pull request as ready for review September 22, 2024 01:51
Contributor

@nickgerace nickgerace left a comment

Re-approved!

@fnichol fnichol added this pull request to the merge queue Sep 22, 2024
Merged via the queue into main with commit 5bb689d Sep 22, 2024
40 checks passed
@fnichol fnichol deleted the fnichol/rebaser-scale-moar branch September 22, 2024 22:04