Improve alerts #18080

Merged
merged 16 commits into from
Jul 17, 2024

Conversation

Collaborator

@stelfrag stelfrag commented Jul 8, 2024

Summary

Improve agent-to-cloud alert communication. The goal is to reduce the number of messages transmitted when an agent starts or reconnects.

Agents calculate a version and send it with each alert. When the agent-cloud link is established, the start alerts streaming command carries the version of the alerts that the cloud is aware of.

A version match on the agent side ensures that we don't transmit transitions that the cloud already knows.
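
The version check described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: the class `AgentAlertState`, its methods, the use of a simple counter as the version, and the "resend everything" fallback on a mismatch are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AgentAlertState:
    """Hypothetical agent-side state; not Netdata's actual data structures."""

    version: int = 0  # assumption: a counter bumped on every transition
    transitions: list = field(default_factory=list)

    def record_transition(self, alert_name: str, status: str) -> int:
        """Store a transition and return the version that would be sent with it."""
        self.version += 1
        self.transitions.append((self.version, alert_name, status))
        return self.version

    def on_start_streaming(self, cloud_version: int) -> list:
        """Decide what to transmit when the start alerts streaming command arrives."""
        if cloud_version == self.version:
            # Versions match: the cloud already knows every transition.
            return []
        # Versions differ: resend what the agent holds (assumed fallback).
        return list(self.transitions)
```

With two recorded transitions, `on_start_streaming(2)` returns an empty list, so nothing is retransmitted after a reconnect, while any other version causes the stored transitions to be sent again.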

New tables

  • alert_queue
    Holds alert transitions to be processed. Also used to de-duplicate alert transitions
  • aclk_queue
    Holds pending alert transitions to be sent to the cloud
  • alert_version
    Holds the last alert version after it has been submitted to the cloud
  • The aclk_alert_xxxx tables have been removed
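
To make the roles of the three tables concrete, here is a small sketch using Python's built-in `sqlite3` module. Only the table names come from this PR; every column name, the primary keys, and the helper functions `open_db`, `enqueue_transition`, `move_to_aclk_queue` and `save_version` are hypothetical and chosen purely for illustration. The real schema may differ.

```python
import sqlite3

# Hypothetical columns: only the three table names are taken from the PR.
SCHEMA = """
CREATE TABLE alert_queue (
    host_id   TEXT    NOT NULL,
    alarm_id  INTEGER NOT NULL,
    unique_id INTEGER NOT NULL,
    status    TEXT    NOT NULL,
    PRIMARY KEY (host_id, alarm_id, unique_id)
);
CREATE TABLE aclk_queue (
    sequence_id INTEGER PRIMARY KEY AUTOINCREMENT,
    host_id     TEXT    NOT NULL,
    alarm_id    INTEGER NOT NULL,
    unique_id   INTEGER NOT NULL
);
CREATE TABLE alert_version (
    host_id TEXT PRIMARY KEY,
    version INTEGER NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def enqueue_transition(conn, host_id, alarm_id, unique_id, status):
    """Queue a transition; the primary key silently drops duplicates."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO alert_queue (host_id, alarm_id, unique_id, status) "
        "VALUES (?, ?, ?, ?)",
        (host_id, alarm_id, unique_id, status),
    )
    return cur.rowcount == 1  # True only if a new row was stored


def move_to_aclk_queue(conn):
    """Move processed transitions into the queue of pending cloud messages."""
    with conn:  # one transaction: copy the rows, then clear the source table
        cur = conn.execute(
            "INSERT INTO aclk_queue (host_id, alarm_id, unique_id) "
            "SELECT host_id, alarm_id, unique_id FROM alert_queue"
        )
        conn.execute("DELETE FROM alert_queue")
    return cur.rowcount  # number of rows copied


def save_version(conn, host_id, version):
    """Remember the last alert version submitted to the cloud for a host."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO alert_version (host_id, version) VALUES (?, ?)",
            (host_id, version),
        )
```

In this sketch, de-duplication falls out of the primary key on `alert_queue`: queuing the same transition twice stores it once, so it is forwarded to `aclk_queue` once.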

netdatacli aclk-state has been adjusted to reflect the changes.
Removed fields: Pending Min Seq ID, Pending Max Seq ID and Last Submitted Seq ID
Added fields:

  • Checkpoints: number of checkpoint requests received from the cloud
  • Alert count: number of alert transitions sent to the cloud
  • Alert snapshot count: number of snapshots sent to the cloud due to checkpoint failures

Test Plan

  • Start the agent with this PR. Check alerts

Cleanup table aliases
Add DEFINEs for alert delays
Add missing newline
thiagoftsm previously approved these changes Jul 16, 2024
Contributor

@thiagoftsm thiagoftsm left a comment

After the changes, the PR worked as expected in my environment. I had alerts from both parent and child.

Contributor

@thiagoftsm thiagoftsm left a comment

I ran the branch after the last commits, and I am seeing my alerts on the cloud as expected. LGTM!

@stelfrag stelfrag merged commit 3dfe351 into netdata:master Jul 17, 2024
140 checks passed
@stelfrag stelfrag deleted the improve_alerts branch July 19, 2024 16:46
2 participants