
Set up Grafana dashboard for monitoring Charts
Closed, Resolved | Public | 3 Estimated Story Points

Description

Follow up to T372081

  • make sure we have appropriate logging in place (@CDanis to do)
  • Set up Grafana dashboard for monitoring, starting with making a copy of the template (Charts to do)

We can derive both latency and request success rates by running different queries against the one existing Prometheus metric.

  • Add an alert on latency (perhaps "under 5 seconds") and one on the 5xx response ratio
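As a rough illustration of the two alert conditions above, here is a minimal Python sketch. It assumes the existing metric is a Prometheus-style histogram (cumulative latency buckets) with a status-code label, which this task does not confirm. The function names, the sample numbers, the choice of the 90th percentile, and the 5% threshold for the 5xx ratio are all hypothetical; only the 5-second latency figure comes from the description. The quantile estimate interpolates linearly within a bucket, similar in spirit to PromQL's `histogram_quantile()`.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket containing the target rank.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            # Cannot interpolate into the +Inf bucket or an empty bucket:
            # fall back to the highest finite bound seen so far.
            if math.isinf(bound) or count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


def ratio_5xx(counts_by_status):
    """Fraction of requests whose status code is in the 5xx range."""
    total = sum(counts_by_status.values())
    errors = sum(n for code, n in counts_by_status.items() if code.startswith("5"))
    return errors / total if total else 0.0


def firing_alerts(latency_seconds, error_ratio,
                  max_latency_seconds=5.0,  # "under 5 seconds" from the task
                  max_error_ratio=0.05):    # hypothetical threshold
    """Return the names of the alert conditions that are violated."""
    alerts = []
    if latency_seconds >= max_latency_seconds:
        alerts.append("latency")
    if error_ratio > max_error_ratio:
        alerts.append("5xx_ratio")
    return alerts


# Hypothetical sample data, not taken from the real service.
sample_buckets = [(0.5, 60), (1.0, 80), (5.0, 95), (math.inf, 100)]
sample_statuses = {"200": 950, "404": 30, "500": 15, "503": 5}

p90 = estimate_quantile(0.90, sample_buckets)
errors = ratio_5xx(sample_statuses)
print(firing_alerts(p90, errors))
```

In a real deployment these conditions would more likely live in Prometheus alerting rules than in application code; the sketch only shows that both signals can be derived from a single histogram-style metric.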

Event Timeline

CCiufo-WMF triaged this task as Medium priority.
CCiufo-WMF raised the priority of this task from Medium to High.
CCiufo-WMF updated the task description.
CCiufo-WMF added a subscriber: CDanis.
CCiufo-WMF added a subscriber: aude.
CCiufo-WMF set the point value for this task to 3. (Nov 18 2024, 7:48 PM)
CCiufo-WMF moved this task from Up Next to Sprint 11 on the Charts board.
CCiufo-WMF edited projects, added Charts (Sprint 11); removed Charts.
Seddon renamed this task from Test wiki post-deployment work to Set up Grafana dashboard for monitoring Charts. (Nov 18 2024, 7:57 PM)
CCiufo-WMF edited projects, added Charts (Sprint 13); removed Charts (Sprint 11).
CCiufo-WMF moved this task from Incoming to Ready for Dev on the Charts (Sprint 13) board.
CCiufo-WMF moved this task from Ready for Dev to Doing on the Charts (Sprint 13) board.

Change #1100202 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] chart-renderer: scrape metrics

https://gerrit.wikimedia.org/r/1100202

Change #1100202 merged by jenkins-bot:

[operations/deployment-charts@master] chart-renderer: scrape metrics

https://gerrit.wikimedia.org/r/1100202

Change #1100516 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] chart-renderer: use the metrics port

https://gerrit.wikimedia.org/r/1100516

Change #1100516 merged by jenkins-bot:

[operations/deployment-charts@master] chart-renderer: use the metrics port

https://gerrit.wikimedia.org/r/1100516

Basic dashboard with the most important high-level metrics: https://grafana.wikimedia.org/d/f10cba3c-086c-49b2-bb04-65d70b39969a/charts-high-level

Also fixed up the things that were blatantly broken on the microservice dashboard (although I'm still not super happy with the overall template/layout there)