
Investigate increase in CD termination state after upgrading eqsin/ulsfo to HAProxy 2.8.10
Closed, Resolved · Public

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change #1047483 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] hiera: test downgrading haproxy on cp5017

https://gerrit.wikimedia.org/r/1047483

Change #1047483 merged by Fabfur:

[operations/puppet@production] hiera: test downgrading haproxy on cp5017

https://gerrit.wikimedia.org/r/1047483

Change #1047492 had a related patch set uploaded (by Fabfur; author: Fabfur):

[operations/puppet@production] hiera: test upgrading cp5017 to haproxy 2.8.9

https://gerrit.wikimedia.org/r/1047492

Change #1047492 merged by Fabfur:

[operations/puppet@production] hiera: test upgrading cp5017 to haproxy 2.8.9

https://gerrit.wikimedia.org/r/1047492

This is caused by a bug in the mtail regex used to parse haproxy logs. On haproxy 2.6.17 the HTTP status gets reported as -1:

2024-06-24T13:46:52.422444+00:00 cp7001 haproxy[2780368]: 180684 -1 0 0 -1 {es.wikipedia.org} {} CD

but the regex expects a three-digit number from 100 to 599:

(?P<http_status_family>[1-5])\d\d
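The mismatch can be reproduced outside mtail with a small Python sketch. The `STRICT` pattern is the fragment quoted above, anchored here for illustration. The `RELAXED` pattern and the `status_family` helper are hypothetical and only show one way to accept -1; they are not the change that was merged.

```python
import re

# Status fragment quoted in this task. The ^...$ anchors are an assumption
# added here so the whole field has to match.
STRICT = re.compile(r"^(?P<http_status_family>[1-5])\d\d$")

# Hypothetical relaxed variant (not the merged fix): also accept a literal -1.
RELAXED = re.compile(r"^(?:-1|(?P<http_status_family>[1-5])\d\d)$")


def status_family(pattern, field):
    """Return the status family digit, "none" for -1, or None if the field is rejected."""
    match = pattern.match(field)
    if match is None:
        # The line would be skipped, so its termination state is never counted.
        return None
    # The named group is unset when the -1 alternative matched.
    return match.group("http_status_family") or "none"


if __name__ == "__main__":
    for field in ("200", "503", "-1"):
        print(field, status_family(STRICT, field), status_family(RELAXED, field))
```

With the strict pattern a -1 status field is rejected outright, which is consistent with lines ending in CD being dropped from the metrics.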

Change #1049183 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] mtail: Allow http status -1 on cache_haproxy

https://gerrit.wikimedia.org/r/1049183

Change #1049183 merged by Vgutierrez:

[operations/puppet@production] mtail: Allow http status -1 on cache_haproxy

https://gerrit.wikimedia.org/r/1049183

Vgutierrez claimed this task.

cache_haproxy.mtail failed to accept -1 as an HTTP status code, so it was underreporting the CD and CR termination states.

Mentioned in SAL (#wikimedia-operations) [2024-06-24T15:43:51Z] <vgutierrez> updated termination_state cache haproxy metrics, expect higher CD and CR rates - T367963