
Web Worker not working properly for Chrome in CI Docker/OpenStack (Unable to test navigationTiming#emitCpuBenchmark)
Closed, Resolved, Public

Description

Today I upgraded the hhvm variant of the Quibble Docker image (used by MW Jenkins jobs) from Debian Jessie to Debian Stretch (dbf2347bbc049b1, d48921fb8f30e).

The jobs for plain MW and for individual extensions were still passing, but the job for shared-gate extensions (which includes Navigation Timing and also runs on MW core commits) started failing, like so:

    ✔ Load topics
  ext.flow.dm mw.flow.dm.Board
    ✔ Create board
  Thanks thank
    ✔ thanked cookie
    ✔ gets user gender
  ext.navigationTiming
    ✔ Basic
    ✔ First view
    ✔ Repeat view
    ✔ Reloaded view
    ✔ Without Navigation Timing API
    ✔ Oversample config and activation
    ✔ emitOversampleNavigationTiming tests
    ✔ onMwLoadEnd - plain
    ✔ onMwLoadEnd - controlled
    ✔ Oversample Geo integration tests
    ✔ Optional APIs
    ✔ makeResourceTimingEvent
    ✔ emitTopImageResourceTiming
    ✔ emitCentralNoticeTiming
10 02 2019 04:05:24.902:WARN [HeadlessChrome 0.0.0 (Linux 0.0.0)]: Disconnected (1 times), because no message in 60000 ms.
HeadlessChrome 0.0.0 (Linux 0.0.0) ERROR
  Disconnected, because no message in 60000 ms.

Finished in 1 min 0.319 secs / 0.292 secs @ 04:05:24 GMT+0000 (UTC)

10 02 2019 04:05:24.905:DEBUG [karma]: Run complete, exiting.
10 02 2019 04:05:24.906:DEBUG [launcher]: Disconnecting all browsers
10 02 2019 04:05:24.926:DEBUG [launcher]: Process Chrome exited with code 0
10 02 2019 04:05:24.927:DEBUG [temp-dir]: Cleaning temp dir /tmp/karma-74277250
10 02 2019 04:05:24.933:DEBUG [launcher]: Finished all browsers
Warning: Task "karma:main" failed. Use --force to continue.

Aborted due to warnings.
FAILED TESTS:
  ext.navigationTiming
    ✖ emitCpuBenchmark
      HeadlessChrome 0.0.0 (Linux 0.0.0)
    Test took longer than 60000ms; test timed out.

It consistently times out midway through the "ext.navigationTiming" test suite, always at the same point. By looking at the console output of an older build of the same Jenkins job, I found that the test that normally runs after emitCentralNoticeTiming (the last one to pass here) is emitCpuBenchmark.

Disabling that made the build pass again.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 489438 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@master] tests: Disable emitCpuBenchmark test

https://gerrit.wikimedia.org/r/489438

Part of this change was, implicitly, an upgrade of Chromium from version 57 (packages/debian/jessie) to version 71 (packages/debian/stretch).

My guess is that the iterations we perform in the CPU benchmark now take longer than 60 seconds (I don't know how close to that limit it was before the upgrade).

QUnit recently introduced an option to set a timeout for a specific test, separate from the default (https://api.qunitjs.com/assert/timeout), so we may want to experiment with that, or reduce the number of iterations, either in the test only or in general.
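As a rough sketch of the "reduce the iterations" option, assuming the benchmark is built from repeated CPU-bound work units: the helper below caps a run by a wall-clock budget as well as by an iteration count, so a slow CI runner would produce a partial result instead of stalling until the 60-second limit. `workUnit`, `runBounded`, `BenchResult` and the injectable clock are hypothetical names for illustration, not the actual NavigationTiming code. Within a QUnit test, `assert.timeout( ms )` (linked above) would be the complementary knob for the per-test limit.

```typescript
// Hypothetical stand-in for one unit of benchmark work; the real
// NavigationTiming benchmark is not reproduced here.
function workUnit(seed: number): number {
  let x = seed;
  for (let i = 0; i < 1000; i++) {
    // Keep values small so the arithmetic stays exact in a double.
    x = (x * 31 + i) % 1000003;
  }
  return x;
}

interface BenchResult {
  completed: number; // work units actually run
  finished: boolean; // true if all requested units ran within the budget
  elapsedMs: number; // elapsed time as reported by `now`
  checksum: number; // gives the loop an observable result
}

// Runs up to `iterations` work units, stopping early once more than
// `budgetMs` has elapsed. `now` is injectable so tests can use a fake clock.
function runBounded(
  iterations: number,
  budgetMs: number,
  now: () => number = Date.now
): BenchResult {
  const start = now();
  let completed = 0;
  let checksum = 0;
  while (completed < iterations) {
    if (now() - start > budgetMs) {
      break;
    }
    checksum = (checksum + workUnit(completed)) % 1000003;
    completed++;
  }
  return {
    completed,
    finished: completed === iterations,
    elapsedMs: now() - start,
    checksum,
  };
}
```

For example, `runBounded(100000, 5000)` would either complete all 100,000 units or stop shortly after five seconds with `finished: false`, which a test could then treat as inconclusive rather than letting it hang.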

(Marking the cross-repo build failure as resolved, given it's now disabled.)

Change 489438 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@master] tests: Disable emitCpuBenchmark test

https://gerrit.wikimedia.org/r/489438

Change 489591 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@wmf/1.33.0-wmf.16] tests: Disable emitCpuBenchmark test

https://gerrit.wikimedia.org/r/489591

Change 489591 merged by jenkins-bot:
[mediawiki/extensions/NavigationTiming@wmf/1.33.0-wmf.16] tests: Disable emitCpuBenchmark test

https://gerrit.wikimedia.org/r/489591

Krinkle triaged this task as Medium priority. Feb 11 2019, 9:35 PM

Change 493127 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/extensions/NavigationTiming@master] Re-enable integration test for cpuBenchmark

https://gerrit.wikimedia.org/r/493127

Change 494252 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] [WIP] Change karma runner from iframe to new tab

https://gerrit.wikimedia.org/r/494252

Krinkle removed Krinkle as the assignee of this task. (Edited) Mar 4 2019, 4:20 PM

Moving back to the pile for later. Not as easy as I initially thought.

I cannot reproduce the failure locally with Headless Chrome (on macOS). This suggests it's probably related to one of:

  • Debian 9,
  • Chromium vs Chrome,
  • Docker,
  • OpenStack,
  • Environment variables and/or limitations set by Jenkins or Quibble.

The Worker class exists, can be instantiated, and accepts postMessage() calls without throwing. But as soon as it is meant to start doing something, the worker.onerror handler fires with an ErrorEvent object, one with no message property and no other way to see what happened.
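Since the ErrorEvent seen in CI carried no message, one way to make the test fail fast and a bit more descriptively, instead of waiting for the harness timeout, is to wrap the worker round-trip so it settles on the first reply, the first error, or a timeout, and to report every field the error event does carry. A minimal sketch, assuming a one-request/one-reply protocol: `WorkerLike` and `WorkerErrorLike` are hypothetical local interfaces mirroring the subset of the browser `Worker` API used here (`onmessage`, `onerror`, `postMessage()`, `terminate()`), declared so the sketch runs outside a browser, and `requestOnce` is likewise an illustrative name.

```typescript
// Hypothetical minimal subset of the browser Worker API, declared locally
// so this sketch compiles and runs outside a browser.
interface WorkerErrorLike {
  message?: string;
  filename?: string;
  lineno?: number;
}

interface WorkerLike {
  onmessage: ((ev: { data: unknown }) => void) | null;
  onerror: ((ev: WorkerErrorLike) => void) | null;
  postMessage(data: unknown): void;
  terminate(): void;
}

// Posts one message and settles on the first reply, the first error, or a
// timeout, whichever happens first. The worker is terminated in every case.
function requestOnce(worker: WorkerLike, data: unknown, timeoutMs: number): Promise<unknown> {
  return new Promise<unknown>((resolve, reject) => {
    let settled = false;
    const finish = (settle: () => void): void => {
      if (settled) {
        return;
      }
      settled = true;
      clearTimeout(timer);
      worker.onmessage = null;
      worker.onerror = null;
      worker.terminate();
      settle();
    };
    const timer = setTimeout(() => {
      finish(() => reject(new Error(`worker did not reply within ${timeoutMs} ms`)));
    }, timeoutMs);
    worker.onmessage = (ev) => {
      finish(() => resolve(ev.data));
    };
    worker.onerror = (ev) => {
      // The ErrorEvent seen in CI had no message, so report every field that
      // might be present instead of relying on `message` alone.
      const detail = JSON.stringify({ message: ev.message, filename: ev.filename, lineno: ev.lineno });
      finish(() => reject(new Error(`worker error: ${detail}`)));
    };
    worker.postMessage(data);
  });
}
```

In a browser test, a real `Worker` would be passed in, possibly through a thin adapter, since the DOM handler types are more specific than `WorkerLike`. An event like the one observed in CI would then surface as `worker error: {}` straight away rather than as a 60-second timeout.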

A next step could be to try to reproduce it with Quibble locally and, if it fails the same way there, to start tearing away stuff left and right until a minimal test case remains.

Change 493127 abandoned by Krinkle:
Re-enable integration test for cpuBenchmark

https://gerrit.wikimedia.org/r/493127

Change 494252 abandoned by Krinkle:
[WIP] Change karma runner from iframe to new tab

https://gerrit.wikimedia.org/r/494252

Rather than try to fix it, let's instead reduce it and report upstream.

Krinkle renamed this task from Failing test 'ext.navigationTiming#emitCpuBenchmark' to Web Worker not working properly for Chrome in CI Docker/OpenStack (Unable to test navigationTiming#emitCpuBenchmark). Jun 6 2019, 10:34 AM
Krinkle added a project: Upstream.
mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM
Krinkle changed the subtype of this task from "Production Error" to "Task".

We might want to occasionally try again with a newer Chrome version and newer OpenStack/Docker/Debian images in CI. But on the whole there doesn't seem to be much we can do. The worker just doesn't... work in the current CI infrastructure.

We'll keep manually testing this for now when we change it. Fortunately, that's not very often.

> Part of this change [from jessie-hhvm to stretch-hhvm] was, implicitly, an upgrade of Chromium from version 57 (packages/debian/jessie) to version 71 (packages/debian/stretch).

Well, in the months since, we're still on Debian Stretch, but now with PHP 7.2 and, more importantly, Chromium 73, and the CPU benchmark integration test seems to work again.

I guess whatever bug Chromium introduced with Web Workers between v57 and v71 got fixed in v72 or v73.

Ref https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/NavigationTiming/+/606452/

Krinkle closed this task as Resolved.
Krinkle claimed this task.