Unknown error with #tag:chem
Closed, Resolved · Public

Description

Steps to replicate the issue (include links if applicable):

What happens?: The example formulae aren't rendered. Instead, red error text appears (in English: 'math failure' / 'failed to parse', followed in parentheses by 'unknown error').

Event Timeline

{{Bètaverval|109|52|1|p}}

generates

"Fout bij het ontleden (onbekende fout) {\displaystyle \ce{{^{ 109 }_{52}Te } -> {^{ 108 }_{ 50 } Sn } {p^ } {e^ } {\nu}_e}}

I think "Fout bij het ontleden" is the Dutch translation of "Error while parsing"; "(onbekende fout)" means "(unknown error)".

Special:ExpandTemplates generates

<chem>{^{ 109 }_{52}Te } 
-> 
{^{ 108 }_{ 50 }
Sn }   {p^ }  {e^ }    {\nu}_e</chem>
Physikerwelt subscribed.

Thank you. This is entirely orthogonal to T348973, and I can analyze the problem.

@Physikerwelt Since I was in Logstash anyway, here is a screenshot of this morning's k8s mathoid notices, in case it is helpful.

mathoid errors (767×600 px, 123 KB)

The "SVG Unknown character" notice is very prevalent: it accounts for roughly 90% of the notices that the mathoid service throws.

The problem occurs only in production, not in beta. Mathoid itself also renders the formula fine, as tested on https://mathoid.beta.math.wmflabs.org/info.html.

Now, the interesting question is, what is the exact request sent to restbase? I could get an image back from https://en.wikipedia.org/api/rest_v1/media/math/render/png/64a81affdfa7d55ea0113d20ee276e4a4857ee98 but maybe I missed a space or so when trying to reverse engineer the request.

curl -d q="{\\displaystyle {\\ce {{^{109}_{52}Te}->{^{108}_{50}Sn} {p^{ }} {e^{ }} {\\nu }_{e}}}}" https://en.wikipedia.org/api/rest_v1/media/math/check/chem

generates the image linked above.
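To rule out a missed space or escaping difference, it can help to look at the exact body that would be sent. Note that `curl -d` sends the string as written, without percent-encoding it. The following is a minimal Python sketch that builds (but does not send) an equivalent percent-encoded POST request; the helper name `build_check_request` is hypothetical, and only the URL and the `q` parameter come from the command above.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Endpoint and TeX string taken from the curl command above.
CHECK_URL = "https://en.wikipedia.org/api/rest_v1/media/math/check/chem"
TEX = r"{\displaystyle {\ce {{^{109}_{52}Te}->{^{108}_{50}Sn} {p^{ }} {e^{ }} {\nu }_{e}}}}"


def build_check_request(tex, url=CHECK_URL):
    """Build a form-encoded POST request without sending it (hypothetical helper)."""
    body = urlencode({"q": tex}).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


request = build_check_request(TEX)
# Inspect the exact bytes that would go over the wire.
print(request.data.decode("ascii"))
# Decoding the body again must give back the original TeX, spaces included.
round_trip = parse_qs(request.data.decode("ascii"))["q"][0]
```

Comparing `round_trip` with the string that MediaWiki actually submits would show whether the reverse-engineered request differs in whitespace.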

I'm checking the logs, and based on my usage of Special:ExpandTemplates, I'm wondering if this is related to T349347.

I get T349347 entries for each of these requests on that page.

I found the error. The restbase response is

{"success":true,"checked":"{\\displaystyle {\\ce {c{a^{b}}}}}","requiredPackages":["mhchem"],"identifiers":["c","a","b"],"endsWithDot":false,"warnings":[{"type":"mhchem-deprecation","details":{"success":false,"warnings":[],"status":"S","details":"SyntaxError: Expected \"}\", [\\-0-9a-zA-Z *,=():/;?.!'` [\\]\\x80-퟿-￿], or [\ud800-\udbff] but \"^\" found.","offset":23,"line":1,"column":24,"error":{"message":"Expected \"}\", [\\-0-9a-zA-Z *,=():/;?.!'` [\\]\\x80-퟿-￿], or [\ud800-\udbff] but \"^\" found.","expected":[{"type":"class","parts":["-",["0","9"],["a","z"],["A","Z"]," ","*",",","=","(",")",":","/",";","?",".","!","'","`"," ","[","]",["€","퟿"],["","￿"]],"inverted":false,"ignoreCase":false},{"type":"class","parts":[["\ud800","\udbff"]],"inverted":false,"ignoreCase":false},{"type":"literal","text":"}","ignoreCase":false}],"found":"^","location":{"end":{"offset":24,"line":1,"column":25},"start":{"offset":23,"line":1,"column":24}},"name":"SyntaxError"}}}]}

Calling json_decode on that string returns null.

With the JSON_THROW_ON_ERROR flag, PHP would throw `Fatal error: Uncaught JsonException: Single unpaired UTF-16 surrogate in unicode escape in ...`

The definition of UTF-8 prohibits encoding character numbers
between U+D800 and U+DFFF, which are reserved for use with the
UTF-16 encoding form (as surrogate pairs) and do not directly
represent characters.

So because of [\ud800-\udbff], the string decoder expects these escapes to be UTF-16 surrogate pairs, but no second (low) surrogate follows the high surrogate, and decoding dies. This is apparently a known difference between node/js and PHP, if you start googling for it.
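The difference between a lenient and a strict decoder can be illustrated outside PHP. The sketch below uses Python, whose `json` module (like the node/js side) accepts a lone surrogate escape; the problem only surfaces when the result has to be encoded as UTF-8. The sample string is a shortened stand-in for the restbase response, and `is_utf8_encodable` is a hypothetical helper.

```python
import json

# Shortened stand-in for the response: two high surrogates, no low surrogate.
raw = r'{"details": "[\ud800-\udbff]"}'

# Python's json module is lenient here and keeps the lone surrogates.
details = json.loads(raw)["details"]


def is_utf8_encodable(text):
    """Return True if text can be encoded as UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Lone surrogates (U+D800 to U+DFFF) have no UTF-8 encoding.
        return False


lone_ok = is_utf8_encodable(details)
# A proper high+low pair decodes to one code point and encodes fine.
pair_ok = is_utf8_encodable(json.loads(r'"\ud83d\ude00"'))
```

Here `lone_ok` is false and `pair_ok` is true, which mirrors why a decoder that must produce UTF-8 strings, as PHP's does, has to reject the response.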

If I remove all mentions of [\ud800-\udbff] from the JSON response, json_decode handles the decoding. (There is still a syntax error in the input, of course.)

I was looking for an easy fix from within PHP to replace those chars. If we changed it in mathoid it would still be broken for the existing elements, as the cache does not expire. Maybe preg_replace ...
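A regex-based cleanup along those lines could look roughly like the following. This is only a Python sketch of the idea, not the patch that was uploaded: it replaces unpaired `\uD800` to `\uDFFF` escapes in the raw JSON text with `\ufffd` before decoding, and leaves valid pairs and escaped backslashes alone. All names (`replace_lone_surrogates`, `_surrogate_kind`) are hypothetical.

```python
import json
import re

# Match one JSON escape at a time, so that "\\ud800" (an escaped backslash
# followed by plain text) is never mistaken for a \uXXXX escape.
_ESCAPE = re.compile(r"\\(?:u([0-9a-fA-F]{4})|.)", re.DOTALL)


def _surrogate_kind(match):
    """Classify an escape match as 'high', 'low', or None."""
    if match is None or match.group(1) is None:
        return None
    code = int(match.group(1), 16)
    if 0xD800 <= code <= 0xDBFF:
        return "high"
    if 0xDC00 <= code <= 0xDFFF:
        return "low"
    return None


def replace_lone_surrogates(json_text, replacement="\\ufffd"):
    """Replace unpaired surrogate escapes in raw JSON text."""
    matches = list(_ESCAPE.finditer(json_text))
    pieces, last, i = [], 0, 0
    while i < len(matches):
        current = matches[i]
        following = matches[i + 1] if i + 1 < len(matches) else None
        kind = _surrogate_kind(current)
        if (kind == "high" and _surrogate_kind(following) == "low"
                and following.start() == current.end()):
            i += 2  # valid surrogate pair: keep both escapes untouched
            continue
        if kind is not None:
            # Lone high or low surrogate: swap in the replacement escape.
            pieces.append(json_text[last:current.start()])
            pieces.append(replacement)
            last = current.end()
        i += 1
    pieces.append(json_text[last:])
    return "".join(pieces)


raw = r'{"details":"[\ud800-\udbff] but \"^\" found."}'
cleaned = replace_lone_surrogates(raw)
details = json.loads(cleaned)["details"]
```

Because the substitution runs on the consumer side, it would also cover responses that are already cached, which a fix in mathoid alone would not.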

Change 970381 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/extensions/Math@master] Remove UTF-16 chars from mathoid warnings

https://gerrit.wikimedia.org/r/970381

Change 970437 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/services/texvcjs@master] Release mathoid-texvcjs 0.5.4

https://gerrit.wikimedia.org/r/970437

Change 970381 abandoned by Physikerwelt:

[mediawiki/extensions/Math@master] Remove UTF-16 chars from mathoid warnings

Reason:

cf. Iba74d000bca808223a44b7344c6740fc15062d8e

https://gerrit.wikimedia.org/r/970381

Change 970437 merged by Physikerwelt:

[mediawiki/services/texvcjs@master] Release mathoid-texvcjs 0.5.4

https://gerrit.wikimedia.org/r/970437

Change 971400 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[operations/deployment-charts@master] mathoid: update version

https://gerrit.wikimedia.org/r/971400

So, the fix landed in mathoid; now the new version needs to be deployed, and the error should vanish.

Change 971400 merged by jenkins-bot:

[operations/deployment-charts@master] mathoid: update version

https://gerrit.wikimedia.org/r/971400

Deployed. I still see the error; is it cached?

I suspect this peak is mediawiki hitting cached older invalid responses.

body: json response error
channel: Math
code: 503
message: Received invalid response from restbase.

https://logstash.wikimedia.org/goto/b344761a525905e30b29f401a9d6637c

Screenshot 2023-11-07 at 15.49.00.png (567×1 px, 76 KB)

Change #1047201 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] mathoid: Upgrade image from 2023-11-03-103402 to 2024-06-18-233457

https://gerrit.wikimedia.org/r/1047201