
generate MathML (not PNG) and automatically embed hyperlinks for each symbol
Closed, ResolvedPublic

Description

Author: richardbrucebaxter

Description:
I propose that MediaWiki be upgraded to generate MathML and to automatically embed hyperlinks for each mathematical symbol/operator. A developer on the MediaWiki IRC dev channel suggested I open this Bugzilla report after I showed them my initial request on the meta wiki (http://meta.wikimedia.org/w/index.php?title=Help_talk:Displaying_a_formula#Mediawiki_math_markup_interpretation_upgrade:_generate_MathML_.28not_PNG.29_and_automatically_embed_hyperlinks_for_each_symbol).

The following is an example of the proposed wiki math code:
<math>A[[simple equation#area]] = \pi r[[simple equation#radius]]^2 q[[some constant]] \cos(z)</math>

  1. LaTeX math [[wiki link]] tags are removed by the Math extension preprocessor
  2. MathML is generated by latexml
  3. user-defined hyperlinks are re-added to the generated MathML by the Math extension postprocessor
  4. hyperlinks are automatically added by the Math extension postprocessor to all remaining mathematical operators/symbols that reside in its database of existing math symbols/operators (e.g. a plain text file)
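
As a rough illustration of step 1 only, the preprocessing could look something like the sketch below. It is written in Python purely for brevity (the Math extension itself is PHP), and the function name `strip_wiki_links` and the regular expression are assumptions made for this example, not part of any existing code. It assumes each [[wiki link]] tag directly follows a single letter, digit, or LaTeX control sequence.

```python
import re

# Assumption: an annotation directly follows one LaTeX token, which is either
# a control sequence (such as \gamma) or a single letter/digit.
ANNOTATED = re.compile(r"(\\[A-Za-z]+|[A-Za-z0-9])\[\[([^\[\]]+)\]\]")


def strip_wiki_links(latex):
    """Return (plain_latex, links); links is a list of (token, target) pairs."""
    links = []

    def record(match):
        # Remember which token was annotated and with which wiki target.
        links.append((match.group(1), match.group(2)))
        # Keep the token itself and drop the [[...]] annotation.
        return match.group(1)

    return ANNOTATED.sub(record, latex), links


source = (r"A[[simple equation#area]] = \pi r[[simple equation#radius]]^2 "
          r"q[[some constant]] \cos(z)")
plain, links = strip_wiki_links(source)
print(plain)
print(links)
```

For the example formula above, `plain` is the ordinary LaTeX that would be passed to latexml, and `links` holds the three user-defined targets for A, r and q, which the postprocessor would later re-attach.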

Relevant wiki hyperlinks are automatically generated for all standard mathematical symbols and operators, e.g. " ", "=", "squared", "cos". If the user clicks, for example, on the hyperlink for z (which has not been explicitly defined by the user and does not reside in the Math extension database of mathematical symbols), the wiki (e.g. Wikipedia) returns "Variable is undefined, would you like to define it by editing this page?"

Note that Firefox's MathML implementation (I am unsure about MathJax) requires either all math objects to be explicitly hyperlinked or none. The reason is that when hyperlinks are auto-generated for math "sections" (e.g. square root, division), all the child objects in the section are by default linked to the section hyperlink (e.g. "b^2-4ac" in "\sqrt{b^2-4ac\ }"). This would be confusing for the user, so it is better that all math objects be explicitly hyperlinked, even if they must be directed to a new/edit page.
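
A minimal sketch of that "link every object explicitly" rule, again in Python for illustration only. The names `SYMBOL_LINKS`, `UNDEFINED_TARGET` and `link_tokens` are hypothetical, the symbol table stands in for the proposed plain text database, and the placeholder target for undefined variables is an assumption. The sketch covers only token elements (mi, mn, mo); container elements such as msqrt are left untouched here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the proposed database of known symbols/operators.
SYMBOL_LINKS = {
    "=": "https://en.wikipedia.org/wiki/Equals_sign",
    "\u00b1": "https://en.wikipedia.org/wiki/Plus-minus_sign",
}
# Hypothetical placeholder for symbols that nobody has defined yet.
UNDEFINED_TARGET = "#undefined-symbol"


def link_tokens(mathml, user_links):
    """Give every token element its own href so none inherits a parent's link."""
    root = ET.fromstring(mathml)
    for node in root.iter():
        local_name = node.tag.rsplit("}", 1)[-1]  # ignore any XML namespace
        if local_name in ("mi", "mn", "mo"):
            text = (node.text or "").strip()
            # User-defined links win, then the symbol table, then the placeholder.
            href = user_links.get(text) or SYMBOL_LINKS.get(text) or UNDEFINED_TARGET
            node.set("href", href)
    return ET.tostring(root, encoding="unicode")


sample = "<math><mrow><mi>x</mi><mo>=</mo><msqrt><mi>z</mi></msqrt></mrow></math>"
linked = link_tokens(
    sample,
    {"x": "https://en.wikipedia.org/wiki/quadratic_equation#quadratic_root"},
)
print(linked)
```

Here z has neither a user-defined link nor a symbol table entry, so it receives the placeholder target rather than silently inheriting whatever link its enclosing msqrt might carry.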

Although I am not a web programmer, I am happy to implement this myself in PHP if necessary. I must, however, report an issue in the existing MediaWiki Math extension software that affects the latexml MathML option (it appears to be related to https://gerrit.wikimedia.org/r/#/c/135521). The only way I have been able to get the latexml MathML option working in the current version of the MediaWiki Math extension (e.g. 11 August 2014) or the last stable version of the MediaWiki Math extension (1.23.2) is to:

  1. first delete the mysql wiki database (drop database <db_name>;)
  2. then install mediawiki 1.22.9 (legacy) along with its corresponding version of the mediawiki Math extension (1.22.9).

a. ensure to tick "enable image uploads" (to prevent a bug that stops the default PNG generated formulae from being displayed)
b. create Math extension temporary folders
cd /var/www/html/mediawiki-xxx/images
mkdir math
mkdir tmp
sudo chown -R www-data:www-data *
c. run maintenance/update.php (to prevent a bug "A database query error has occurred. This may indicate a bug in the software")
d. install LaTeXML (http://www.formulasearchengine.com/node/3)
e. test that MathML is working; set $wgUseLaTeXML = true; $wgUseMathJax = true; $wgDefaultUserOptions['math'] = MW_MATH_LATEXML; in LocalSettings.php

  3. then install the current version of mediawiki (mediawiki-latest.tar.gz/11 August 2014 or mediawiki 1.23.2) along with its corresponding version of the mediawiki Math extension (Math.zip/11 August 2014 or 1.23.2).

I have attached a complete installation log for reference (mediaWikiMathExtensionMathMLinstallationLog-11August2014.txt). It would however be useful if someone could publish a formal workaround for this issue (for at least 1.23.2 stable); for example a .sql file containing the required mysql table updates.

I have also attached the mathML code of what I expect a final equation to look like on Wikipedia after the latex is preprocessed, rendered, and postprocessed (mathMLtestQuadraticEquation.html).

Thanks for your help.

Richard


mathMLtestQuadraticEquation.html

<!-- 1. original latex code -->
<!-- <math>x=\frac{-b\pm\sqrt{b^2-4ac\ }}{2a}.</math> -->

<!-- 2. proposed Wikipedia latex code -->
<!-- <math>x=\frac{-b[[quadratic equation#linear coefficient]]\pm\sqrt{b^2-4a[[quadratic equation#quadratic coefficient]]c[[quadratic equation#constant]]\ }}{2a}.</math> -->

<!-- 3. proposed final Wikipedia MathML output -->
<math>
<mrow>
<mi href="en.wikipedia.org/wiki/quadratic_equation#quadratic_root">x</mi>
<mo href="en.wikipedia.org/wiki/Equals_sign">=</mo>
<mfrac href="en.wikipedia.org/wiki/fraction">
<mrow>
<mo>&#x2212;</mo>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#linear_coefficient">b</mi>
<mo href="https://en.wikipedia.org/wiki/Plus-minus_sign">&#xB1;</mo>
<msqrt href="https://en.wikipedia.org/wiki/Square_root">
<mrow>
<msup>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#linear_coefficient">b</mi>
<mn href="https://en.wikipedia.org/wiki/Square_number">2</mn>
</msup>
<mo href="https://en.wikipedia.org/wiki/Subtraction">&#x2212;</mo>
<mn href="https://en.wikipedia.org/wiki/number">4</mn>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#quadratic_coefficient">a</mi>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#constant">c</mi>
</mrow>
</msqrt>
</mrow>
<mrow>
<mn href="https://en.wikipedia.org/wiki/number">2</mn>
<mi href="https://en.wikipedia.org/wiki/quadratic_equation#quadratic_coefficient">a</mi>
</mrow>
</mfrac>
</mrow>
</math>


Version: master
Severity: enhancement

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:40 AM
bzimport added a project: Math.
bzimport set Reference to bz69424.

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-11August2014.txt

attachment mediaWikiMathExtensionMathMLinstallationLog-11August2014.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLtestQuadraticEquation.html

Attached:

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-12August2014.txt

CORRECTION: I can't confirm that the latexml MathML option of the current mediawiki version 1.23.2/1.24alpha (with its corresponding Math extension) can be made to work by first preinstalling mediawiki 1.22.9 (and its corresponding Math extension). It seems to have only appeared to work because of some kind of mysql table cache of previously generated formulae. Mediawiki 1.22.9 with its corresponding Math extension is the only version I have been able to get working with the latexml MathML option.

attachment mediaWikiMathExtensionMathMLinstallationLog-12August2014.txt ignored as obsolete

physik wrote:

Hi Richard,

I'm extremely interested in your project. Hyperlinks in formulae are exactly what I want in order to give mathematical formulae well-defined semantics. I strongly encourage you to keep working on this approach, and I'll support you if you need help with it.
I'm sorry that you had difficulties with the mathlatexml table. It has been renamed from math_latexml to mathlatexml, which requires another run of the database update.
The important step during the installation is
"After enabling the LaTeXML rendering mode you have to run the database update script again to create the required table."
Did you do that?
http://www.mediawiki.org/wiki/Manual:Update.php

The mathoid table which you were referring to is not used in the LaTeXML rendering mode.
If you login to mysql you should now see a mathlatexml table after executing "Show tables;"

Best
Physikerwelt

physik wrote:

.. I updated the documentation about that table
https://www.mediawiki.org/wiki/Extension:Math/mathlatexml_table
It's safe to run
drop table mathlatexml;
and run the database update script again.
The table is used as cache only to increase the performance of the math extension.
In addition I'd like to give you a pointer to a place where the default meanings of the macros can be added.
In latexml the file [1] takes care of mediawiki specific commands. Not all of them [2] are listed there.
For example, there is a command called $\Reals$ (unfortunately used only 8 times in enwiki), which is imho a prominent candidate for a link to https://en.wikipedia.org/wiki/Real_number or, even better, to https://www.wikidata.org/wiki/Q12916 to be language independent.

[1] https://github.com/brucemiller/LaTeXML/blob/master/lib/LaTeXML/Package/texvc.sty.ltxml
[2] http://www.formulasearchengine.com/sites/formulasearchengine.com/files/android.txt
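
To make the idea of default macro meanings concrete, here is a small Python sketch of a lookup table. It does not reflect how texvc.sty.ltxml is actually structured; `MACRO_TARGETS` and `default_link` are hypothetical names, and only the \Reals entry is taken from the comment above.

```python
# Hypothetical table of default link targets per macro.
# Only the \Reals entry comes from the discussion; anything else would need
# to be collected separately.
MACRO_TARGETS = {
    r"\Reals": {
        "wikipedia": "https://en.wikipedia.org/wiki/Real_number",
        "wikidata": "https://www.wikidata.org/wiki/Q12916",
    },
}


def default_link(macro, prefer="wikidata"):
    """Return the preferred default target for a macro, or None if unknown."""
    entry = MACRO_TARGETS.get(macro)
    if entry is None:
        return None
    # Fall back to any available target when the preferred kind is missing.
    return entry.get(prefer) or next(iter(entry.values()))


print(default_link(r"\Reals"))
print(default_link(r"\Reals", prefer="wikipedia"))
```

Preferring the Wikidata item by default keeps the link language independent, as suggested above, while still allowing a per-wiki article link where one is wanted.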

richardbrucebaxter wrote:

mathMLelementsWikipediaLinks.txt

attachment mathMLelementsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLsymbolsWikipediaLinks.txt

attachment mathMLsymbolsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

Thanks for your help Physikerwelt,

Although Math extension 11 August 2014 was giving a blank screen, version 13 August 2014 was giving a "Failed to parse (<math_empty_tex>):" error. I got it working by compiling texvccheck (or setting $wgMathDisableTexFilter = true;). These are basic installation instructions that I had overlooked (provided by https://www.mediawiki.org/wiki/Extension:Math).

Both MW_MATH_MATHML and MW_MATH_LATEXML are now working (with both mediawiki stable 1.23.2 and mediawiki latest 13 August 2014, using a current Math extension build).

$wgMathValidModes = array( MW_MATH_PNG, MW_MATH_SOURCE, MW_MATH_LATEXML, MW_MATH_MATHML);
$wgDefaultUserOptions['math'] = MW_MATH_LATEXML;

As you have mentioned, setting MW_MATH_LATEXML requires re-running "php update.php" to prevent the "A database query error has occurred" error (although setting MW_MATH_MATHML does not).

Richard

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-13August2014.txt

attachment mediaWikiMathExtensionMathMLinstallationLog-13August2014.txt ignored as obsolete

richardbrucebaxter wrote:

Thanks also for your interest in this project (and the links).

Note I had an additional idea for the postprocessor to help reduce the amount of manual Wikipedia editing that the proposed enhancement would require:

  1. detect all possible variable names within the generated mathml tags
  2. siphon variable descriptions from wiki text immediately following <math> text based on the variable names detected
  3. create mathml tooltips for all of these variables (displaying their extracted descriptions)
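
A very rough Python sketch of steps 1 and 2, under strong assumptions: variables are taken to be the contents of mi elements, and descriptions are assumed to follow the phrasing "<name> is (the) <description>". Real article text is far less regular, and the function name `describe_variables` is hypothetical. Step 3 (the tooltip markup itself) is left open; the sketch only returns the variable-to-description mapping a tooltip would display.

```python
import re
import xml.etree.ElementTree as ET


def describe_variables(mathml, following_text):
    """Map each variable in the MathML to a description found in the text."""
    # Step 1: collect candidate variable names from the <mi> elements.
    names = {(node.text or "").strip() for node in ET.fromstring(mathml).iter("mi")}
    found = {}
    for name in sorted(n for n in names if n):
        # Step 2 (assumed phrasing): "<name> is [the] <description>", ending at
        # the next "and", comma, semicolon, full stop, or end of text.
        pattern = (r"\b" + re.escape(name)
                   + r"\s+is\s+(?:the\s+)?([^,.;]+?)(?=\s+and\b|[,.;]|$)")
        match = re.search(pattern, following_text)
        if match:
            found[name] = match.group(1).strip()
    return found


mathml = ("<math><mi>A</mi><mo>=</mo><mi>&#x3C0;</mi>"
          "<msup><mi>r</mi><mn>2</mn></msup></math>")
text = "where A is the area and r is the radius of the circle."
print(describe_variables(mathml, text))
```

In this example A and r pick up descriptions from the sentence, while the symbol for pi is found in the MathML but gets no description because the text does not define it.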

Richard

physik wrote:

Hi Richard,

Robert Pagel and I have already started on that (http://arxiv.org/abs/1407.0167). It's all open source.

Physikerwelt

richardbrucebaxter wrote:

CORRECTION: "Although Math extension REL1.23.2 was giving a blank screen.."

richardbrucebaxter wrote:

Cheers Physikerwelt - that is exactly what I was thinking of.

richardbrucebaxter wrote:

mediaWikiMathExtensionMathMLinstallationLog-14August2014.txt

Attached:

richardbrucebaxter wrote:

mathMLsymbolsWikipediaLinks.txt

attachment mathMLsymbolsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLsymbolsWikipediaLinks.txt

attachment mathMLsymbolsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

mathMLelementsWikipediaLinks.txt

attachment mathMLelementsWikipediaLinks.txt ignored as obsolete

richardbrucebaxter wrote:

latexSymbolsGreek.txt

Attached:

richardbrucebaxter wrote:

AEHprelim-MathMathML.patch

This patches MathMathML.php (13 August 2014) for use with MathMathMLautomaticallyEmbedHyperlinks.php (preliminary version).

attachment AEHprelim-MathMathML.patch ignored as obsolete

richardbrucebaxter wrote:

MathMathMLautomaticallyEmbedHyperlinks.php

MathMathMLautomaticallyEmbedHyperlinks.php (preliminary version). Here is an example of mediawiki input/output:

LaTeX:
<math>\gamma[[description]]=\frac{-b\pm\sqrt{b^2-4ac\ }}{2a}. 355 \alpha[[Fine structure]] \beta[[hello]] c ab[[b variable]]</math>

MathML:
<math xmlns="http://www.w3.org/1998/Math/MathML" href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mrow class="MJX-TeXAtom-ORD" href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mstyle href="https://en.wikipedia.org/wiki/AEHtextFileCellEmpty">
<mi href="γ