
Link.langlinkUnsafe does not work on Beta-Cluster wikis
Open, Medium, Public

Description

The language links on the Beta-Cluster wikis point to the production wikis (see T69931), which is incompatible with Link.langlinkUnsafe, as that method requires the linked site to be in the same family.

For example, https://gerrit.wikimedia.org/r/174827 tries to fix certain issues with our QueryGenerator. In it I removed the expectedFailure from tests.site_tests.TestPagePreloading.test_preload_langlinks_normal, and now I get the error [[https://travis-ci.org/xZise/pywikibot-core/jobs/79537609#L926|bar is not in the family wpbeta]].

Now I'm wondering whether we should deprecate that method and only use APISite.interwiki to get the site related to a given language prefix. Alternatively, we could avoid the issue in our tests with something similar to deb64e80, but that would only fix the patch, not the actual problem.

Event Timeline

XZise set the priority of this task to Needs Triage.
XZise updated the task description.
XZise added a project: Pywikibot.
XZise subscribed.

Using APISite.interwiki doesn't really fix the problem either. It will allow a site to be created, but it will be a production wiki in a different family, which means other assumptions will be broken. Also, it cannot rely on APISite.interwiki alone, as something needs to handle obsolete language codes in wikitext (i.e. on non-Wikibase wikis), and that is currently done by the Family.

One quick fix is to add a beta family to the repo, with custom code to work around T69931. With the features already merged, such as from_url, we could probably avoid a hard-coded language list. (some of the work for AutoSubdomainFamily might help: https://gerrit.wikimedia.org/r/#/c/138312)

https://gerrit.wikimedia.org/r/#/c/236404/ opens up a lot of possibilities, as we could create a custom (betawiki specific) APISite subclass with appropriate behaviour for 'unknown' language codes in order to work around bugs in a way that only affects betawiki.

Another option is to add functionality to the family generation to workaround T69931. It could be as simple as a sed script on the generated family, replacing *.wikipedia.org with the beta equivalent.
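
A minimal sketch of that sed idea, assuming the generated family file spells hosts as `<code>.wikipedia.org`. The function name `to_beta` and the file names in the usage line are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: rewrite production Wikipedia hosts to their Beta Cluster
# equivalents, e.g. de.wikipedia.org -> de.wikipedia.beta.wmflabs.org.
# Hosts already on the Beta Cluster are left untouched, because the string
# "wikipedia.org" never occurs inside "wikipedia.beta.wmflabs.org".
to_beta() {
    sed -E 's/([a-z0-9-]+)\.wikipedia\.org/\1.wikipedia.beta.wmflabs.org/g'
}
```

Usage would be something like `to_beta < generated_family.py > wpbeta_family.py`. Note that a blind rewrite like this also produces hosts for languages that have no beta wiki at all.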

Just one note: there is not always a beta equivalent (e.g. there is no http://bar.wikipedia.beta.wmflabs.org/). So adding a beta family to the repo, as suggested, would mean that languages like bar actually map to the production wikis (and from_url must make an exception for them).
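
To illustrate the fallback this implies, a hypothetical shell sketch that maps a language code to the beta host when a beta wiki exists and to the production host otherwise. `BETA_LANGS` is an assumed, hard-coded subset used only for illustration; a real family would need an authoritative source for it:

```shell
#!/bin/sh
# Assumed subset of language codes that have a Beta Cluster Wikipedia.
BETA_LANGS="de en eo es fa he ja ru simple uk zh"

# Hypothetical helper: print the host for a language code, preferring the
# beta wiki and falling back to the production wiki (e.g. for "bar").
wiki_host() {
    case " $BETA_LANGS " in
        *" $1 "*) printf '%s.wikipedia.beta.wmflabs.org\n' "$1" ;;
        *)        printf '%s.wikipedia.org\n' "$1" ;;
    esac
}
```

For example, `wiki_host de` prints `de.wikipedia.beta.wmflabs.org`, while `wiki_host bar` prints `bar.wikipedia.org`.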

That is why we may need a custom APISite as well as a custom family. Such an ugly problem to work around. It would be worth finding other wikis with similar interwikimap bugs, to justify the time spent on it. The only somewhat similar problem is battlestarwiki, which has a lot of dead wikis in its interwikimap. I expect that we'd find other wikis with interwikimap problems, but I am not sure how we'd find them. We could validate the interwiki maps of all wikis in the Meta interwiki map (IWM); that might uncover some interesting problems.

If it is a beta-only problem, we should work with 'upstream' to resolve T75906, which has a patch and probably only needs some review loving.

Okay, as no interwiki entry seems to point to *.beta.wmflabs, maybe change interwiki_forward to 'wikipedia'? That seems kind of hackish and goes against T89451.

Strange. The interwiki map still points to prod wikis, but the sitematrix is OK.

SiteMatrix is separate. Isn't all of this code wonderful? :)
But yes, the interwiki dump script change for beta got approved and now I need to run it.

FYI: Updated interwiki cache in beta.

Hmm, this unfortunately uses unavailable wikis like http://bar.wikipedia.beta.wmflabs.org/wiki/$1. Could you remove the languages which aren't supported? As long as they don't pop up as language links (i.e. query langlinks doesn't list them and query siteinfo (interwikimap) doesn't give them a language), it should work.

I guess we could split langlist and use a separate one on beta.

thcipriani triaged this task as Medium priority. Sep 14 2015, 7:25 PM
thcipriani moved this task from To Triage to In-progress on the Beta-Cluster-Infrastructure board.

Change 239748 had a related patch set uploaded (by Alex Monk):
Split langlist for beta

https://gerrit.wikimedia.org/r/239748

Change 239748 merged by jenkins-bot:
Split langlist for beta

https://gerrit.wikimedia.org/r/239748

How long does it take for that to go live so that we can verify that it worked? At the moment the English Wikipedia still advertises bar as a langlink to a nonexistent wiki: http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap

Change 240378 had a related patch set uploaded (by Alex Monk):
Use /srv/mediawiki directly instead of $IP/../

https://gerrit.wikimedia.org/r/240378

Change 240391 had a related patch set uploaded (by Alex Monk):
Use getRealmSpecificFilename to get langlist path

https://gerrit.wikimedia.org/r/240391

Change 240378 abandoned by Chad:
Use /srv/mediawiki directly instead of $IP/../

Reason:
Stupid

https://gerrit.wikimedia.org/r/240378

Change 240391 merged by jenkins-bot:
Use getRealmSpecificFilename to get langlist path

https://gerrit.wikimedia.org/r/240391

Change 240612 had a related patch set uploaded (by Alex Monk):
MWRealm::getRealmSpecificFilename: Fix support for filenames without an extension but with full stops in the full path

https://gerrit.wikimedia.org/r/240612

Change 240612 merged by jenkins-bot:
Multiversion MWRealm getRealmSpecificFilename: Fix support for filenames without an extension but with full stops in the full path

https://gerrit.wikimedia.org/r/240612
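
The merged changes select the langlist file per realm in PHP via MWRealm::getRealmSpecificFilename. As a hypothetical illustration only (not a port of that PHP code), the shell function below sketches the idea behind change 240612: insert the realm suffix before the extension of the basename only, so that full stops in directory names are not mistaken for an extension. The `-labs` style suffix and the fallback to the original file when no realm-specific file exists are assumptions of this sketch:

```shell
#!/bin/sh
# Hypothetical sketch: build "<name>-<realm>[.<ext>]" from the basename only,
# so a path such as /srv/mediawiki.d/langlist is not split at the directory's dot.
realm_specific_filename() {
    path=$1
    realm=$2
    dir=$(dirname "$path")
    base=$(basename "$path")
    case $base in
        *.*) candidate="$dir/${base%.*}-$realm.${base##*.}" ;;  # db.php -> db-labs.php
        *)   candidate="$dir/$base-$realm" ;;                     # langlist -> langlist-labs
    esac
    # Assumed behaviour: use the realm-specific file only if it exists.
    if [ -e "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        printf '%s\n' "$path"
    fi
}
```

For example, `realm_specific_filename /srv/mediawiki/langlist labs` would return `/srv/mediawiki/langlist-labs` if that file exists, and `/srv/mediawiki/langlist` otherwise.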

Fixed for reals this time.

Following the link I posted above, I don't get “bar” anymore (good), but I still get invalid wikis. So I wrote a short script:

  • http://aa.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://als.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://ar.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://bat-smg.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://be-x-old.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://ca.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://de.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://eml.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://en.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://eo.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://es.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://fa.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://fiu-vro.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://he.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://hi.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://ja.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://ko.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://no.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://roa-rup.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://ru.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://simple.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://sq.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://uk.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://zh-classical.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://zh-min-nan.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://zh-yue.wikipedia.beta.wmflabs.org/: HTTP/1.1 404 Not Found
  • http://zh.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently

AFAIK none of the 404s should be there. In case you are interested, here is the script:

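# List every interwiki map entry that has a language, strip the trailing "wiki/$1"
# (7 characters) from its URL, and record the first status line each wiki returns.
for url in $(curl -s "http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap&format=json" | ./jq --raw-output '.query.interwikimap | map(select(has("language"))) | [.[].url[:-7]] | unique | .[]'); do
  echo "* \`$url\`: $(curl -sI "$url" | head -n 1)"
done > out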
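
As a follow-up, a small hypothetical helper that reads the `out` file written by the loop and prints the language codes whose beta wiki answered 404, i.e. the entries that should not be in the interwiki map. It assumes each line holds one beta URL followed by the first HTTP status line, as the loop writes them:

```shell
#!/bin/sh
# Hypothetical helper: print the language code of every beta URL whose status is 404.
# Expects lines like: * `http://als.wikipedia.beta.wmflabs.org/`: HTTP/1.1 404 Not Found
dead_langs() {
    grep ' 404 ' "$1" |
        sed -nE 's#.*https?://([a-z0-9-]+)\.wikipedia\.beta\.wmflabs\.org/.*#\1#p'
}
```

Running `dead_langs out` on the first result above would list codes such as `als`, `bat-smg` and `be-x-old`.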

Underlying bug is fixed. Interwiki map probably needs regeneration?

krenair@deployment-bastion:~$ sudo -u jenkins-deploy updateinterwikicache
Updating interwiki cache...
           ___ ____
         ⎛   ⎛ ,----
          \  //==--'
     _//|,.·//==--'    ____________________________
    _OO≣=-  ︶ ᴹw ⎞_§ ______  ___\ ___\ ,\__ \/ __ \
   (∞)_, )  (     |  ______/__  \/ /__ / /_/ / /_/ /
     ¨--¨|| |- (  / ______\____/ \___/ \__^_/  .__/
         ««_/  «_/ jgs/bd808                /_/

20:42:52 Started sync-proxies
20:42:52 Job ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/interwiki.cdb'] called with an empty host list.
20:42:52 Finished sync-proxies (duration: 00m 00s)
20:42:52 Started sync-apaches
sync-common: 100% (ok: 5; fail: 0; left: 0)                                     
20:42:54 Finished sync-apaches (duration: 00m 02s)
20:42:54 Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 02s)
Done

Try now?

Nope, I still get the same result.

Looks like these extra languages are all values of $languageAliases in WikimediaMaintenance's dumpInterwiki.php

(eo, ja, and zh are unaffected because we have beta wikis for all of those. No idea about cs and da)

Is the #WMF-deploy-2015-10-07_(1.27.0-wmf.2) tag actually correct? This change should only be deployed on the beta cluster wikis.

I'd just ignore those tags here. As you say, it's not really relevant in this case.

Change 243478 had a related patch set uploaded (by Alex Monk):
Don't apply language aliases outside of production

https://gerrit.wikimedia.org/r/243478

Change 243478 merged by jenkins-bot:
Don't apply language aliases outside of production

https://gerrit.wikimedia.org/r/243478

Can you run your script again, @XZise? I don't have a "./jq" (I'm helping with Beta-Cluster-Infrastructure; I don't have pywikibot).
There should be fewer errors now.

[[https://stedolan.github.io/jq/|jq]] is a command-line JSON processor; it has nothing to do with pywikibot 😉

Anyway I've rerun the script and got:

  • http://aa.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://ar.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://ca.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://de.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://en.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://eo.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://es.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://fa.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://he.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://hi.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://ja.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://ko.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://ru.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://simple.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://sq.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://uk.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently
  • http://zh.wikipedia.beta.wmflabs.org/: HTTP/1.1 301 Moved Permanently

So it seems good, but I'd like to verify that with pywikibot later.

jayvdb removed XZise as the assignee of this task. Aug 6 2016, 5:06 AM
jayvdb added a project: TestMe.