
"Blocked" response when trying to access constraintsrdf action from production host
Closed, ResolvedPublic

Description

I am trying to access the URL https://www.wikidata.org/wiki/Q18289639?action=constraintsrdf&nocache=1531160117052 from a production host (wdqs1009.eqiad.wmnet) and I am getting this:

Blocked
Your username or IP address has been automatically blocked by MediaWiki. The reason given is:

Anonymous contributions are not allowed from your IP address (2620:IP_HIDDEN). Please log in.
Start of block: 19:13, 9 July 2018
Expiration of block: infinite
Intended blockee: 2620:IP_HIDDEN
Your current IP address is 2620:IP_HIDDEN. Please include all above details in any queries you make.

Return to Wikidata:Main Page.

Retrieved from "https://www.wikidata.org/wiki/Q18289639"

I've hidden the IP just in case; ping me for it if needed. This is strange: other requests for Q18289639 work, and action=constraintsrdf is not even an editing action. It is also unclear why a WMF production host would be blocked at all.

The IP seems to be that of webproxy.eqiad.wmnet (which wdq9 is using as proxy).

Event Timeline


Additionally, the block page is returned with a 200 status code, which is also wrong: it presents itself as a successful result rather than an error page, so the client cannot detect and handle it properly.

This raises some questions that are probably unrelated to the problem at hand, but might affect things indirectly:

  • Why is an internal service (wdqs) querying a public endpoint? It should probably use private internal endpoints like appservers.svc or api.svc, but there may be arguments about desirability of [Varnish] caching. This is something we're grappling with in general in the longer-term (trying to understand and/or eliminate private internal service<->service traffic routing through the public edge unnecessarily).
  • Why is it using webproxy to access it? It should be able to reach www.wikidata.org without any kind of proxy.

Why is an internal service (wdqs) querying a public endpoint?

It needs to load Wikidata data, and I don't know any other way to do it.

It should probably use private internal endpoints like appservers.svc or api.svc, but there may be arguments about desirability of [Varnish] caching.

It doesn't need varnish. If you point me to the docs of how to do it better, I can switch the URLs.

Why is it using webproxy to access it?

It seems to work the same with or without the proxy (unless there's a Puppet config that sets the proxy by default?), but my understanding was that production hosts need a proxy to talk to the web. I enabled the proxy because in the past I had issues reaching www.wikidata.org from production without it.

Yes, I verified, I get the same block with curl --noproxy \*. Just different IP in the error message.

This raises some questions that are probably unrelated to the problem at hand, but might affect things indirectly:

  • Why is an internal service (wdqs) querying a public endpoint? It should probably use private internal endpoints like appservers.svc or api.svc, but there may be arguments about desirability of [Varnish] caching. This is something we're grappling with in general in the longer-term (trying to understand and/or eliminate private internal service<->service traffic routing through the public edge unnecessarily).

That specific request seems to be trying to bust the cache (nocache=1531160117052), so bypassing Varnish caching is probably not an issue. In the more general context, there are probably instances where caching makes a lot of sense.

  • Why is it using webproxy to access it? It should be able to reach www.wikidata.org without any kind of proxy.

That's probably a historical mistake; I can't think of any reason for it.

Note that the same request directly from wdqs1009.eqiad.wmnet (curl -v 'https://www.wikidata.org/wiki/Q18289639?action=constraintsrdf&nocache=1531160117052') also gives an HTTP 200 Blocked. Same from elastic2035 (just for fun). But same request from my own desktop gives an HTTP 204 NO CONTENT.

For the record, 204 NO CONTENT or 200 with RDF output is the right answer. For most items, it's 204 no content.
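Since the block page arrives with status 200, a client cannot rely on the status code alone. A minimal sketch of a stricter check is below. The function name is hypothetical, and it assumes the block page is served as HTML; only the marker phrase is taken from the block message quoted in the description.

```python
# Phrase taken from the block page quoted in the task description.
BLOCK_MARKER = "automatically blocked by MediaWiki"


def classify_constraints_response(status, content_type, body):
    """Classify a constraintsrdf response as 'empty', 'rdf' or 'error'.

    Hypothetical helper: treating text/html as an error is an assumption,
    not something confirmed in this task.
    """
    if status == 204:
        # No constraint data for this item; the common case.
        return "empty"
    if status == 200:
        # A block page also comes back as 200, so inspect the payload too.
        if content_type.lower().startswith("text/html") or BLOCK_MARKER in body:
            return "error"
        return "rdf"
    return "error"
```

With a check like this, the "200 Blocked" response would be reported as an error instead of being parsed as RDF.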

It looks to me that the block is done by mediawiki itself (see P7355 for details):

< x-cache: cp1066 pass, cp1054 pass
< x-cache-status: pass

That looks like varnish just lets it through. (Note that I have no idea how those blocks are working, just trying to guess).
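For reading these headers in a script, a rough sketch follows. It assumes the x-cache value is a comma-separated list of "host status" pairs, as in the output above, and that a status starting with "hit" would indicate a cached response; both function names are made up for illustration.

```python
def parse_x_cache(value):
    """Split an x-cache header value into (host, status) pairs."""
    entries = []
    for part in value.split(","):
        # Each entry looks like "cp1066 pass": host, a space, then the status.
        host, _, status = part.strip().partition(" ")
        entries.append((host, status))
    return entries


def served_from_cache(value):
    """Return True if any cache layer reports a hit (assumed convention)."""
    return any(status.startswith("hit") for _, status in parse_x_cache(value))


header = "cp1066 pass, cp1054 pass"
print(parse_x_cache(header))
print(served_from_cache(header))
```

For the header above, both layers report "pass", which is consistent with the request going straight through to MediaWiki.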

Yeah looks like ipblocks table for wikidata has block on 2620:0:862:101:0:0:0:0/96 by user "Merlissimo" with comment 'Toolserver Range - no anon edits' but this doesn't seem to match wdq9. So probably not this one.

Yeah looks like ipblocks table for wikidata has block on 2620:0:862:101:0:0:0:0/96 by user "Merlissimo" with comment 'Toolserver Range - no anon edits'.

That block goes from 2620:0000:0862:0101:0000:0000:0000:0000 to 2620:0000:0862:0101:0000:0000:ffff:ffff. It does not match either webproxy or wdqs1009.
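The range arithmetic can be double-checked with Python's standard ipaddress module. This is only a sketch: the two sample addresses are made up for illustration, since the real host IPs are hidden in this task.

```python
import ipaddress

# The range from the ipblocks entry quoted above.
block = ipaddress.ip_network("2620:0:862:101:0:0:0:0/96")

first = block.network_address
last = block.broadcast_address  # highest address in the range


def is_covered(address):
    """Return True if the address falls inside the blocked /96."""
    return ipaddress.ip_address(address) in block


# Hypothetical addresses, NOT the real webproxy or wdqs1009 IPs.
inside = "2620:0:862:101::1234"
outside = "2620:0:862:101:10:64:0:1"

print(first.exploded)
print(last.exploded)
print(is_covered(inside), is_covered(outside))
```

A /96 fixes the first six 16-bit groups, so any address whose fifth or sixth group is non-zero falls outside this block even if it shares the same /64.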

This block seems to be driven by $wgSoftBlockRanges setting in CommonSettings.php, which includes $wgSquidServersNoPurge, which includes private WMF IPs.

	// Addresses used by WMF, people should log in to edit from them directly.
	$wgSquidServersNoPurge

Now we need to figure out why it thinks we're editing anything.

Hi @Jonas: I blocked that particular range, among others allocated to Telefonica Germany, in an attempt to enforce this global ban. @Multichill and @MisterSynergy can tell you more about what has been going on. If you'd like me to lift that particular range block, I'd be happy to do this, but please be aware of what you may be allowing to happen.

Looks like we don't need to change any blocks. Instead, 'constraintsrdf' should be marked as a read action that does not require a block check.

Change 444736 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseQualityConstraints@master] Set requiresUnblock to false for CheckConstraintsRdf

https://gerrit.wikimedia.org/r/444736

Change 444736 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Set requiresUnblock to false for CheckConstraintsRdf

https://gerrit.wikimedia.org/r/444736

Yeah looks like ipblocks table for wikidata has block on 2620:0:862:101:0:0:0:0/96 by user "Merlissimo" with comment 'Toolserver Range - no anon edits' but this doesn't seem to match wdq9. So probably not this one.

Toolserver? Unblocked.

Smalyshev claimed this task.

Seems to be OK now.