Get rid of geoiplookup service
Closed, ResolvedPublic

Description

There's probably no fundamental reason we need to offer this once the IPv6 issues are sorted out. https://geoiplookup.wikimedia.org/
But is there a specific/substantial benefit in removing it?


Spammers are using it and such, too:
http://static.tusalivysyg.com/bg/?d=60EB69BC891CB807___z=2___rd=5187d447bf8d4de6bc68c2332e994fb1___cd=WP___instgrp=not.found___channel=not_found___partner=not_found___InstallId=not_found___uninstalled=not_found


See also: https://wikitech.wikimedia.org/w/index.php?title=Geolocation

Event Timeline

Krinkle renamed this task from Get rid of /geoiplookup URL to Get rid of geoiplookup service.Dec 18 2015, 10:43 PM
Krinkle updated the task description.

While looking into T141786 I stumbled on this again... lots of probably-illegitimate traffic to geoiplookup.wm.o with no referer header and no user-agent, spamming from all over. So it's bugging me again. While looking at this, I code-searched for utilization of our geoiplookup service (on github, across all WMF code) and only came up with 3 references to it (other than the stuff that defines the service itself in our dns puppet repos):

  1. Toolserver has some kind of redirect into it: https://github.com/wikimedia/operations-puppet/blob/5bdedaa7422acbb0bd31725e1a0e11577502614b/modules/toolserver_legacy/templates/www.toolserver.org.erb#L238
  2. ULS uses it, maybe, but via meta.wikimedia.org/geoiplookup: https://github.com/wikimedia/mediawiki-extensions-UniversalLanguageSelector/blob/4dc4b7b81ba16a6fa7e39b2341c3f890e5b6464a/UniversalLanguageSelector.hooks.php#L130
  3. CN uses it for IPv6 users, which is the use I was already most-familiar with: https://github.com/wikimedia/mediawiki-extensions-CentralNotice/blob/c746902ce189d46a049bc69cb39c95df8b160d24/resources/subscribing/ext.centralNotice.geoIP.js#L9

I wonder if the CN usage of it always sets referer correctly? I do see lots of legit referer headers. If we're confident it does, we could block clients that lack a proper Referer header from one of our prod domains...
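To make the Referer idea concrete, here is a minimal sketch of the check in Python rather than VCL. The allowlisted domains and the function names are assumptions for illustration only; they are not taken from the actual production configuration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real set of production domains is not given here.
ALLOWED_SUFFIXES = ("wikipedia.org", "wikimedia.org")


def referer_allowed(referer, allowed_suffixes=ALLOWED_SUFFIXES):
    """Return True if the Referer header names a host on an allowed domain."""
    if not referer:
        return False  # missing or empty header
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        return False  # malformed URL
    if host is None:
        return False  # not an absolute URL
    host = host.rstrip(".")
    # Match the domain itself or any subdomain, but not look-alike hosts.
    return any(host == s or host.endswith("." + s) for s in allowed_suffixes)


def status_for(referer):
    """Map a request's Referer header to an HTTP status code."""
    return 200 if referer_allowed(referer) else 403
```

Matching on a leading dot keeps look-alike hosts such as `evilwikipedia.org` out. A Referer header is trivially forged by non-browser clients, so a check like this only filters out the laziest traffic.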

Change 305418 had a related patch set uploaded (by BBlack):
www.toolserver.org: remove geoiplookup reference

https://gerrit.wikimedia.org/r/305418

Change 305420 had a related patch set uploaded (by BBlack):
Remove geoiplookup service IPs from LVS

https://gerrit.wikimedia.org/r/305420

Change 305421 had a related patch set uploaded (by BBlack):
GeoIP VCL: remove JSON output support

https://gerrit.wikimedia.org/r/305421

Change 305422 had a related patch set uploaded (by BBlack):
Remove geoiplookup DNS entries

https://gerrit.wikimedia.org/r/305422

Nemo_bis subscribed.

I do see lots of legit referer headers.

ULS uses it, for instance, and hundreds of standalone wikis use ULS. Of course providing a service to non-Wikimedia MediaWiki sites was not the original purpose of this service, but if it gets shut down it would be good to communicate a reason, such as one of the following:

  • someone complained about us providing this service;
  • by shutting it down we will save X $/year;
  • due to abuse, we had to babysit it and we don't consider it a wise use of WMF resources/it would need more work to keep functioning;
  • ...

Change 305418 merged by BBlack:
www.toolserver.org: remove geoiplookup reference

https://gerrit.wikimedia.org/r/305418

I do see lots of legit referer headers.

ULS uses it, for instance, and hundreds of standalone wikis use ULS. Of course providing a service to non-Wikimedia MediaWiki sites was not the original purpose of this service, but if it gets shut down it would be good to communicate a reason, such as one of the following:

  • someone complained about us providing this service;
  • by shutting it down we will save X $/year;
  • due to abuse, we had to babysit it and we don't consider it a wise use of WMF resources/it would need more work to keep functioning;
  • ...

The reason we need to shut it down is primarily one we'd rather not communicate broadly. 3rd party wikis' ULS usage of this service will have to die with it.

I think a new ULS version not relying on that should be released before the shutdown, then.

I think a new ULS version not relying on that should be released before the shutdown, then.

As far as I know there is no alternative, so "not relying" would essentially mean disabling all features depending on that (which is not really different from just letting it fail).

I can think of many ways to reduce the number of requests and make it harder for non-ULS users to access this information, if that helps.

If all else fails, could the WMF consider serving something like https://freegeoip.net/ (which uses the free downloads https://dev.maxmind.com/geoip/geoip2/geolite2/ ) at https://meta.wikimedia.org/geoiplookup or whatever URL is used by ULS?

We cannot and will not be a general purpose GeoIP provider, for various and diverse reasons.

You can use freegeoip, or another free or paid-for service, or even set up your own service. MaxMind, the most popular geolocation provider, which also backs our own GeoIP lookups, offers JavaScript code along with their API product. Prices start from $0.0001 per query for country queries (which is probably all that ULS needs?) and four times that for city queries. That's $10 per 100,000 country queries, a pretty reasonable price to pay IMHO.
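The arithmetic behind those figures can be checked with a short sketch. The prices are the ones quoted in this comment, not current MaxMind pricing, and the names are made up for the example.

```python
from decimal import Decimal

COUNTRY_PRICE = Decimal("0.0001")  # USD per country query, as quoted in the comment
CITY_PRICE = COUNTRY_PRICE * 4     # city queries cost four times as much


def cost(queries, price_per_query):
    """Total cost in USD for a number of queries at a per-query price."""
    return queries * price_per_query


country_total = cost(100_000, COUNTRY_PRICE)  # equals 10 USD
city_total = cost(100_000, CITY_PRICE)        # equals 40 USD
```

`Decimal` is used so that the tiny per-query price does not pick up binary floating-point error.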

If you want to help, integrating MaxMind and/or freegeoip as GeoIP providers into MediaWiki (ULS?) would be a great way to start. Alternatively, you could even write PHP code that queries GeoLite(2) or GeoCity(2) and outputs JSON, and integrate that seamlessly into ULS and/or Core (if you do, I'd strongly recommend that you implement Referer checks).

Looking at the bigger picture, since GeoIP functionality is generally interesting for various extensions (ULS, CN etc.) as well as gadgets, it would make sense to split it into a separate component. This has been discussed before and is tracked at T102848. This component could then be modular and integrate with various GeoIP methods/providers (cookie, local JSON endpoint, freegeoip, MaxMind etc.).
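As a rough illustration of what "modular" could mean here, the sketch below chains several providers and returns the first answer. It is written in Python for brevity, and the provider interface and names are hypothetical; nothing here reflects the design tracked at T102848.

```python
from typing import Callable, Optional, Sequence

# Hypothetical provider interface: takes an IP address string and returns
# a country code, or None if the provider cannot answer.
Provider = Callable[[str], Optional[str]]


def first_answer(ip: str, providers: Sequence[Provider]) -> Optional[str]:
    """Ask each provider in order and return the first non-empty answer."""
    for provider in providers:
        try:
            country = provider(ip)
        except Exception:
            continue  # a failing provider should not break the chain
        if country:
            return country
    return None


# Stand-in providers, for illustration only.
def cookie_provider(ip: str) -> Optional[str]:
    return None  # pretend no GeoIP cookie was set


def broken_provider(ip: str) -> Optional[str]:
    raise RuntimeError("remote service unavailable")


def local_provider(ip: str) -> Optional[str]:
    return "ZZ"  # made-up country code
```

With these stand-ins, `first_answer("192.0.2.1", [cookie_provider, broken_provider, local_provider])` falls through to the local provider.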

You can use freegeoip, or another free or paid-for service, or even set up your own service.

Who is you? WMF? The Language team? Non-WMF MediaWiki installs?

Non-WMF MediaWiki installs?

That

Sorry, my bad. I meant that non-WMF MediaWiki installs could use any of these. Someone (the ULS maintainers/contributors) should add code to ULS to support any or all of these options for third-party users.

Location-based language suggestions is a feature in ULS that is maintained by the Language team. It would have been nice if our team had been contacted earlier in the process to discuss alternatives and avoid disruptions. I do not know what the timeframe for shutdown is, but it will take months for third-party users to upgrade even if we released a "fix" today.

I personally think it is unreasonable to expect third-party wiki users to set up a separate service just for this, nor do they usually have any budget, so paid services would be out of the question as well. If there were no free services, this would essentially remove one feature.

We'll be talking to the relevant teams next week about making new releases ahead of completely disabling any service, that's a given. But really, the mistake isn't in the present, it's in the past. We never should've released the service software combination we have out there today in the first place :/

Location-based language suggestions is a feature in ULS that is maintained by the Language team. It would have been nice if our team had been contacted earlier in the process to discuss alternatives and avoid disruptions. I do not know what the timeframe for shutdown is, but it will take months for third-party users to upgrade even if we released a "fix" today.

You're right, this isn't a very planned shutdown. I want to reassure you, though, that a) it's not that we had planned this sunsetting internally and just failed to communicate it to you in time, and b) we're doing said sunsetting very soon and for various reasons we cannot wait for months. This is unfortunate and I'm sorry for that.

The flip side is of course that we never made any guarantees about that service or ever intended for it to be used by third parties. We have known for a couple of years now that it would eventually go away, and had we been asked, we would have explained that it shouldn't have been shipped in code that was intended for third parties.

(what Brandon said)

I personally think it is unreasonable to expect third-party wiki users to set up a separate service just for this, nor do they usually have any budget, so paid services would be out of the question as well. If there were no free services, this would essentially remove one feature.

I honestly don't understand why — aren't these users paying for hosting MediaWiki somewhere anyway? If the wiki in question is very small, then $5-10 could last them years; if the wiki is large and gets a lot of req/s, then it's not unreasonable to either expect them to pay up or roll out a separate service. It's certainly unreasonable to expect *us* to incur costs that have to do with running MediaWiki installs in the wild.

Regardless, if you feel strongly about this, MaxMind's PHP library can both act as a client to their API *and* read the databases directly off the disk (either the free GeoLite ones, or the paid-for GeoCity ones). For the latter, they offer both a PHP C extension linking to libmaxminddb and a pure PHP implementation. The library and the pure PHP version are tested against PHP 5.3 and HHVM, and they offer it via Composer as well. The API is trivial, so you (or an interested third party?) could probably write code that exposes an endpoint similar to geoiplookup directly off MediaWiki.
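For a sense of how small such an endpoint could be, here is a sketch of the lookup-to-JSON step, in Python rather than PHP. The in-memory table stands in for a real GeoIP database and uses documentation-only address ranges with made-up country codes; the JSON field names are assumptions too, not the actual geoiplookup output format.

```python
import ipaddress
import json

# Stand-in for a GeoIP database: documentation-only ranges and made-up
# country codes. A real deployment would read a MaxMind database instead.
FAKE_DB = [
    (ipaddress.ip_network("192.0.2.0/24"), "AA"),
    (ipaddress.ip_network("2001:db8::/32"), "BB"),
]


def lookup_country(ip_text):
    """Return the country code for an IP address string, or None if unknown."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return None  # not a valid IPv4 or IPv6 address
    for network, country in FAKE_DB:
        if ip.version == network.version and ip in network:
            return country
    return None


def geo_json(ip_text):
    """Build the JSON body that a geoiplookup-like endpoint might return."""
    return json.dumps({"country": lookup_country(ip_text), "IP": ip_text})
```

A real handler would take the client address from the request and apply a Referer check before answering, as recommended earlier in the thread.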

Talk to the language team.
Please?

Talk to the language team.

That would be what the conversation with @Nikerabbit is right now.

Everyone: please understand that the plan was for Operations to communicate this to the affected parties at the right time (as Faidon and Brandon have said already). The conversation on this task simply happened before they were able to do that. Nothing is being done without communication.

We cannot and will not be a general purpose GeoIP provider, for various and diverse reasons.

Can someone list the various and diverse reasons? It might make sense to document them on a wiki page. Maybe we can wait until after the geoiplookup.wikimedia.org service is disabled to mitigate the concerns about communicating broadly?

I never liked geoiplookup.wm.org for precisely this reason: it can be easily abused.

What kind of abuse?

In some ways, Wikipedia is basically a giant pastebin, Commons is an arbitrary media host that supports hotlinking, there's a maps tiling service now, etc. What's the concern here? Performance? Privacy? Licensing?

Change 306065 had a related patch set uploaded (by BBlack):
text VCL: 403 geoiplookup w/o referer

https://gerrit.wikimedia.org/r/306065

Change 306065 merged by BBlack:
text VCL: 403 geoiplookup w/o referer

https://gerrit.wikimedia.org/r/306065

Change 306309 had a related patch set uploaded (by BBlack):
text VCL: remove geoiplookup hostname support

https://gerrit.wikimedia.org/r/306309

Bump - the geoiplookup JSON service (the hostname and the /geoiplookup path) will go away sometime this week, preferably tomorrow or Thursday. I don't believe this will break anything here for WMF services: CN should fail the call silently in the background in the small minority of cases where it makes the call at all with our current v6 cookies, and ULS is already off of it. I'd still rather see the CN change merge first ( T143271 -> https://gerrit.wikimedia.org/r/#/c/306598/ ) if possible.

Change 309347 had a related patch set uploaded (by BBlack):
Remove /geoiplookup, but not geoiplookup.wm.o

https://gerrit.wikimedia.org/r/309347

Change 306309 abandoned by BBlack:
text VCL: remove geoiplookup hostname support

Reason:
No longer need this, order was swapped

https://gerrit.wikimedia.org/r/306309

Change 309347 merged by BBlack:
Remove /geoiplookup, but not geoiplookup.wm.o

https://gerrit.wikimedia.org/r/309347

Change 305422 merged by BBlack:
Remove geoiplookup DNS entries

https://gerrit.wikimedia.org/r/305422

Change 305420 merged by BBlack:
Remove geoiplookup service IPs from LVS

https://gerrit.wikimedia.org/r/305420

Change 305421 merged by BBlack:
text VCL: remove JSON output support

https://gerrit.wikimedia.org/r/305421

Change 309501 had a related patch set uploaded (by BBlack):
Trivial bugfix followup to 4a586c3f6

https://gerrit.wikimedia.org/r/309501