Page MenuHomePhabricator

Special:AbuseLog?wpSearchUser=${ip} takes 30-90 seconds to load, sometimes is 504
Closed, ResolvedPublic

Description

Quick example: https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchUser=90.221.128.138

<!-- Served by mw1059 in 36.766 secs. -->

This was originally reported in #wikimedia-tech by Cyberpower and then confirmed by myself and jeremyb.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=47422

Details

Reference
bz47045

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:24 AM
bzimport set Reference to bz47045.
bzimport added a subscriber: Unknown Object (MLST).

09 14:26:43 < Cyberpower678> So what's causing the problem?
09 14:27:06 * jeremyb has to assume full table scan
09 14:27:22 < jeremyb> the timing is too consistent

See also Ie331ec7ef3b461ae10

mysql:wikiadmin@db1017 [enwiki]> explain SELECT * FROM abuse_filter_log LEFT JOIN abuse_filter on (af_id=afl_filter) WHERE afl_user = 0 AND afl_user_text='90.221.128.138';
---- ------------- ------------------ -------- --------------- ---------- --------- ------------------------------------ --------- -------------

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

---- ------------- ------------------ -------- --------------- ---------- --------- ------------------------------------ --------- -------------

1SIMPLEabuse_filter_logrefafl_userafl_user8const3893344Using where
1SIMPLEabuse_filtereq_refPRIMARYPRIMARY8enwiki.abuse_filter_log.afl_filter1

---- ------------- ------------------ -------- --------------- ---------- --------- ------------------------------------ --------- -------------
2 rows in set (0.06 sec)

mysql:wikiadmin@db1017 [enwiki]> explain SELECT * FROM abuse_filter_log LEFT JOIN abuse_filter on (af_id=afl_filter) WHERE afl_user = 0 AND afl_user_text='90.221.128.138'\G

  • 1. row ******* id: 1 select_type: SIMPLE table: abuse_filter_log type: ref

possible_keys: afl_user

    key: afl_user
key_len: 8
    ref: const
   rows: 3959592
  Extra: Using where
  • 2. row ******* id: 1 select_type: SIMPLE table: abuse_filter type: eq_ref

possible_keys: PRIMARY

    key: PRIMARY
key_len: 8
    ref: enwiki.abuse_filter_log.afl_filter
   rows: 1
  Extra:

2 rows in set (0.02 sec)

Running the query on enwiki (not going to paste results):
4 rows in set (47.63 sec)

Missing indexes are missing:

http://p.defau.lt/?83qAX_9Lk44zHB1B7CKHbQ

vs

PRIMARY KEY (afl_id),
KEY filter_timestamp (afl_filter,afl_timestamp),
KEY user_timestamp (afl_user,afl_user_text,afl_timestamp),
KEY (afl_timestamp),
KEY page_timestamp (afl_namespace, afl_title, afl_timestamp),
KEY ip_timestamp (afl_ip, afl_timestamp),
KEY (afl_rev_id),
KEY (afl_log_id),
KEY wiki_timestamp (afl_wiki, afl_timestamp)

Talked with hoo about this. It looks like we probably want to modify the query so that $conds['afl_ip'] = IP::prettifyIP( $this->mSearchUser ); when searching for an IP.

That should make good use of the (enwiki's) afl_ip index, which we probably want to add back into the extension's sql definition.

(In reply to comment #5)

Talked with hoo about this. It looks like we probably want to modify the
query
so that $conds['afl_ip'] = IP::prettifyIP( $this->mSearchUser ); when
searching for an IP.

That should make good use of the (enwiki's) afl_ip index, which we probably
want to add back into the extension's sql definition.

Why would a system store IPs using IP::prettifyIP() and not IP::sanitizeIP() (or a hex function preferrably)?

(In reply to comment #6)

Why would a system store IPs using IP::prettifyIP() and not IP::sanitizeIP()
(or a hex function preferrably)?

Because it's bad? Someone wrote that back then IPv4 was up to date and it simply didn't matter what you use... now we have a lot of inconsistent data and a lot of trouble.

IP::prettifyIP() is really meant for display only, not storage. It would be disappointing to have use that.

AbuseFilter stores the result of $wgRequest->getIP(), which when I look in the db of enwiki, seems to always look like the rfc5952 format, which seems to be what prettifyIP also gives.

For example, try: select afl_user, afl_user_text, afl_ip, afl_timestamp from abuse_filter_log where afl_timestamp > '20130401000000' and afl_user_text != afl_ip and afl_user = 0 AND afl_user_text like '%:%' limit 100;

Maybe we need an rfc5952 formatter in the IP class, to handle any pieces that prettifyIP doesn't?

The better, long term fix is to store all addresses as numbers, and then translate the format when we display them, but I think it would be good to get this bug fixed a little sooner than it would take to do the schema and code changes for that.

This page is also updating user touched time for the user fetching the page. Got a lock timeout on the s1 master for that user touched update.

@ Special:AbuseLog?wpSearchUser=${ip}

Load times have gotten longer to the point where I get a 504 a lot of the times now.

Thanks for filing this bug, Legoktm.

I am also getting 504 whenever I query AbuseLog for IPs, both v4 and v6. I do not get an error for user accounts or articles.

Related URL: https://gerrit.wikimedia.org/r/60021 (Gerrit Change I003a609251cf922939d8c2e49aff025d726ccf78)

I just added https://gerrit.wikimedia.org/r/#/c/60021/

That *can* be deployed, which will revert the query change from Ie331ec7ef3b461ae10 and allow IPv4 queries to be returned quickly.

Comparison of query performance on enwiki below.

However, this will break querying for IPv6 ip addresses. For that, we need a slightly more involved fix. I'll start working on that right now. But if getting back functionality is worth breaking IPv6 lookups (which have been broken until Ie331ec7ef3b461ae10), then the patch can be merged and deployed, if the Stewards give an ok.

mysql:wikiadmin@db1056 [enwiki]> explain SELECT * FROM abuse_filter_log LEFT

-> JOIN abuse_filter on (af_id=afl_filter) WHERE afl_user = 0 AND
-> afl_user_text='90.221.128.138';

------ ------------- ------------------ -------- --------------- ---------- --------- ------------------------------------ --------- -------------

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

------ ------------- ------------------ -------- --------------- ---------- --------- ------------------------------------ --------- -------------

1SIMPLEabuse_filter_logrefafl_userafl_user8const4343552Using where
1SIMPLEabuse_filtereq_refPRIMARYPRIMARY8enwiki.abuse_filter_log.afl_filter1Using where

------ ------------- ------------------ -------- --------------- ---------- --------- ------------------------------------ --------- -------------
2 rows in set (0.08 sec)

mysql:wikiadmin@db1056 [enwiki]> explain SELECT * FROM abuse_filter_log LEFT JOIN abuse_filter on (af_id=afl_filter) WHERE afl_ip=afl_user_text AND afl_user = 0 AND afl_user_text='90.221.128.138';
------ ------------- ------------------ -------- ----------------- --------- --------- ------------------------------------ ------ ------------------------------------

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

------ ------------- ------------------ -------- ----------------- --------- --------- ------------------------------------ ------ ------------------------------------

1SIMPLEabuse_filter_logrefafl_user,afl_ipafl_ip257const4Using index condition; Using where
1SIMPLEabuse_filtereq_refPRIMARYPRIMARY8enwiki.abuse_filter_log.afl_filter1Using where

------ ------------- ------------------ -------- ----------------- --------- --------- ------------------------------------ ------ ------------------------------------
2 rows in set (0.04 sec)

afeldman wrote:

I've added the user_timestamp index to enwiki, the original query is now fast:

mysql:wikiadmin@db1056 [enwiki]> explain SELECT * FROM abuse_filter_log LEFT JOIN abuse_filter on (af_id=afl_filter) WHERE afl_user = 0 AND afl_user_text='90.221.128.138'\G

  • 1. row ******* id: 1 select_type: SIMPLE table: abuse_filter_log type: ref

possible_keys: user_timestamp

    key: user_timestamp
key_len: 265
    ref: const,const
   rows: 4
  Extra: Using index condition
  • 2. row ******* id: 1 select_type: SIMPLE table: abuse_filter type: eq_ref

possible_keys: PRIMARY

    key: PRIMARY
key_len: 8
    ref: enwiki.abuse_filter_log.afl_filter
   rows: 1
  Extra: Using where

2 rows in set (0.03 sec)

Probably still missing on most other projects.

(In reply to comment #0)

Quick example:
https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchUser=90.
221.128.138

<!-- Served by mw1059 in 36.766 secs. -->

Same link now is <!-- Served by mw1062 in 0.254 secs. --> so it looks fixed for enwiki.

Should a new bug be filed to get the index created on all the other projects using the AbuseFilter?

Wikis created after 2010 have the index already, so it would probably be easiest if you add specific wikis where this problem is showing up. Either here or a new bug.

(In reply to comment #17)

Should a new bug be filed to get the index created on all the other projects
using the AbuseFilter?

I think a new bug should be filed about getting this index on all Wikimedia wikis (all Wikimedia wikis have the AbuseFilter extension enabled). I don't think we want to burden ourselves with an inconsistent schema for this extension across wikis. I think it'll just end up making debugging future issues more difficult and annoying.