Please add a new option to filter [[Special:Newpages]] by page size. Thus, pages with very small size (which should be deleted per empty or no content) and pages with very large size (possibly copyright violations) could be easily found. (Hundreds of pages are created every day in largest wikis and it is very tiring to check them all manually.)
Description
Details
- Reference
- bz10817
Subject | Repo | Branch | Lines /- | |
---|---|---|---|---|
New HTMLForm size filter field; add size filter to Special:Newpages | mediawiki/core | master | 93 -0 |
Related Objects
- Mentioned In
- E200: Code Review Office Hours
Event Timeline
chughakshay16 wrote:
Posted a review in Gerrit , here is the link - https://gerrit.wikimedia.org/r/#/c/8036/
Patch needs only minor code style changes and a commit message that explains what the patch does and does not refer to a previous versiosn of that same patch.
Akshay: Patch has been inactive for a while so I'll reset the assignee of this bug report, but please assign this bug report again to yourself if you plan to work on your patch again and fix the remaining small issues (which would be very welcome to get this bug fixed!).
Change 8036 had a related patch set uploaded by Krinkle:
Filter Special:Newpages by page size
Change 8036 had a related patch set uploaded (by TTO):
New HTMLForm size filter field; add size filter to Special:Newpages
I've tested this query on all shards, several wikis, with special attention to all large wikis (enwiki, dewiki, commons, wikidata, frwiki, zhwiki) both in the regular traffic slaves and special slaves. I found this query to be as good as it gets, detailed results follow (I only show the results on enwiki because the other ones match this):
MariaDB MARIADB db1073.eqiad.wmnet enwiki > EXPLAIN SELECT /* IndexPager::buildQueryInfo (NewPagesPager) Bawolff */ rc_namespace,rc_title,rc_cur_id,rc_user,rc_user_text,rc_comment,rc_timestamp,rc_patrolled,rc_id,rc_deleted,page_len AS `length`,page_latest AS `rev_id`,rc_this_oldid,page_namespace,page_title,(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags` FROM `recentchanges` INNER JOIN `page` ON ((page_id=rc_cur_id)) WHERE rc_new = '1' AND (page_len <= 1000) AND rc_namespace = '0' AND page_is_redirect = '0' ORDER BY rc_timestamp DESC LIMIT 51\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: recentchanges type: ref possible_keys: rc_namespace_title,rc_cur_id,new_name_timestamp,rc_ns_usertext,rc_name_type_patrolled_timestamp key: new_name_timestamp key_len: 5 ref: const,const rows: 116688 Extra: Using where *************************** 2. row *************************** id: 1 select_type: PRIMARY table: page type: eq_ref possible_keys: PRIMARY,page_len,page_redirect_namespace_len key: PRIMARY key_len: 4 ref: enwiki.recentchanges.rc_cur_id rows: 1 Extra: Using where *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: change_tag type: ref possible_keys: change_tag_rc_tag key: change_tag_rc_tag key_len: 5 ref: enwiki.recentchanges.rc_id rows: 1 Extra: Using index 3 rows in set (0.01 sec) 51 rows in set (0.07 sec) ---------------------------- ------- | Variable_name | Value | ---------------------------- ------- | Handler_commit | 1 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 0 | | Handler_icp_attempts | 0 | | Handler_icp_match | 0 | | Handler_mrr_init | 0 | | Handler_mrr_key_refills | 0 | | Handler_mrr_rowid_refills | 0 | | Handler_prepare | 0 | | Handler_read_first | 0 | | Handler_read_key | 353 | | Handler_read_last | 0 | | Handler_read_next | 9 | | Handler_read_prev | 300 | | Handler_read_rnd | 0 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 0 | | Handler_rollback | 0 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_tmp_update | 0 | | Handler_tmp_write | 0 | | Handler_update | 0 | | Handler_write | 0 | ---------------------------- ------- 25 rows in set (0.00 sec) MariaDB MARIADB db1055.eqiad.wmnet enwiki > EXPLAIN SELECT /* IndexPager::buildQueryInfo (NewPagesPager) Bawolff */ rc_namespace,rc_title,rc_cur_id,rc_user,rc_user_text,rc_comment,rc_timestamp,rc_patrolled,rc_id,rc_deleted,page_len AS `length`,page_latest AS `rev_id`,rc_this_oldid,page_namespace,page_title,(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags` FROM `recentchanges` INNER JOIN `page` ON ((page_id=rc_cur_id)) WHERE rc_new = '1' AND (page_len <= 1000) AND rc_namespace = '0' AND page_is_redirect = '0' ORDER BY rc_timestamp DESC LIMIT 51\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: recentchanges type: ref possible_keys: rc_namespace_title,rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp key: new_name_timestamp key_len: 5 ref: const,const rows: 122688 Extra: Using where *************************** 2. row *************************** id: 1 select_type: PRIMARY table: page type: eq_ref possible_keys: PRIMARY,page_len,page_redirect_namespace_len key: PRIMARY key_len: 4 ref: enwiki.recentchanges.rc_cur_id rows: 1 Extra: Using where *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: change_tag type: ref possible_keys: ct_rc_id,change_tag_rc_tag key: ct_rc_id key_len: 5 ref: enwiki.recentchanges.rc_id rows: 1 Extra: Using index 3 rows in set (0.02 sec) 51 rows in set (0.00 sec) ---------------------------- ------- | Variable_name | Value | ---------------------------- ------- | Handler_commit | 1 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 0 | | Handler_icp_attempts | 0 | | Handler_icp_match | 0 | | Handler_mrr_init | 0 | | Handler_mrr_key_refills | 0 | | Handler_mrr_rowid_refills | 0 | | Handler_prepare | 0 | | Handler_read_first | 0 | | Handler_read_key | 353 | | Handler_read_last | 0 | | Handler_read_next | 9 | | Handler_read_prev | 300 | | Handler_read_rnd | 0 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 0 | | Handler_rollback | 0 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_tmp_update | 0 | | Handler_tmp_write | 0 | | Handler_update | 0 | | Handler_write | 0 | ---------------------------- ------- 25 rows in set (0.01 sec)
Change 8036 merged by jenkins-bot:
New HTMLForm size filter field; add size filter to Special:Newpages