Page MenuHomePhabricator

DPL3 (third-party, dynamic page list) allows generating SQL which generates large temporary tables on disk
Closed, DeclinedPublic

Description

As an example, Désencyclopédie had this lovely query on the front page, which actually seemed to be able to be run on DPL 2.2 back in the dark, dreadful days before it finally moved off desencyclopedie.wikia.com onto its own domain with "modern" versions of the software:

	<DPL>
	titlematch=%/Récent
	namespace=Portail
	replaceintitle=$/.*$,
	format=¶{| style="background:#f5faff", ¶|<div class="onlyimage">{{Portail:%TITLE%/lien}}</div>|||| '''[[Portail:%TITLE%|%TITLE%]]'''<br><div class="portal-article-box" >,</div>¶|-,¶|-¶|-,¶|}
	include=*
	includemaxlength=250
	order=descending
	ordermethod=lastedit
	count=7
	allowcachedresults=true
	</DPL>

DPL2 tolerated this, but DPL3? Once the project was moved to its own domain (desencyclopedie.org) and DPL3, this sort of ugliness started turning up in SHOW PROCESSLIST;

	 SELECT DISTINCT rev.rev_timestamp,`page`.page_namespace AS `page_namespace`,`page`.page_id
	 AS `page_id`,`page`.page_title AS `page_title` 
	 FROM `revision` `rev`,`page`    WHERE ((`page`.page_title like'%/Récent')) AND `page`.page_is_redirect = '0'
	 AND `page`.page_namespace = '114'  ORDER BY rev.rev_timestamp DESC LIMIT 7;

This may be related to a known bug: https://gitlab.com/hydrawiki/extensions/DynamicPageList/issues/88

In other words, for every revision on the entire database, look up the page name, see if it's in one particular namespace (114:Portail - Portal) and then see if the page name ends in /Recent. That's nuts and slows the database to a crawl. IT SHOULD NOT BE POSSIBLE to do this. T124841 was bad enough from a performance standpoint, but DPL2 looks worse and DPL3 worse still.

The query should be running the other way - go to page first (not revision), narrow everything down to namespace 114 (which should be an indexed field), then find only the newest entries by date in that namespace. DPL3 makes it too easy to generate expensive queries which turn servers into tarpits.

It needs to be possible to reject DPL invocations which reference unindexed fields, yes, but the SQL code being generated by these extensions also needs to optimise the queries which it does make to ensure that they never retrieve every row in an entire table, only to discard all but a handful later.