To implement T253796, table(s) should be created that allow storing log-specific information in a seperate table. When CheckUser queries are made, the tables will be joined so that results can continue to be shown.
This involves creating two new tables:
- cu_log_event
- cu_private_event
The cu_log_event table will contain entries that at the moment go into cu_changes that also:
- Are log actions
- Have a associated log ID in the logging table
The cu_private_event will contain entries that at the moment go into cu_changes that also:
- Are log actions
- Do not have an associated log ID (and thus are very unlikely to be shown publicly or be linked to)
The two tables are needed because when we have a log ID to reference storing the log parameters and other data that is not needed for indexing purposes can be gained using a join on the log ID to the logging table. Entries which do not have a associated log ID would then have to store this data, but to ensure that a new table is not polymorphic two tables are needed so that some entries have this column not be used.
Examples:
- Log for a page move would go into the cu_log_event
- Edit to a page would go into cu_changes
- Login event would go into cu_private_event
- Log events previously stored in cu_changes would be moved to cu_private_event
Doing this allows the solving of several security tickets and solving / making substantial progress towards solving:
This method was suggested by Ladsgroup to reduce the polymorphic nature of cu_changes over just adding cuc_log_id as suggested in the parent task. This would solve the parent task by removing log entries from cu_changes.
Todo (mostly in order but some can be done before others as desired):
- Create the new tables
- Create the schemas for these tables
- Deploy the new tables to WMF production
- Update code that writes and deletes to cu_changes to also use the new tables based on a schema comptability config (write new support)
- The many methods that insert, delete and change cu_changes in Hooks.php - gerrit 876360
- PopulateCheckUserTable maintenance script - gerrit 877182
- PurgeOldData maintenance script - gerrit 877182
- Add combined read and write new support - T341586
- Add read new support so that the code that queries CheckUser reads from all three tables:
- CheckUserQueryBuilder - gerrit 881420
- LogFormatter-like class for cu_private_event rows (T341688)
- CheckUser
- Get edits pager
- Get IPs pager
- Get users pager
- CheckUser API (T341827)
- Investigate (T329189)
- Temporary account IP reveal
- Create a maintenance script to move log entries from cu_changes to cu_private_event, while keeping the moved entries in cu_changes with cuc_only_for_read_old set to 1 - gerrit 879919 (merged) gerrit 886482
- Enable write new (T330158)
- Change the value in extension.json
- Do this on WMF wikis
- Double check the new tables are not included any DB dumps - Ladsgroup probably the one to contact for this
- Move log entries in cu_changes to cu_private_event (first maintenance script)
- Add this to update.php while ensuring that it is run before any column removal but after the creation of cu_private_event and cu_log_event
- Have this maintenance script run on WMF wikis - No longer needed as data on WMF wikis has been purged that needed the move
- Enable read new
- Change the value in extension.json
- Do this on WMF wikis
- Stop writing old
- Delete cu_changes rows with cuc_only_for_read_old set to 1 - T341830
- Create a maintenance script for this
- Add this maintenance script to update.php while ensuring it it run before column removal but after the moving of log entries
- Run this maintenance script on WMF wikis (T366781)
- Remove wgEventTablesMigrationStage config, inferring the value as SCHEMA_COMPAT_NEW in places that it was used (T366546)
- Remove columns related only log types in cu_changes along with cuc_only_for_read_old (T366782)
- Remove these columns from WMF production (T370903). This should also implicitly run optimise cu_changes for WMF wikis (needed because on enwiki it should be a ~20% drop and on loginwiki it should be a 99.9% drop in row count).