Page MenuHomePhabricator

Create separate tables for log events in CheckUser
Closed, ResolvedPublicFeature

Description

To implement T253796, table(s) should be created that allow storing log-specific information in a seperate table. When CheckUser queries are made, the tables will be joined so that results can continue to be shown.

This involves creating two new tables:

  • cu_log_event
  • cu_private_event

The cu_log_event table will contain entries that at the moment go into cu_changes that also:

  • Are log actions
  • Have a associated log ID in the logging table

The cu_private_event will contain entries that at the moment go into cu_changes that also:

  • Are log actions
  • Do not have an associated log ID (and thus are very unlikely to be shown publicly or be linked to)

The two tables are needed because when we have a log ID to reference storing the log parameters and other data that is not needed for indexing purposes can be gained using a join on the log ID to the logging table. Entries which do not have a associated log ID would then have to store this data, but to ensure that a new table is not polymorphic two tables are needed so that some entries have this column not be used.

Examples:

  • Log for a page move would go into the cu_log_event
  • Edit to a page would go into cu_changes
  • Login event would go into cu_private_event
  • Log events previously stored in cu_changes would be moved to cu_private_event

Doing this allows the solving of several security tickets and solving / making substantial progress towards solving:

This method was suggested by Ladsgroup to reduce the polymorphic nature of cu_changes over just adding cuc_log_id as suggested in the parent task. This would solve the parent task by removing log entries from cu_changes.

Todo (mostly in order but some can be done before others as desired):

  • Create the new tables
    • Create the schemas for these tables
    • Deploy the new tables to WMF production
  • Update code that writes and deletes to cu_changes to also use the new tables based on a schema comptability config (write new support)
    • The many methods that insert, delete and change cu_changes in Hooks.php - gerrit 876360
    • PopulateCheckUserTable maintenance script - gerrit 877182
    • PurgeOldData maintenance script - gerrit 877182
  • Add combined read and write new support - T341586
  • Add read new support so that the code that queries CheckUser reads from all three tables:
    • CheckUserQueryBuilder - gerrit 881420
    • LogFormatter-like class for cu_private_event rows (T341688)
    • CheckUser
      • Get edits pager
      • Get IPs pager
      • Get users pager
    • CheckUser API (T341827)
    • Investigate (T329189)
    • Temporary account IP reveal
  • Create a maintenance script to move log entries from cu_changes to cu_private_event, while keeping the moved entries in cu_changes with cuc_only_for_read_old set to 1 - gerrit 879919 (merged) gerrit 886482
  • Enable write new (T330158)
    • Change the value in extension.json
    • Do this on WMF wikis
      • Double check the new tables are not included any DB dumps - Ladsgroup probably the one to contact for this
  • Move log entries in cu_changes to cu_private_event (first maintenance script)
    • Add this to update.php while ensuring that it is run before any column removal but after the creation of cu_private_event and cu_log_event
    • Have this maintenance script run on WMF wikis - No longer needed as data on WMF wikis has been purged that needed the move
  • Enable read new
    • Change the value in extension.json
    • Do this on WMF wikis
  • Stop writing old
    • In extension.json - T366505
    • Have this changed on WMF wikis - T360685
  • Delete cu_changes rows with cuc_only_for_read_old set to 1 - T341830
    • Create a maintenance script for this
    • Add this maintenance script to update.php while ensuring it it run before column removal but after the moving of log entries
    • Run this maintenance script on WMF wikis (T366781)
  • Remove wgEventTablesMigrationStage config, inferring the value as SCHEMA_COMPAT_NEW in places that it was used (T366546)
  • Remove columns related only log types in cu_changes along with cuc_only_for_read_old (T366782)
    • Remove these columns from WMF production (T370903). This should also implicitly run optimise cu_changes for WMF wikis (needed because on enwiki it should be a ~20% drop and on loginwiki it should be a 99.9% drop in row count).

Details

SubjectRepoBranchLines /-
mediawiki/extensions/AbuseFiltermaster 2 -0
mediawiki/extensions/AbuseFiltermaster 90 -16
mediawiki/extensions/CheckUsermaster 3 -3
mediawiki/extensions/CheckUsermaster 228 -0
mediawiki/extensions/Flowmaster 8 -0
operations/mediawiki-configmaster 5 -0
mediawiki/extensions/CheckUsermaster 2 -2
mediawiki/extensions/CheckUsermaster 350 -1
mediawiki/extensions/CheckUsermaster 3 -2
mediawiki/extensions/CheckUsermaster 213 -28
mediawiki/extensions/CheckUsermaster 577 -215
mediawiki/extensions/CheckUsermaster 619 -28
mediawiki/extensions/AbuseFiltermaster 4 -1
mediawiki/extensions/CheckUsermaster 428 -48
mediawiki/extensions/CheckUsermaster 512 -0
Show related patches Customize query in gerrit

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.
StatusSubtypeAssignedTask
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
OpenNone
ResolvedTchanders
Resolvedkostajh
Resolvedtstarling
DeclinedBUG REPORTNone
OpenNone
OpenNone
ResolvedDreamy_Jazz
ResolvedFeatureDreamy_Jazz
ResolvedMilimetric
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedPRODUCTION ERRORpmiazga
ResolvedDreamy_Jazz
DeclinedNone
DeclinedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Dreamy_Jazz updated the task description. (Show Details)
Dreamy_Jazz updated the task description. (Show Details)

This is now done! Thanks to all for the code review over the last few years.

Change #1057221 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[analytics/refinery@master] Don't select cuc_actiontext from cu_changes for sqoop

https://gerrit.wikimedia.org/r/1057221