Page MenuHomePhabricator

Discrepancies with logging table on different wikis
Closed, ResolvedPublic

Assigned To
Authored By
bzimport
Aug 4 2014, 9:39 PM
Referenced Files
None
Tokens
"Orange Medal" token, awarded by jcrespo."Burninate" token, awarded by Liuxinyu970226."Burninate" token, awarded by MusikAnimal.

Description

Author: metatron

Description:
Queries on logging-table perform different on different wikis.
Atm there are different key definitions on eg. enwiki and dewiki

  • enwiki has a key log_user_text_time(ok to have), dewiki doesn't
  • dewiki has a key log_user_type_time(important to have), enwiki doesn't
  • same for log_title_time / log_page_id_time
enwiki | CREATE TABLE `logging`
...
PRIMARY KEY (`log_id`), 
  KEY `type_time` (`log_type`,`log_timestamp`), 
  KEY `user_time` (`log_user`,`log_timestamp`), 
  KEY `page_time` (`log_namespace`,`log_title`,`log_timestamp`),
  KEY `times` (`log_timestamp`), 
  KEY `log_page_id_time` (`log_page`,`log_timestamp`), 
  KEY `log_user_text_time` (`log_user_text`(16),`log_timestamp`), 
  KEY `log_title_time` (`log_title`(16),`log_timestamp`) 
) ENGINE=TokuDB AUTO_INCREMENT=58015836 DEFAULT CHARSET=binary `compression`='tokudb_zlib'
dewiki | CREATE TABLE `logging`
...
PRIMARY KEY (`log_id`),
  KEY `type_time` (`log_type`,`log_timestamp`),
  KEY `user_time` (`log_user`,`log_timestamp`),
  KEY `page_time` (`log_namespace`,`log_title`,`log_timestamp`),
  KEY `times` (`log_timestamp`),
  KEY `log_user_type_time` (`log_user`,`log_type`,`log_timestamp`),
  KEY `log_page_id_time` (`log_page`,`log_timestamp`)
) ENGINE=TokuDB AUTO_INCREMENT=62684641 DEFAULT CHARSET=binary `compression`='tokudb_zlib'

Schema change progress

  • wikitech
    • labswiki
    • labtestwiki

Details

Reference
bz69127

Related Objects

StatusSubtypeAssignedTask
ResolvedMarostegui
ResolvedLadsgroup
ResolvedNone
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedReedy
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
DeclinedNone
ResolvedMarostegui
ResolvedMarostegui
ResolvedLadsgroup
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedLadsgroup
ResolvedMarostegui
ResolvedMarostegui
ResolvedLadsgroup
Resolved Kormat
ResolvedMarostegui
Resolved Kormat
ResolvedMarostegui
Resolved Kormat
ResolvedMarostegui

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:36 AM
bzimport added a project: Cloud-VPS.
bzimport set Reference to bz69127.
chasemp subscribed.

For the record, this is the current status of the discrepancies of this table (based on tables.sql: https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql

s1 (enwiki )and s2(bgwiki bgwiktionary cswiki enwikiquote enwiktionary eowiki fiwiki idwiki itwiki nlwiki nowiki plwiki ptwiki svwiki thwiki trwiki zhwiki), s4 (commonswiki):
missing:

KEY `type_action` (`log_type`,`log_action`,`log_timestamp`),
KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

Extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

s5:
dewiki all the hosts are missing:

KEY `type_action` (`log_type`,`log_action`,`log_timestamp`),
KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

And some hosts have 1 or 2 extra indexes:

db1092.eqiad.wmnet
  KEY `log_title_time` (`log_title`(16),`log_timestamp`)
db1063.eqiad.wmnet
  KEY `log_title_time` (`log_title`(16),`log_timestamp`),
  KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

wikidatawiki is missing:

KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

And have the two extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

s6 (frwiki, jawiki, ruwiki):
They are missing:

KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

And they have the two extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

s7 (arwiki cawiki eswiki fawiki frwiktionary hewiki huwiki kowiki metawiki rowiki ukwiki viwiki):
Missing:

KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

Has the two extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

I have not checked s3.
So why do we have those two extra indexes everywhere? I haven't found anything on gerrit about them - found this though: T68961

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

Should we unify this table to what tables.sql says?
If so, we'd need to first check if removing those indexes will have any impact and I suspect it would as otherwise why would they be there?
If they are important why aren't them on tables.sql?
Should we add the other two missing indexes where they are indeed missing?

FYI, the lack of log indexes on some wikis is causing issues for XTools. See, for example, T171264#3468493.

kaldari raised the priority of this task from Low to Medium.Jul 25 2017, 12:22 AM

Yeah, the log_title_time index seems especially helpful when doing things like looking up a block log. You could use log_page instead of log_title, using the ID of the user's userpage, but that doesn't work if they haven't created a userpage :( Also we'd have to look up the page ID first. If there's another, quicker cross-wiki way of checking the block log, please enlighten me :)

jcrespo renamed this task from Database upgrade MariaDB 10: Discrepancies with logging table on different wikis to Discrepancies with logging table on different wikis.Aug 7 2017, 8:19 AM

Nothing to do with MariaDB 10, I assume at some point was a blocker. This is probably a multi-month maintenance such as T132416.

As far as I can see, mediawiki doesn't define log_title_time nor log_title_type_time ( see https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql ). Many hosts lack log_user_text_type_time (log_user_text, log_type, log_timestamp), and some tables have some string-containing keys only indexing the first 16 bytes/characters.

It is important to separate production from wikireplicas- wikireplicas can have additional indexes, production should not- or we should patch mediawiki first.

Marostegui moved this task from Pending comment to In progress on the DBA board.

I am going to try to work on this, as this has bitten us already a few times already, the most recent time on a global user rename.
This is in a not very good shape (T71127#3335238), I will try to combine this with T51199

After analyzing the indexes on T217397 and testing a few hosts (T217397#4997997 and T217397#5010633) with the exact table definition on tables.sql, we are going to unify logging table based on tables.sql indexes.

The following hosts on s1 eqiad have the indexes unified with tables.sql (T217397#5010633)

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • dbstore1001
  • db1124
  • db1119
  • db1118
  • db1106
  • db1105
  • db1099
  • db1089
  • db1083
  • db1080
  • db1067

The following hosts on s4 eqiad have the indexes unified with tables.sql (T217397#5010633)

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1125
  • db1121
  • db1103
  • db1102
  • db1097
  • db1091
  • db1084
  • db1081
  • db1068

The following hosts on s5 eqiad have the indexes unified with tables.sql (T217397#5010633)

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070

Mentioned in SAL (#wikimedia-operations) [2019-03-12T07:18:34Z] <marostegui> Deploy schema change on db2052 (s5 codfw master), this will generate lag on codfw T71127 T51199

s6 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1005
  • dbstore1001
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085
  • db1061

s8 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1005
  • db1124
  • db1116
  • db1109
  • db1104
  • db1101
  • db1099
  • db1092
  • db1087
  • db1071

s7 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • db1125
  • db1116
  • db1101
  • db1098
  • db1094
  • db1090
  • db1086
  • db1079
  • db1062

s2 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1125
  • db1122
  • db1105
  • db1103
  • db1095
  • db1090
  • db1076
  • db1074
  • db1066

s3 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1124
  • db1123
  • db1095
  • db1075
  • db1077
  • db1078
Marostegui updated the task description. (Show Details)

All done! One less drift between production and tables.sql