flow_subscription, flow_topic_list and flow_tree_node don't have primary keys, but each table has a unique key that can easily be converted to a primary key.
Description
Details
Subject | Repo | Branch | Lines /- | |
---|---|---|---|---|
Add primary keys to the remaining Flow tables | mediawiki/extensions/Flow | master | 13 -8 |
Status | Subtype | Assigned | Task | |
---|---|---|---|---|
· · · | ||||
Stalled | None | T108255 Enable MariaDB/MySQL's Strict Mode | ||
Resolved | LSobanski | T17441 Some tables lack unique or primary keys, may allow confusing duplicate data | ||
Resolved | Marostegui | T149819 Add primary keys to remaining Flow tables | ||
· · · |
Event Timeline
Change 319358 had a related patch set uploaded (by Catrope):
Add primary keys to the remaining Flow tables
Thank you, this is actually important because even if internally, InnoDB treats those as PKs, from the SQL point of view that is not true, and that can lead to query issues.
I will wait for the code deployment to actually start working on the schema change.
Matt pointed out that the flow_subscription table is unused and should just be dropped entirely (I checked and it's empty on x1), so I took that one out of the patch. I'll submit a separate schema change to drop that table.
I will take this ticket as I am about to finish another schema change tomorrow. However due to the xmas times, this will likely will be started in January
Jaime pointed out that the tables normally quite small and he's totally right:
root@db1031:/srv/sqldata/flowdb# ls -lh flow_topic_list.ibd -rw-rw---- 1 mysql mysql 23M Dec 14 14:10 flow_topic_list.ibd root@db1031:/srv/sqldata/flowdb# ls -lh flow_tree_node.ibd -rw-rw---- 1 mysql mysql 120M Dec 14 14:25 flow_tree_node.ibd
So I might just execute this on the master and let it replicate.
I think by next week we can start deploying this change.
As the tables are rather small, it is probably safer to deploy it on the master (db1031) and let it replicate.
This would be it:
alter table flow_topic_list drop key flow_topic_list_pk, ADD PRIMARY KEY (topic_list_id, topic_id); show create table flow_topic_list\G *************************** 1. row *************************** Table: flow_topic_list Create Table: CREATE TABLE `flow_topic_list` ( `topic_list_id` binary(11) NOT NULL, `topic_id` binary(11) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0', PRIMARY KEY (`topic_list_id`,`topic_id`), KEY `flow_topic_list_topic_id` (`topic_id`) ) ENGINE=InnoDB DEFAULT CHARSET=binary 1 row in set (0.00 sec) alter table flow_tree_node drop key flow_tree_node_pk, ADD PRIMARY KEY (tree_ancestor_id, tree_descendant_id); show create table flow_tree_node\G *************************** 1. row *************************** Table: flow_tree_node Create Table: CREATE TABLE `flow_tree_node` ( `tree_ancestor_id` binary(11) NOT NULL, `tree_descendant_id` binary(11) NOT NULL, `tree_depth` smallint(6) NOT NULL, PRIMARY KEY (`tree_ancestor_id`,`tree_descendant_id`), UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`) ) ENGINE=InnoDB DEFAULT CHARSET=binary
The table will be blocked until the ALTER is finished - meaning no writes will go thru.
The table is rather small so I don't think it will take long to finish the ALTER.
Once the alter goes thru on the master it will replicate on the slaves and it might cause some lag (again, not too big as the tables are small), but during that lag, information read from the slaves might be slightly outdated.
Thanks @Marostegui.
How can I explain that? "Some Flow databases are going to be changed, that may cause some slowdowns [ add date]"? I'm asking, because I'm currently writing the next team newsletter and I prefer to inform people is a change is going to affect them.
I just did a test to give you more detailed data about how long the change will take and the impact.
On the smallest table (20MB) the alter took: 0.59 second
On the biggest table (120MB) the alter took: 4.43 seconds
And those numbers are from a virtual machine which is way less powerful than our production hosts :-)
So the impact will be almost 0, so you might want to mention that at some point during next week we will do a small maintenance on the flow database but there is no impact expected.
Would that be enough for you and the team?
OK, so no visible impact - cool!
However, I'll add a quick note to show people that we still maintain Flow (and as a safety net if something goes wrong; that's not mean I don't trust anyone skills :))
When do you plan to perform that change?
Not sure yet, probably early next week. I can update the ticket on Monday or so with some more detailed timing if that works for you.
It will be done some day early in UTC mornings.
I plan to send my newsletter on Monday morning. I don't want to bother you, so I can cheat and just say "this week".
I have set up the flowdb (with the current data) on a test environment to make sure that adding the PK on the master and then inserting values after it won't break replication (because of inconsistencies).
It all went fine, so I think it is safe to go ahead and add them on the master.
I have taken a backup of flowdb - db1031:/srv/tmp/flowdb_23_jan.sql and going to start altering the tables now.
Mentioned in SAL (#wikimedia-operations) [2017-01-23T15:38:05Z] <marostegui> Alter tables: flow_topic_list and flow_tree_node on db1031 (x1 master) - T149819
Both tables have been successfully altered
root@PRODUCTION x1[flowdb]> show create table flow_topic_list\G *************************** 1. row *************************** Table: flow_topic_list Create Table: CREATE TABLE `flow_topic_list` ( `topic_list_id` binary(11) NOT NULL, `topic_id` binary(11) DEFAULT NULL, UNIQUE KEY `flow_topic_list_pk` (`topic_list_id`,`topic_id`), KEY `flow_topic_list_topic_id` (`topic_id`) ) ENGINE=InnoDB DEFAULT CHARSET=binary 1 row in set (0.00 sec) root@PRODUCTION x1[flowdb]> alter table flow_topic_list drop key flow_topic_list_pk, ADD PRIMARY KEY (topic_list_id, topic_id); Query OK, 100371 rows affected (1.69 sec) Records: 100371 Duplicates: 0 Warnings: 0 root@PRODUCTION x1[flowdb]> show create table flow_topic_list\G *************************** 1. row *************************** Table: flow_topic_list Create Table: CREATE TABLE `flow_topic_list` ( `topic_list_id` binary(11) NOT NULL, `topic_id` binary(11) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0', PRIMARY KEY (`topic_list_id`,`topic_id`), KEY `flow_topic_list_topic_id` (`topic_id`) ) ENGINE=InnoDB DEFAULT CHARSET=binary 1 row in set (0.00 sec) root@PRODUCTION x1[flowdb]> show create table flow_tree_node\G *************************** 1. row *************************** Table: flow_tree_node Create Table: CREATE TABLE `flow_tree_node` ( `tree_ancestor_id` binary(11) NOT NULL, `tree_descendant_id` binary(11) NOT NULL, `tree_depth` smallint(6) NOT NULL, UNIQUE KEY `flow_tree_node_pk` (`tree_ancestor_id`,`tree_descendant_id`), UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`) ) ENGINE=InnoDB DEFAULT CHARSET=binary 1 row in set (0.00 sec) root@PRODUCTION x1[flowdb]> alter table flow_tree_node drop key flow_tree_node_pk, ADD PRIMARY KEY (tree_ancestor_id, tree_descendant_id); Query OK, 0 rows affected (8.62 sec) Records: 0 Duplicates: 0 Warnings: 0 root@PRODUCTION x1[flowdb]> show create table flow_tree_node\G *************************** 1. row *************************** Table: flow_tree_node Create Table: CREATE TABLE `flow_tree_node` ( `tree_ancestor_id` binary(11) NOT NULL, `tree_descendant_id` binary(11) NOT NULL, `tree_depth` smallint(6) NOT NULL, PRIMARY KEY (`tree_ancestor_id`,`tree_descendant_id`), UNIQUE KEY `flow_tree_constraint` (`tree_descendant_id`,`tree_depth`) ) ENGINE=InnoDB DEFAULT CHARSET=binary 1 row in set (0.00 sec)