Page MenuHomePhabricator

Update Gather DB schema for flagging backend
Closed, ResolvedPublic

Description

-- gl_perm=HIDDEN -> gl_perm=PRIVATE, gl_perm_override=HIDDEN
UPDATE gather_list SET gl_perm = 1, gl_perm_override = 1 WHERE gl_perm = 2;
  • run mwscript sql.php --wiki=<wiki> gather_list_flag.sql
  • run mwscript sql.php --wiki=<wiki> add-gl_list-flag-columns.sql

add-gl_list-flag-columns.sql includes an UPDATE command but it's indexed and does not match many rows (only hidden lists; currently 100 matches on enwiki) so there should be no need to batch it.

Post-deploy cleanup of the fake private settings: UPDATE gather_list SET gl_perm = 0 WHERE gl_perm_override = 2;

Event Timeline

Tgr claimed this task.
Tgr raised the priority of this task from to Needs Triage.
Tgr updated the task description. (Show Details)
Tgr added projects: Schema-change, Gather.
Tgr added subscribers: Jdlrobson, Tgr.

Nothing fancy:

add-gl_list-flag-columns.sql includes an UPDATE command but it's indexed and does not match many rows (only hidden lists; currently 100 matches on enwiki) so there should be no need to batch it.

Matanya added a project: DBA.
Matanya added a subscriber: jcrespo.

In order to apply a schema change, we also need:

  • The wikis where this will be applied (I assume all where the Gather extension is, but it should be explicitly done/point to a *.list of wikis to avoid mistakes)
  • Since when this can be rolled in (if it depends on a particular code deployment)
  • If the new schema has been checked with the application version currently deployed on WMF and it is fully compatible (for example, after the UPDATE eliminates the '2' value but the application does not know about the new column).

While this questions may sound silly, we require confirmation to avoid errors.

As an informative note, please note that using mwscript is not usually a good way to roll changes in our production, as it may involve a lot of locking even for trivial changes.

  • The wikis where this will be applied (I assume all where the Gather extension is, but it should be explicitly done/point to a *.list of wikis to avoid mistakes)

The gather wikis, per wmgUseGather.

  • Since when this can be rolled in (if it depends on a particular code deployment)

Any time.

  • If the new schema has been checked with the application version currently deployed on WMF and it is fully compatible (for example, after the UPDATE eliminates the '2' value but the application does not know about the new column).

Hm, did not consider that. It would not break the code but hidden lists would become visible for a while. Probably cleaner to have the new code handle gl_perm=2 and do the update some time later.

As an informative note, please note that using mwscript is not usually a good way to roll changes in our production, as it may involve a lot of locking even for trivial changes.

What's wrong with it? I thought it just resolves script names to the right subdirectory. Also, it is still recommended by all documentation (e.g. (1)).

I guess the easiest would be to split the update in two parts: run UPDATE gather_list SET gl_perm_override = 1 WHERE gl_perm = 2 as part of the schema change, then deploy, then run UPDATE gather_list SET gl_perm = 0, gl_perm_override = 1 WHERE gl_perm = 2. Any operation changing gl_perm_override (unhide/approval) between the two DB updates would be undone, but that's not a big deal. We would have to change the patch so that gl_perm = 2 is temporarily accepted.

Even easier, just use UPDATE gather_list SET gl_perm = 1, gl_perm_override = 1 WHERE gl_perm = 2 the first time. That pretends hidden lists are private, which is a valid configuration in both old and new code, with mostly similar effect. The list owner sees a slightly different message, but owners of spam lists won't care.

What's wrong with it? I thought it just resolves script names to the right subdirectory. Also, it is still recommended by all documentation (e.g. (1)).

Yeah, docs are old and probably from the time when there was not a dedicated DBA. I will try to update the related documentation and at least mark it as deprecated. Some updated docs can be found at: https://wikitech.wikimedia.org/wiki/MariaDB#Schema_Changes

Good news is that most changes require no locking at all, but a bit of extra preparation and time. This allows extra QA and testing of deployments.

To give you a summary of one of the issues:

Metadata locking was introduced in 5.5 to assure ACID properties. A DDL change/FLUSH TABLES an similar statements require acquiring some exclusive locking during a fraction of time -even if the actual DDL is lock-free, and it has to wait for all current queries to finish (this makes sure queries do not read the wrong version of the table as DDLs do not support multi-versioning). However, if there is a long running transaction, all subsequent queries will be enqueued and eventually fail after innodb_lock_wait_timout.

This kind of things among others, while very unlikely in a small installation, have to be weighted in when doing schema changes in our production. Also, unlike in most installations -these changes are not atomic and instant- updating large tables may take hours to take effect, and done days after/before code deployments. Good news is that the DBAs will take care of this and assure nothings goes wrong- but we need your cooperation!

@Tgr:

Even easier, just use UPDATE gather_list SET gl_perm = 1, gl_perm_override = 1 WHERE gl_perm = 2 the first time. That pretends hidden lists are private, which is a valid configuration in both old and new code, with mostly similar effect. The list owner sees a slightly different message, but owners of spam lists won't care.

Please note that this will affect also new private collections ("INSERTs", not only "UPDATEs")- I assume they will be created public by default - that will not be carried out to the new code.

One alternative I would suggest is:

  1. Perform the ALTER TABLEs with the current code (not the update), let the new columns get de-synced
  2. making sure we run the original UPDATE on each batch of wikis, just before the new code deployment (updates are less problematic than ALTER TABLEs).

I doubt locking would be a concern for Gather; we are talking about a table with 5000 rows and a couple hundred write queries a day.

That said, splitting the ALTER and the UPDATE (and possibly waiting arbitrarily long between the two) should not be a problem.

Please note that this will affect also new private collections ("INSERTs", not only "UPDATEs")- I assume they will be created public by default - that will not be carried out to the new code.

Collections created after we run the update should be fine. Collections hidden after that would be problematic, but only WMF staff can hide for now, so we can simply avoid doing that. Should not be a problem if just for a few days.

@Tgr:

I doubt locking would be a concern for Gather; we are talking about a table with 5000 rows and a couple hundred write queries a day.

Please do not underestimate the impact of a change- even if a single table may be small and with little traffic- replication is serialized, and even a 1 second change can lead to very bad consequences on the slaves, specially for busy servers like enwiki. You will want a DBA available to prevent and mitigate those problems, if they appear. Again, we should be careful not because of your change, which is small and localized, but because of the rest of the environment. I am here so you do not have to worry about that :-).

Changes deployed:

  • Wikis affected:
testwiki
test2wiki
enwiki
enwikivoyage
hewiki
  • Changes applied:
ALTER TABLE gather_list
    ADD COLUMN gl_perm_override TINYINT UNSIGNED NOT NULL DEFAULT 0 AFTER gl_perm,
    ADD COLUMN gl_needs_review TINYINT UNSIGNED NOT NULL DEFAULT 0 AFTER gl_perm_override,
    ADD COLUMN gl_flag_count INT UNSIGNED NOT NULL DEFAULT 0 AFTER gl_needs_review,
    DROP INDEX gl_perm_updated,
    ADD INDEX gl_visibility_updated (gl_perm, gl_perm_override, gl_flag_count, gl_needs_review, gl_updated, gl_id);

CREATE TABLE gather_list_flag (
    glf_gl_id INT UNSIGNED NOT NULL,
    glf_user_id INT UNSIGNED NOT NULL DEFAULT 0,
    glf_user_ip VARBINARY(40) NOT NULL DEFAULT '',
    glf_reviewed BOOL NOT NULL DEFAULT false,
    UNIQUE INDEX glf_list_user_ip (glf_gl_id, glf_user_id, glf_user_ip),
    INDEX glf_list_reviewed (glf_gl_id, glf_reviewed)
);

UPDATE gather_list
SET gl_perm_override = 1, gl_perm = 1
WHERE gl_perm = 2;
  • Some comments:
    • I will let you own post-deployment changes, like the second UPDATE proposed
    • The tables lack of a primary key. In the future I would like to make a primary key a requirement for 2 on new or altered tables. You have very good PK candidates (unique keys). While they are not needed and required by design, at implementation both InnoDB (our main db engine) and row based replication (probably to be deployed in the future) performance benefits from it, with the only difference of forcing the name of the previous UNIQUE key to be PRIMARY. I would _beg_ to consider that change in a future iteration before the tables grow too big.
    • Packing alters in a single command helps schema changes deployment (makes them faster) and provides atomicy. Highly recommended.

Ran UPDATE gather_list SET gl_perm = 0 where gl_perm_override = 1 on Gather wikis (hewiki: 2 rows affected, enwiki: 124 rows). Verified that hidden collections are not visible.