Page MenuHomePhabricator

Move wb_terms data in cloud replicas to wb_terms_no_longer_updated
Closed, ResolvedPublic

Description

In order to ease the transition period from the new to old wikidata terms storage tables we want to keep the old and no updated data for tools to choose to continue using if they want and if that would not lead to any undesired side effects.
Old, non updated data will continue to exist in a view called wb_terms_no_longer_updated, and wb_terms will be emptied so that other tools notice that the old data source should no longer be used.

Steps

Thursday 26th

  • Make another view of wb_terms on cloud replicas called "wb_terms_no_longer_updated"

April 6th

  • Rename the regular wb_terms table to wb_terms_no_longer_updated on the cloud replicas
  • Create an empty table called wb_terms with the same structure as the old table but with no data
  • Reconfigure the wb_terms and wb_terms_no_longer_updated views to simply be fullviews in the maintain-views script

Event Timeline

@Addshore the last step on Monday I guess you meant create an empty table?
Also when do you want the table rename to happen?

FWIW, I've updated the description to the date of the second step to match the one announce to the community.

Change 583693 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: Add wb_terms_no_longer_updated view name

https://gerrit.wikimedia.org/r/583693

@Addshore the last step on Monday I guess you meant create an empty table?

Yes, although now sure if that is actually a CREATE or another view of some kinds, as I don't know all of the cloudy things :)

Also when do you want the table rename to happen?

I guess the table rename is actually different to changing the views?
If so, I don't think we actually need to rename the table at all? just fiddle with the views?

Change 583693 merged by Bstorm:
[operations/puppet@production] wikireplicas: Add wb_terms_no_longer_updated view name

https://gerrit.wikimedia.org/r/583693

So as far as I understand and from https://gerrit.wikimedia.org/r/583693
The view wb_terms_no_longer_updated will point to wb_terms

The current wb_terms view points to wb_terms. I thought you wanted to, at some point, truncate wb_terms. If that happens and you want people to still be able to query old wb_terms data, we'd need to:

  • Rename wb_terms table to maybe: wb_terms_no_longer_updated and re-point the view wb_terms_no_longer_updated to that "new" table.
  • Create an empty wb_terms table with the same structure. wb_terms view will keep pointing to this now empty table.

You choose! :)

I would prefer the table rename be done, then the views can be adjusted so that wb_terms and wb_terms_no_longer_updated both are in the "fullviews" key in the yaml. That will make it clear there is no filtering or other whatnot going on.

In the patch I put up both wb_terms and wb_terms_no_longer_updated will point to wb_terms until the rename and empty table.

Generally, anything that is "select *" is under the fullviews structure in our config as long as the table name is the same. I think that's healthy. That yaml file is hard enough to understand as is.

Proposing a change to the description to match that....

Bstorm triaged this task as Medium priority.Mar 26 2020, 5:53 PM
Bstorm updated the task description. (Show Details)

Look good now @Addshore and @Marostegui ?

That sounds good to me
Thank you :)

labsdb1011 is fighting with me. Lots of wikidata queries going on there. I'll start killin' 🔪

Done with today's part. Openstack upgrades seem to have affected some bots, or I'd put it in SAL.

Bstorm updated the task description. (Show Details)
Bstorm subscribed.

@Addshore maybe this is not the right task to ask for it but...is Analytics aware of this change? They use labsdb1012 to sqoop data the first days of the month.

So as far as I understand and from https://gerrit.wikimedia.org/r/583693
The view wb_terms_no_longer_updated will point to wb_terms

As I understand it yes

The current wb_terms view points to wb_terms. I thought you wanted to, at some point, truncate wb_terms. If that happens and you want people to still be able to query old wb_terms data, we'd need to:

  • Rename wb_terms table to maybe: wb_terms_no_longer_updated and re-point the view wb_terms_no_longer_updated to that "new" table.
  • Create an empty wb_terms table with the same structure. wb_terms view will keep pointing to this now empty table.

wb_terms in the cloud replicas for wikidatawiki_p is a view though? Or did I get that wrong?
If it is a view then we just remove that view and create a new table in wikidatawiki_p called wb_terms that is empty?
If it is not a view then we must do some more things.

I would prefer the table rename be done, then the views can be adjusted so that wb_terms and wb_terms_no_longer_updated both are in the "fullviews" key in the yaml. That will make it clear there is no filtering or other whatnot going on.

In the patch I put up both wb_terms and wb_terms_no_longer_updated will point to wb_terms until the rename and empty table.

If this is the desired approach then we can coordinate this on Monday (or chat tomorrow Friday) :)

Look good now @Addshore and @Marostegui ?

Yup!

| wb_terms                   |
| wb_terms_no_longer_updated |
| wbc_entity_usage           |
| wbqc_constraints           |
| wbs_propertypairs          |
| wbt_item_terms             |
| wbt_property_terms         |
| wbt_term_in_lang           |
| wbt_text                   |
| wbt_text_in_lang           |
| wbt_type                   |
 ---------------------------- 
107 rows in set (0.00 sec)

MariaDB [wikidatawiki_p]> select * from wb_terms_no_longer_updated limit 1;
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type | term_text  | term_search_key | term_weight |
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
|      363474 |          50445 | Q50445              | item             | en-gb         | label     | Jungwon-gu |                 |           0 |
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
1 row in set (0.01 sec)
MariaDB [wikidatawiki_p]> select * from wb_terms limit 1;
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type | term_text  | term_search_key | term_weight |
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
|      363474 |          50445 | Q50445              | item             | en-gb         | label     | Jungwon-gu |                 |           0 |
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
1 row in set (0.00 sec)

@Addshore maybe this is not the right task to ask for it but...is Analytics aware of this change? They use labsdb1012 to sqoop data the first days of the month.

I believe the sqooping of these tables only happens because of a desire by WMDE
I'll make sure we turn this off on Friday!
Ping @JAllemandou :)

wb_terms in the cloud replicas for wikidatawiki_p is a view though? Or did I get that wrong?
If it is a view then we just remove that view and create a new table in wikidatawiki_p called wb_terms that is empty?
If it is not a view then we must do some more things.

We don't have any real tables in the *_p databases, and we should keep it that way because I think it very unwise to complicate the wikireplicas more than they already are. Only views are in those DBs, and they point at tables in the corresponding "non_p" databases. That's the whole basis of the design at the moment. I also do not wish to create things that exist outside the scripted system re: the views.

Theoretically, many tasks around the views should be do-able by the WMCS on-call without a lot of fuss. The wikidatawiki_p database should just have views in it that roughly match the description in the puppet-controlled yaml file. Otherwise, it'll get broken by a script run one way or another.

I would prefer the table rename be done, then the views can be adjusted so that wb_terms and wb_terms_no_longer_updated both are in the "fullviews" key in the yaml. That will make it clear there is no filtering or other whatnot going on.

In the patch I put up both wb_terms and wb_terms_no_longer_updated will point to wb_terms until the rename and empty table.

If this is the desired approach then we can coordinate this on Monday (or chat tomorrow Friday) :)

Sure!

Let me know how you guys want to proceed with this and when :)

As the labs relplica db tables do not get exposed to users at any level I don't know if we should bother renaming them.
If people agree with that statement then we would just do some view moving today with @Bstorm that would change the current wb_terms view to return 0 results

If we really do want to / have a reason to change the name of the underlying table then I guess we will need to coordinate a view change for the wb_terms_no_longer_updated view with the table rename between @Bstorm and @Marostegui

As the labs relplica db tables do not get exposed to users at any level I don't know if we should bother renaming them.
If people agree with that statement then we would just do some view moving today with @Bstorm that would change the current wb_terms view to return 0 results

If we really do want to / have a reason to change the name of the underlying table then I guess we will need to coordinate a view change for the wb_terms_no_longer_updated view with the table rename between @Bstorm and @Marostegui

Up to you if you want to get the wb_terms table renamed or not (I would prefer to have it renamed so we don't get confused with it or drop it by accident) also we do have to decide what to do with the wb_terms view.

Up to you if you want to get the wb_terms table renamed or not (I would prefer to have it renamed so we don't get confused with it or drop it by accident)

1

also we do have to decide what to do with the wb_terms view.

The plan will be for the wb_terms view to return 0 rows.

So I guess we have to coordinate the following:

  • @Marostegui renames the wb_terms table and create a wb_terms empty table
  • @Bstorm changes wb_terms_no_longer_updated view to point to the renamed version of wb_terms which still contains data.

The wb_terms view remains untouched and pointing to wb_terms empty table.

So I guess we have to coordinate the following:

  • @Marostegui renames the wb_terms table and create a wb_terms empty table
  • @Bstorm changes wb_terms_no_longer_updated view to point to the renamed version of wb_terms which still contains data.

The wb_terms view remains untouched and pointing to wb_terms empty table.

Sounds like a solid plan to me!

ok - let's wait for @Bstorm so we can execute it! :)

Change 586384 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: point wb_terms_no_longer_updated and wb_terms correctly

https://gerrit.wikimedia.org/r/586384

Followed up in the patch. Thank you!

Change 586384 merged by Bstorm:
[operations/puppet@production] wikireplicas: point wb_terms_no_longer_updated and wb_terms correctly

https://gerrit.wikimedia.org/r/586384

Mentioned in SAL (#wikimedia-operations) [2020-04-08T08:46:21Z] <marostegui> Rename wb_terms and recreate views on labsdb1009-labsdb1011 - T248592 T248086

Mentioned in SAL (#wikimedia-operations) [2020-04-08T09:10:45Z] <marostegui> Reload proxies on dbproxy1018 and dbproxy1019 to depool labsdb1011 - T249188 T248592

This is all done. All labsdb hosts show this behaviour:

root@cumin1001:/home/marostegui# for i in wikidatawiki_p testwikidatawiki_p commonswiki_p testcommonswiki_p; do echo $i; mysql.py -hlabsdb1011 $i -e "select * from wb_terms_no_longer_updated limit 1 ";done
wikidatawiki_p
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type | term_text  | term_search_key | term_weight |
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
|      363474 |          50445 | Q50445              | item             | en-gb         | label     | Jungwon-gu |                 |           0 |
 ------------- ---------------- --------------------- ------------------ --------------- ----------- ------------ ----------------- ------------- 
testwikidatawiki_p
 ------------- ---------------- --------------------- ------------------ --------------- ------------- ----------- ----------------- ------------- 
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type   | term_text | term_search_key | term_weight |
 ------------- ---------------- --------------------- ------------------ --------------- ------------- ----------- ----------------- ------------- 
|      273371 |              6 | Q6                  | item             | de            | description | zzjrjrtzj | zzjrjrtzj       |       0.139 |
 ------------- ---------------- --------------------- ------------------ --------------- ------------- ----------- ----------------- ------------- 
commonswiki_p
testcommonswiki_p

root@cumin1001:/home/marostegui# for i in wikidatawiki_p testwikidatawiki_p commonswiki_p testcommonswiki_p; do echo $i; mysql.py -hlabsdb1011 $i -e "select * from wb_terms limit 1 ";done
wikidatawiki_p
testwikidatawiki_p
commonswiki_p
testcommonswiki_p