User Details
- User Since: Oct 7 2014, 2:34 AM (529 w, 3 d)
- Roles: Disabled
- LDAP User: Springle
- MediaWiki User: Unknown
Aug 8 2018
May 21 2018
Messing with bzip2, pbzip2, lbzip2 on snapshot001, looking at timing only (not archive size or non-default config).
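For anyone repeating the comparison, a rough timing harness along these lines is enough; the dump path and thread count below are placeholders, not the actual files on snapshot001:

```bash
#!/bin/bash
# Rough wall-clock comparison of bzip2 vs pbzip2 vs lbzip2 on one input file.
# DUMP and THREADS are hypothetical placeholders.
DUMP=/srv/dumps/sample.xml   # placeholder input file
THREADS=8                    # placeholder core count

for tool in "bzip2" "pbzip2 -p${THREADS}" "lbzip2 -n ${THREADS}"; do
    echo "== ${tool} =="
    # -k keeps the input, -c streams to stdout so only compression time is measured
    /usr/bin/time -v ${tool} -kc "${DUMP}" > /dev/null
done
```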
Apr 13 2018
Apr 10 2018
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCZwhGWhhv9QdjhhShbLdSZSV349oFxPH73CfvI0jRsQFXsQIlPQaSeKcFqwkjhUoxvfgCw3YWoExHTT6jxHUxrOswI6ZVPeicHNBQ4kiRRY4uKE0xpqbdnkbLRSNWyru8zG1aB/uxpkhsQhwnUZ9fpGtDkXzX1In8NZ7X9jMQB6yrHFxqK/549WELGnpscL79lX7uKM2Ri/v61th7kuDyn6VjsIMSLdt46dKoW9WgQ2UgkjEh67HOZd1FYt4VOaQcNr2JtHj7nSI6YsXx9TQnBrQVqWQXk63AFNxw4uD7xFVByc4FIqefIYjHqHANRWpRmaNOcj6LaBTqXZUBSmtYRiLkXUhqhr1Tf1NiE75UjGKhknucpywXTYI02HaTdEcdxfN4C9guIojxwUKrIMEk9Wz3qcYzyN0QZmCL/6EcRxjEUzYDpEt0tMBRsRqE5Qp0TLPuDsK5trY1rtdzy/HckqmSik9N1p2WQ941SWs2EEiFji1jiCM4N8gwy1r6mf9xo5LWRVY/LtNYbCf/2EfW3mjreP9MaOGIvedcS8I4sd6O3VP8WPpXZtoBU1EKLhEHvfp/E9qYYr6iWIltCFySi67fWlv83cUNezJ6uMrDRg8ANkFJKEWSHJzdVyrtf2fiwNNyIPrkEawHAcKHsZsVGdzkP9Xr8eBb7Q== sean@laptop
Apr 5 2018
Updated to a non-wmf email in phabricator profile settings.
Apr 4 2018
Nov 2 2016
Aug 28 2015
Aug 24 2015
Regarding implementing this on replicas: nothing special is needed. Change on the master will propagate (but rate-limit everything, to avoid replag).
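As a sketch of what "rate-limit everything" means in practice here (the table, predicate, and batch size are made up, not the actual change): run the change in small batches on the master and let each batch replicate before starting the next.

```bash
#!/bin/bash
# Hypothetical batched change on the master; small transactions replicate
# individually, and the sleep keeps replica lag in check.
while true; do
    rows=$(mysql -h MASTER --skip-column-names -e \
        "DELETE FROM somedb.sometable WHERE flagged = 1 LIMIT 1000; SELECT ROW_COUNT();")
    [ "${rows}" -eq 0 ] && break
    sleep 2   # crude rate limit; tune against observed replica lag
done
```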
A whitelist is fine. No real difference from a DBA perspective, so +1.
@mforns, yes, feel free.
Aug 13 2015
Aug 7 2015
It's correct in that MediaWiki still supports MySQL 5.0.2 [1] with a default SQL_MODE setting [2]. Previous discussions about enforcing strict mode haven't gone far, mainly due to questions about MW extensions.
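For reference, checking what a server currently runs with versus what strict mode would look like is mechanically trivial (db1234 is a placeholder host); the open question is purely whether MW core and extensions tolerate it:

```bash
# Current global/session mode (empty by default on these versions, i.e. non-strict)
mysql -h db1234 -e "SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode;"

# What turning on strict mode would look like, dynamically and without a restart
# (illustrative only; not something to run until the extension questions are settled)
mysql -h db1234 -e "SET GLOBAL sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';"
```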
Jul 30 2015
Just to clarify, I don't think we've seen actual OOM killer on s[1-7], right? The only front-line production concern is swapping on s3 db1035?
Also, all bot-driven, e.g. tide543.microsoft.com
Continuing on db1042 right now; queries are regularly hitting the 5min limit. The LIMIT 10405000,501 is most of the problem here; it shouldn't be possible to generate such a search.
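To spell out why that clause hurts: LIMIT 10405000,501 makes the server generate and throw away roughly 10.4M rows before returning 501. A keyset-style query avoids that; the table and column names below are made up for illustration:

```bash
# Offset pagination: ~10.4M rows are produced and discarded per request.
mysql -h db1042 -e \
  "SELECT page_id, page_title FROM sometable ORDER BY page_id LIMIT 10405000, 501;"

# Keyset pagination: seek past the last id already returned (10405000 here
# stands in for that last-seen id, not the offset) and read only 501 rows.
mysql -h db1042 -e \
  "SELECT page_id, page_title FROM sometable WHERE page_id > 10405000 ORDER BY page_id LIMIT 501;"
```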
Jul 28 2015
At Yuvi's request on IRC I added 'labsdbadmin'@'10.64.37.7' with the same permissions/password as @jcrespo added for 'labsdbadmin'@'10.64.37.6', since the catastrophic failure of the latter.
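For the record, duplicating an account for a second source IP follows the usual pattern below; HOST, the privilege list, and the password hash are placeholders, the real ones come from the existing .6 account:

```bash
# Inspect the existing account's grants...
mysql -h HOST -e "SHOW GRANTS FOR 'labsdbadmin'@'10.64.37.6';"

# ...then replay them with the new source IP substituted in.
mysql -h HOST -e \
  "GRANT ALL PRIVILEGES ON *.* TO 'labsdbadmin'@'10.64.37.7' IDENTIFIED BY PASSWORD '<hash>';"
```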
Jul 27 2015
+1 to the provisioning.
Jul 24 2015
@jcrespo, no, I did not make any out-of-band changes or use the skip counter. I found the machine exactly as you described on IRC, and only did research by dumping logs to get the query examples shown in the ticket description.
Jul 23 2015
Jul 20 2015
All s1-7 slaves are now 10.0.
@jcrespo, ouch, so maybe something is broken in our physical backup process (was it xtrabackup?) that required the SQL_SLAVE_SKIP_COUNTER after reinstall on 2015-06-29? Scary...
Jul 15 2015
For now, I will depool db1022.
So, I need to get some more information from Jaime about what occurred over the weekend (he has been unwell and on leave since then), but looking over the logs I see:
Jul 13 2015
Also, yes, time to plan another batch.
Sounds good.
Jul 10 2015
Jul 8 2015
Jul 6 2015
A bunch of "unauthenticated user" in processlist still makes me suspect the thread pool, since that symptom has been seen on prod slaves with thread_pool_size=16 (but not the immediate all-connections-fail, which is indeed odd).
Wasn't sure if you wanted to change that process :)
Jul 4 2015
Doing this to most production DBs seems straightforward. Pain points, due purely to complexity, will be on M[1-4].
Jul 3 2015
(4) == EINTR on connect. Presumably related to the max_connections you observed, which in turn possibly has something to do with:
Jul 2 2015
Tried a restart of dbstore2002, but s7 replication behavior was unchanged: Yes/Yes for replication threads, master exec position advancing, yet no changes appearing.
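The checks behind that summary, roughly, assuming the multi-source connection on dbstore2002 is named 's7' and using a hypothetical wiki database as the canary:

```bash
# Replication threads report healthy and the exec position keeps moving...
mysql -h dbstore2002 -e "SHOW SLAVE 's7' STATUS\G" | \
  egrep 'Slave_IO_Running|Slave_SQL_Running|Exec_Master_Log_Pos|Seconds_Behind_Master'

# ...yet data known to be changing on the s7 master never appears locally.
mysql -h dbstore2002 -e "SELECT MAX(rc_timestamp) FROM somewiki.recentchanges;"
```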
Jun 19 2015
Jun 18 2015
Oops, sorry, this was fixed a while back.
Schema change done on x1-master flowdb.
Jun 17 2015
Jun 15 2015
Jun 11 2015
We'll use something other than !log, to maintain -operations supremacy. This is more about quickly posting notes on database hosts -- and tendril is just an obvious place, for us -- since db maintenance tasks tend to be slow, have multiple stages, and are likely to be handed over to the next shift.
Jun 10 2015
So, to be clear, this isn't intended to:
This would need a bit of downtime to migrate data, but on the order of hours, not days. Not especially difficult, and it would slightly simplify the sanitarium/labs setup (though most of the complexity there is not the private wikis, which are easy to blanket-filter). +1
Jun 8 2015
If the box is taken down, keep the list in the loop.
Jun 7 2015
I did indeed do a similar fix to get s6 going, and then sinned slightly by setting slave_exec_mode=idempotent to keep it alive for the weekend.
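For reference, the stopgap in question (it makes the SQL thread skip duplicate-key and missing-row errors, hence "sinned"); HOST stands in for the affected s6 slave:

```bash
# Temporary: let the SQL thread paper over duplicate-key / row-not-found errors.
mysql -h HOST -e "SET GLOBAL slave_exec_mode = 'IDEMPOTENT';"

# Revert to the safe default once the underlying inconsistency is fixed.
mysql -h HOST -e "SET GLOBAL slave_exec_mode = 'STRICT';"
```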
Jun 6 2015
- Move vslow/dump to db1037 with load 0
- Reinstall db1022 as trusty (or try out Jessie? we should do this eventually, and wmf-mariadb10 package is fine on Jessie)
- Check which s6 slaves have small ibdata global tablespaces (see the check sketch below). If they're all still large (from pre-file-per-table days), also consider a fresh s6 dump/reload on db1022 from which we can eventually reclone the others.
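A minimal sketch for that tablespace check, assuming the usual /srv/sqldata datadir and an illustrative (not authoritative) list of s6 slaves:

```bash
# Shared-tablespace size and file-per-table setting on each s6 slave.
for host in db1022 db1030 db1037; do   # illustrative host list only
    echo "== ${host} =="
    ssh "${host}" 'ls -lh /srv/sqldata/ibdata1'
    mysql -h "${host}" -e "SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';"
done
```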
Jun 5 2015
This can go out with T101460 at any time.
The schema change only affects a few wikis[1] and small tables, so imo it can go out at any time.
Jun 4 2015
While a proper config ini fix would be both welcome and necessary... I have discussed with @Joe the possibility of just patching hhvm's hardcoded mysqlExtension::ConnectTimeout to 3000ms in our next build.
Upgrades should use the phadmin user. Pointed @mmodell in this direction via IRC.
Jun 3 2015
Nice :-) I see it can be easily aborted too, which was my only (vague) concern.
Jun 1 2015
Sounds good? What are the cons?
May 28 2015
Indeed, I reported S7 issues to analytics@ a couple of weeks ago (they started after the full /tmp problems in late April), and have been jumping over some hurdles to rebuild S7 on analytics-store while the box is under load. Context:
May 27 2015
There isn't any 60s cut-off being applied server-side. There is only a 300s limit for wikiuser (but not wikiadmin).
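A per-user limit like that is typically enforced with a query watchdog rather than a server variable; purely as an illustration (not a statement of how it is actually wired up here, and HOST is a placeholder):

```bash
# Watch for wikiuser queries busy for more than 300s and kill them
# (illustrative pt-kill invocation, not the actual production mechanism).
pt-kill --host HOST --match-user wikiuser --busy-time 300 --kill --print --interval 10

# Or just look at what is currently over the limit:
mysql -h HOST -e \
  "SELECT id, user, time, LEFT(info, 80) FROM information_schema.processlist WHERE user = 'wikiuser' AND time > 300;"
```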
May 25 2015
May 22 2015
May 18 2015
If we change this, check mysqldump command for m3 backups on dbstore1001.
This is made more complicated by the switch to HP boxes in CODFW. May be related to, or influenced by, T97998.
Smaller fields are still better for performance, so I vote for a conservative 300, unless 400 is demonstrably needed.
May 14 2015
Not an overreaction. But logical backups run on dbstore1001, not on the production shard itself, and the "dumps"[1], if that's what you saw, should only affect the s7 "vslow" slave.
May 13 2015
Note that services should connect to a CNAME, m5-master.eqiad.wmnet. Check if we need any special network/vlan rules (I think not, but since it is labs-related, ask @faidon).
- add an m5-master.eqiad.wmnet CNAME to dns
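Once the record exists, a quick sanity check from any client host would be something like the following (assuming db1009 ends up as the m5 master, per the provisioning note below):

```bash
# Should resolve to the current m5 master host.
dig +short CNAME m5-master.eqiad.wmnet

# And confirm we actually reach a writable master through the CNAME.
mysql -h m5-master.eqiad.wmnet -e "SELECT @@hostname, @@read_only;"
```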
May 12 2015
Probably we should set up an M5 cluster dedicated to databases for labs services, and also move pdns/designate over from M1 at the same time. A single EQIAD R510 master (db1009 is available) replicating to a CODFW slave.