
Use inclusive language in code for private analytics infrastructure
Open, High, Public

Description

Let's rename as many terms as possible, such as whitelist, blacklist, master, slave, etc. (let's decide on the exact list in the comments below).

For example: https://codesearch.wmcloud.org/analytics/?q=whitelist&i=nope&files=&excludeFiles=node_modules&repos=
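For a local checkout, a similar search can be approximated with a short script. This is a minimal sketch, not part of any existing tooling: the term list is a placeholder until the exact list is decided in the comments, and the function name `find_terms` and the set of excluded directories are assumptions made for illustration.

```python
import os
import re
from collections import Counter

# Placeholder term list; the exact list is still to be decided on this task.
TERMS = ["whitelist", "blacklist", "master", "slave"]

# Directories skipped during the walk (assumption: mirrors excludeFiles=node_modules).
EXCLUDED_DIRS = {"node_modules", ".git"}

# Case-insensitive match on any of the terms, including inside longer identifiers.
PATTERN = re.compile("|".join(re.escape(t) for t in TERMS), re.IGNORECASE)


def find_terms(root):
    """Return a Counter mapping (relative_path, term) to the number of matches."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            for match in PATTERN.finditer(text):
                key = (os.path.relpath(path, root), match.group(0).lower())
                counts[key] += 1
    return counts
```

Calling `find_terms(".")` from the root of a repository gives per-file counts that can be sorted to see where most of the renaming work is concentrated. Because the match is a plain substring search, it will also flag legitimate uses (for example a Git branch named master), so the output still needs human review.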

Event Timeline

fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Is this a duplicate of T254646 (or one of its subtasks)?

nshahquinn-wmf renamed this task from Use inclusive language to Use inclusive language in code for private analytics infrastructure. Oct 27 2021, 10:35 PM

Hi @Milimetric ! I'm trying to measure the effort and steps required for each subtask in T254646: Reconsidering how we name things. The idea is to invite folks at the WMF to join and decide which ones they'd like to tackle by giving them information about the effort and any other relevant information. I can see this task involves only the teams that are coding for private analytics infrastructure. In your opinion, what needs to be done to accomplish this task? Thanks!

The Data Engineering team (formerly Analytics Engineering, and before that Analytics and Research) has a number of repositories that need to be searched and updated. There are three kinds of changes, and we're not sure what the split looks like:

  1. Some changes would affect only the current repository. For example, variables in a private function or a utility library that are not exported.
  2. Some changes could affect other repositories, if they depend on the one where the change is made.
  3. Some changes move data around and could affect downstream consumers of that data. Some of these fall into category 2 above, because we can find the dependencies in our jobs, but an additional number have dependencies we cannot identify.
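Sorting a change into category 1 or 2 depends on knowing which repositories depend on the one being changed. As a rough sketch, assuming a hand-maintained mapping from each repository to the repositories it depends on, the affected set can be found by walking the dependency graph in reverse. The repository names and the `affected_repos` helper below are hypothetical placeholders, not real project names or existing tooling.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: repository -> repositories it depends on.
DEPENDS_ON = {
    "repo-a": [],
    "repo-b": ["repo-a"],
    "repo-c": ["repo-b"],
    "repo-d": [],
}


def affected_repos(changed, depends_on):
    """Return the repos that directly or transitively depend on `changed`."""
    # Invert the map so each repo points at the repos that depend on it.
    dependents = defaultdict(set)
    for repo, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(repo)

    # Breadth-first walk over the inverted graph.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for repo in dependents[current]:
            if repo not in seen:
                seen.add(repo)
                queue.append(repo)
    return seen
```

With this map, `affected_repos("repo-a", DEPENDS_ON)` returns `repo-b` and `repo-c`, while `affected_repos("repo-d", DEPENDS_ON)` returns an empty set. An empty result only means no known code dependency exists; it says nothing about category 3, where the consumers of the data are not recorded anywhere.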

Changes that fall into category 1 are probably the majority by number and will be very easy to fix. This work could also happen incrementally, since it needs only local analysis rather than a lot of context. Testing and deployment may be tricky at times and will likely need someone from Data Engineering with experience doing ops duty.

Changes in category 2 will be trickier and are probably best combined with other tech debt repayment plans.

Changes in category 3 can affect other teams. They should be handled the same as category 2, with some additional communication and care.

CC @WDoranWMF

Notes from grooming: There were issues with moving to main, and there are rumors that the Gerrit to GitLab migration will never be done. Without a mandate to move to GitLab, this becomes more urgent/necessary.