Implement A/B test bucketing
Closed, Resolved; Public
Authored By

	ppelberg
	Nov 18 2020, 11:12 PM

Description

This task is about implementing the changes necessary to ensure the "right people" are included in the Reply Tool A/B test in the "right way."

Bucketing criteria

People who meet all of the "Conditions" listed below ought to have a 50% chance of being bucketed into either the A/B test's control group or its test group.
Conditions

Editing at the following Wikipedias (source: T267379):

Wiki	Code
French Wikipedia	frwiki
Spanish Wikipedia	eswiki
Italian Wikipedia	itwiki
Japanese Wikipedia	jawiki
Persian Wikipedia	fawiki
Polish Wikipedia	plwiki
Hebrew Wikipedia	hewiki
Dutch Wikipedia	nlwiki
Hindi Wikipedia	hiwiki
Korean Wikipedia	kowiki
Vietnamese Wikipedia	viwiki
Thai Wikipedia	thwiki
Portuguese Wikipedia	ptwiki
Bengali Wikipedia	bnwiki
Egyptian Arabic Wikipedia	arzwiki
Swahili Wikipedia	swwiki
Chinese Wikipedia	zhwiki
Ukrainian Wikipedia	ukwiki
Indonesian Wikipedia	idwiki
Amharic Wikipedia	amwiki
Oromo Wikipedia	omwiki
Afrikaans Wikipedia	afwiki

Have not used the Reply Tool before.
- In this context, "not used" is being defined as people whose discussiontools-editmode preference is empty.
People who are logged in
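
The three conditions above can be read as a single eligibility check. The following is a minimal sketch in Python, not the DiscussionTools implementation; the `User` class, the `is_eligible` function and the `AB_TEST_WIKIS` constant are hypothetical names introduced for illustration. Only the wiki codes and the `discussiontools-editmode` preference name come from this task.

```python
from dataclasses import dataclass, field

# Wiki codes copied from the table in this task's description.
AB_TEST_WIKIS = frozenset({
    "frwiki", "eswiki", "itwiki", "jawiki", "fawiki", "plwiki", "hewiki",
    "nlwiki", "hiwiki", "kowiki", "viwiki", "thwiki", "ptwiki", "bnwiki",
    "arzwiki", "swwiki", "zhwiki", "ukwiki", "idwiki", "amwiki", "omwiki",
    "afwiki",
})


@dataclass
class User:
    """Hypothetical stand-in for a wiki account; not a MediaWiki class."""
    logged_in: bool
    preferences: dict = field(default_factory=dict)


def is_eligible(user: User, wiki: str) -> bool:
    """Return True only when all three conditions in the description hold."""
    if wiki not in AB_TEST_WIKIS:
        return False
    if not user.logged_in:
        return False
    # "Not used" is defined as an empty discussiontools-editmode preference.
    return not user.preferences.get("discussiontools-editmode")
```

For example, `is_eligible(User(logged_in=True), "idwiki")` is true under these assumptions, while the same call for a logged-out user, or for a user whose `discussiontools-editmode` preference is already set, is false.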

Two additional notes:

Bucketing ought to be done on a per-wiki basis.
The software will need to remember that someone was included in the A/B test so they are not mistakenly removed from it.
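
To make the two notes above concrete, here is a rough sketch of a 50/50 assignment that is keyed per wiki and remembered once made. It is written in Python for illustration only; `get_bucket`, the `store` dictionary (a stand-in for whatever persistence the extension actually uses) and the `eligible` flag are all assumptions, not names from the DiscussionTools code.

```python
import random
from typing import Dict, Optional, Tuple


def get_bucket(
    store: Dict[Tuple[str, int], str],
    user_id: int,
    wiki: str,
    eligible: bool,
    rng: random.Random,
) -> Optional[str]:
    """Return 'test', 'control', or None for users outside the A/B test."""
    key = (wiki, user_id)  # bucketing is done per wiki
    if key in store:
        # Already enrolled: keep the remembered bucket even if the user no
        # longer meets the criteria (e.g. they have since used the tool).
        return store[key]
    if not eligible:
        return None
    # Each eligible user has a 50% chance of landing in either group.
    bucket = "test" if rng.random() < 0.5 else "control"
    store[key] = bucket
    return bucket
```

Because the lookup happens before the eligibility check, someone who was enrolled and later stops meeting the criteria is not removed from the test, and an assignment on one wiki has no effect on another.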

Open questions

1. Does this task need to be blocked on having defined the list of Wikipedias that will be participating in the test? See: T267379.
- Moot point as T267379 has been resolved.
2. What should happen when someone from the control group manually enables the Reply Tool in Special:Preferences? Should them changing their preference cause them to be added to the test group? Should their usage of the Reply Tool, that they would have manually enabled, be considered part of the control group?
- When someone from the control group manually enables the Reply Tool in Special:Preferences, they should remain in the control group. See: T268191#6746113.
3. Can the bucketing be deployed to test.wikipedia.org to conduct the QA that will happen in T268193? Additional context: T268191#6746113.
- We decided to use id.wiki for QA. See: T268191#6763627.
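
The answer to question 2 amounts to keeping two things separate: whether the Reply Tool is shown to someone, and which group their activity is counted in. A small Python sketch of that separation follows; both function names and the `opted_in` flag are hypothetical, and the rule that a manual opt-in always shows the tool is an assumption drawn from the question itself.

```python
from typing import Optional


def reply_tool_enabled(bucket: Optional[str], opted_in: bool) -> bool:
    """Whether the Reply Tool is shown: test group, or a manual opt-in."""
    return opted_in or bucket == "test"


def analysis_group(bucket: Optional[str], opted_in: bool) -> Optional[str]:
    """Group used when analysing results; the manual opt-in is ignored."""
    return bucket
```

Under this sketch, a control-group member who enables the tool in Special:Preferences sees the tool, but `analysis_group("control", True)` still returns `"control"`.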

Done

The "Bucketing criteria" above have been implemented
~~Verify on Beta that the code required to assign people to the test and control groups is working as expected.~~
- This happened in: T268193.

Note: verification that people are being bucketed as expected will happen in T268193.

Details

Subject	Repo	Branch	Lines +/-
Enroll idwiki in the DiscussionTools a/b test	operations/mediawiki-config	master	+5 -0
A/B test output when a specific feature is being tested	mediawiki/extensions/DiscussionTools	wmf/1.36.0-wmf.27	+1 -1
A/B test output when a specific feature is being tested	mediawiki/extensions/DiscussionTools	master	+1 -1
A/B test bucketing for beta enrollment	mediawiki/extensions/DiscussionTools	master	+80 -3

Related Objects

Status	Assigned	Task
Open	None	T276497 Scale DiscussionTools to all projects
Open	None	T251207 [Epic] Scale DiscussionTools to all Wikipedias
Resolved	None	T262331 [Release ticket] Make Reply Tool available as opt-out preference at all Wikipedias
Resolved	None	T269062 Make the the Reply Tool available as an opt-out preference at all Phase 0, 1, 2 and 3 Wikipedias
Open	None	T233443 [Epic] Reply Tool
Resolved	MNeisler	T252057 Run A/B test to evaluate impact of Reply tool
Resolved	ppelberg	T268191 Implement A/B test bucketing

Event Timeline

ppelberg created this task. Nov 18 2020, 11:12 PM

Task description update
Today, @MNeisler and I talked about the A/B test's bucketing criteria [i], which are now reflected in the task description's "Bucketing criteria" section.

i. Bucketing criteria

People editing at the following Wikipedias: TBD; see T267379.
Have not used the Reply Tool before.
- In this context, "used" is being defined as people who have initiated (read: reached init) the Reply Tool before.
People who are logged in

ppelberg moved this task from Incoming to Upcoming on the Editing-team (Kanban Board) board. Nov 18 2020, 11:21 PM

Task description update

Have not used the Reply Tool before.

In this context, "used" is being defined as people who have initiated (read: reached init) the Reply Tool before.

During yesterday's team meeting, @DLynch and @Esanders shared that it is not possible to use whether someone has caused an init event to be emitted as a proxy for whether they have used the Reply Tool before or not.

The reason: these init events are stored in such a way that it is difficult for the software to "look up" whether a given person has triggered said init event in a performant way.

Instead, David and Ed shared we could look at whether someone has a discussiontools-editmode preference set. If said preference is empty, we can infer a given account has not opened the tool on that wiki before. Note: this only applies to people who are logged in.

I've updated the task description to reflect the above.

ppelberg updated the task description. Nov 20 2020, 11:43 PM

ppelberg updated the task description. Dec 2 2020, 1:19 AM

ppelberg moved this task from Upcoming to Incoming on the Editing-team (Kanban Board) board.

Task description update

Changes to the "Done" section:
- REMOVED @MNeisler's data check as she will verify test buckets are balanced in T268193.
- ADDED a check to verify that the code required to assign people to the test and control groups is working as expected on the Beta cluster.

ppelberg updated the task description. Dec 2 2020, 8:50 PM

ppelberg moved this task from Incoming to Upcoming on the Editing-team (Kanban Board) board. Dec 3 2020, 4:52 PM

ppelberg updated the task description. Dec 11 2020, 10:33 PM

ppelberg moved this task from Upcoming to Ready to Be Worked On on the Editing-team (Kanban Board) board. Dec 11 2020, 10:44 PM

DLynch updated the task description. Dec 21 2020, 5:32 PM

Meeting notes: 21-December
During today's team standup, @DLynch raised the following points...

1. We need to be sure the software remembers that someone was included in the A/B test so they are not mistakenly removed from it.
2. @MNeisler: What should happen when someone from the control group manually enables the Reply Tool in Special:Preferences? Should them changing their preference cause them to be added to the test group? Our instinct: "No." Should their usage of the Reply Tool, that they would have manually enabled, be considered part of the control group? Our instinct: "Yes."

"1." and "2." above have been added to the task description's "Bucketing criteria" and "Open questions" sections respectively.

ppelberg updated the task description. Dec 21 2020, 8:26 PM

A few limitations to bear in mind:

We're going to be remembering what bucket someone is in based on their cookies. If they use the reply tool and then clear their cookies, they're going to stop being in the test and so won't get re-bucketed.

If they use the reply tool and then log in on a different machine, they won't be in the a/b test on that machine, but will be on their original machine.

We could make it into a user-option instead, and thus stored in the db, but we'd want to clean that up after the test is done.
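
The difference between the two storage options can be illustrated with a toy model. This is only a sketch in Python: the `Device` class, the `ab_bucket` cookie name, the `db` dictionary and the user id `42` are all made up for illustration and do not reflect how DiscussionTools stores anything.

```python
from typing import Dict, Optional


class Device:
    """Hypothetical browser profile holding its own cookies."""

    def __init__(self) -> None:
        self.cookies: Dict[str, str] = {}


def bucket_from_cookie(device: Device) -> Optional[str]:
    """Cookie-based lookup: only sees what this one device remembers."""
    return device.cookies.get("ab_bucket")


def bucket_from_user_option(db: Dict[int, str], user_id: int) -> Optional[str]:
    """User-option lookup: stored server-side, keyed by account."""
    return db.get(user_id)


laptop, phone = Device(), Device()
laptop.cookies["ab_bucket"] = "test"  # bucketed on the first device only
db = {42: "test"}                     # same assignment stored as a user option
```

In this model `bucket_from_cookie(phone)` returns `None` even though the same person is in the test on `laptop`, and clearing `laptop.cookies` loses the bucket there too. The user-option lookup gives the same answer on every device, at the cost of a stored value that has to be cleaned up once the test is over.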

In T268191#6708340, @DLynch wrote:

We could make it into a user-option instead, and thus stored in the db, but we'd want to clean that up after the test is done.

@DLynch would bucketing people based on a user-option, as you're describing above, relieve us of the limitations you are describing below? If so, what – if any – tradeoffs should we be mindful of before committing to bucketing people based on user-options instead of cookies?

We're going to be remembering what bucket someone is in based on their cookies. If they use the reply tool and then clear their cookies, they're going to stop being in the test and so won't get re-bucketed.

If they use the reply tool and then log in on a different machine, they won't be in the a/b test on that machine, but will be on their original machine.

ppelberg assigned this task to DLynch. Dec 22 2020, 5:35 PM

ppelberg moved this task from Ready to Be Worked On to Doing on the Editing-team (Kanban Board) board.

In T268191#6708340, @DLynch wrote:

If they use the reply tool and then log in on a different machine, they won't be in the a/b test on that machine, but will be on their original machine.

I hate it when software behaves differently based on the device (or browser profile) I’m using. It has happened to me quite a number of times on YouTube, where I intentionally use different profiles on the same machine (to limit how much Google tracks me). It’s annoying for me, but it may seem like a bug to someone who doesn’t know what A/B testing is. One of the worst ways to design software is to intentionally make something that seems like a bug.

Change 655861 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/DiscussionTools@master] A/B test bucketing for beta enrollment

https://gerrit.wikimedia.org/r/655861

gerritbot added a project: Patch-For-Review. Jan 13 2021, 8:16 AM

DLynch moved this task from Doing to Code Review on the Editing-team (Kanban Board) board. Jan 13 2021, 8:20 AM

Meeting notes

These are notes from the conversation @MNeisler and I had today.

Deployment

@DLynch: are we able to deploy the bucketing patch to https://test.wikipedia.org/wiki/Main_Page during Monday's (18-Jan) backport window instead of deploying it to the candidate wikis (T267379) which are now listed in the task description?
- Reason: we'd rather QA the bucketing (T268193) on a test wiki rather than risk us needing to stop the test in production to address potential bug(s) and then re-start it.

Remaining "Open questions"

We confirmed that when someone from the control group manually enables the Reply Tool in Special:Preferences, they should remain in the control group. I've updated the task description to reflect this.
- Related: if we notice during analysis (see: T252057) that people in the test and control groups are engaging with talk pages in similar ways, we might explore the percentage of people within the control group who manually enabled the Reply Tool in Special:Preferences.

ppelberg updated the task description. Jan 13 2021, 10:19 PM

Restricted Application added subscribers: Huji, Stang, Base, revi. Jan 13 2021, 10:19 PM

Change 655861 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] A/B test bucketing for beta enrollment

https://gerrit.wikimedia.org/r/655861

ReleaseTaggerBot added a project: MW-1.36-notes (1.36.0-wmf.27; 2021-01-19). Jan 14 2021, 1:00 PM

Maintenance_bot removed a project: Patch-For-Review. Jan 14 2021, 1:11 PM

@ppelberg you mean backport the entire A/B-test patch to every-wiki, and then deploy the config to the test wiki? It might be easier to just put the config patch on beta, since it already has the bucketing.

In T268191#6748544, @DLynch wrote:

@ppelberg you mean backport the entire A/B-test patch to every-wiki, and then deploy the config to the test wiki? It might be easier to just put the config patch on beta, since it already has the bucketing.

@MNeisler: a question for you came up as we were talking about the point @DLynch is raising above and the QA we have planned in T268193:

Would you feel comfortable verifying the A/B test bucketing is working as expected on the beta cluster [i] instead of the test wiki [ii]? Engineering shared the former would be more straightforward and we all suspected usage of the two non-production wikis would be comparable [iii].

i. https://en.wikipedia.beta.wmflabs.org/
ii. https://test.wikipedia.org/
iii. Note: if more scale is needed than what we think we'll have access to on a non-production wiki, @Whatamidoing-WMF and I talked and think id.wiki would be a good place to deploy and test the bucketing in production.

Enterprisey unsubscribed. Jan 15 2021, 7:41 AM

ppelberg reassigned this task from DLynch to MNeisler. Jan 19 2021, 9:19 PM

Would you feel comfortable verifying the A/B test bucketing is working as expected on the beta cluster [i] instead of the test wiki [ii]? Engineering shared the former would be more straightforward and we all suspected usage of the two non-production wikis would be comparable [iii].

Unlike the test wiki, there's not a good way for me to query the eventlogging data from the beta cluster to confirm the buckets are balanced. I can check the log file, but it is restricted to a certain size and events usually only stay there for a couple of hours, which does not provide the scale we need to confirm the buckets are balanced.

Based on this and given the complexity of trying to change deployment to the test wiki, I'd recommend using id.wiki to test the bucketing in production. That should provide the scale and data accessibility I need to confirm the buckets are balanced.
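
The need for scale can be illustrated with one common way of checking a 50/50 split: a two-sided test on the bucket counts using the normal approximation to the binomial. The Python sketch below is only an illustration and is not necessarily the method used in T268193; `balance_p_value` is a hypothetical helper and any counts passed to it are made-up inputs, not data from the test.

```python
import math


def balance_p_value(n_test: int, n_control: int) -> float:
    """Two-sided p-value for H0: each user has a 50% chance of either bucket.

    Uses the normal approximation to the binomial, which is reasonable
    only when both counts are fairly large.
    """
    n = n_test + n_control
    if n == 0:
        raise ValueError("no bucketed users")
    # Under H0 the test-group count has mean n/2 and variance n/4.
    z = (n_test - n / 2) / math.sqrt(n * 0.25)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))
```

A small p-value suggests the buckets are not balanced. With only a handful of events, as in a size-limited log file, the approximation is poor and even a badly skewed split can go undetected, which is why a larger wiki is the better place to run the check.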

In T268191#6762788, @MNeisler wrote:

Based on this and given the complexity of trying to change deployment to the test wiki, I'd recommend using id.wiki to test the bucketing in production. That should provide the scale and data accessibility I need to confirm the buckets are balanced.

Thank you for thinking this through, @MNeisler; let's do as you are suggesting above and deploy the A/B test to id.wiki for the purposes of verifying people are being bucketed in the way we expect.

I'm going to consult the team on the next steps needed to make the above happen; I will follow up here once I know what they are.

ppelberg claimed this task. Jan 20 2021, 9:41 PM

Change 657690 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/DiscussionTools@master] A/B test output when a specific feature is being tested

https://gerrit.wikimedia.org/r/657690

gerritbot added a project: Patch-For-Review. Jan 21 2021, 9:46 PM

Change 657691 had a related patch set uploaded (by DLynch; owner: DLynch):
[operations/mediawiki-config@master] Enroll idwiki in the DiscussionTools a/b test

https://gerrit.wikimedia.org/r/657691

Change 657690 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] A/B test output when a specific feature is being tested

https://gerrit.wikimedia.org/r/657690

ReleaseTaggerBot edited projects, added MW-1.36-notes (1.36.0-wmf.28; 2021-01-26); removed MW-1.36-notes (1.36.0-wmf.27; 2021-01-19). Jan 21 2021, 11:00 PM

Change 657653 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/DiscussionTools@wmf/1.36.0-wmf.27] A/B test output when a specific feature is being tested

https://gerrit.wikimedia.org/r/657653

In T268191#6763627, @ppelberg wrote:

In T268191#6762788, @MNeisler wrote:

Based on this and given the complexity of trying to change deployment to the test wiki, I'd recommend using id.wiki to test the bucketing in production. That should provide the scale and data accessibility I need to confirm the buckets are balanced.

Thank you for thinking this through, @MNeisler; let's do as you are suggesting above and deploy the A/B test to id.wiki for the purposes of verifying people are being bucketed in the way we expect.

I'm going to consult the team on the next steps needed to make the above happen; I will follow up here once I know what they are.

To close this loop, @MNeisler – we have everything ready on our end to start the A/B test on id.wiki. That said, this depends on the train rolling, which is currently blocked on T272638. Handing off to @DLynch to comment on this ticket when the A/B test reaches id.wiki.

Change 657653 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@wmf/1.36.0-wmf.27] A/B test output when a specific feature is being tested

https://gerrit.wikimedia.org/r/657653

Change 657691 merged by jenkins-bot:
[operations/mediawiki-config@master] Enroll idwiki in the DiscussionTools a/b test

https://gerrit.wikimedia.org/r/657691

Mentioned in SAL (#wikimedia-operations) [2021-01-22T01:14:53Z] <urbanecm@deploy1001> Synchronized php-1.36.0-wmf.27/extensions/DiscussionTools/: 513a7861bbcf06a8ac5c29e1b9838640cbd7c628: A/B test output when a specific feature is being tested (T268191) (duration: 00m 55s)

Mentioned in SAL (#wikimedia-operations) [2021-01-22T01:16:39Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 376cba1b33dd68d40490a1498c59a4d430318ab1: Enroll idwiki in the DiscussionTools a/b test (T268191) (duration: 00m 55s)

This is deployed to idwiki, and @MNeisler will be verifying the test/control split looks even once data comes in over the weekend. I did verify with a new account that I was assigned appropriately to a bucket as part of the deploy process.

ReleaseTaggerBot edited projects, added MW-1.36-notes (1.36.0-wmf.27; 2021-01-19); removed MW-1.36-notes (1.36.0-wmf.28; 2021-01-26). Jan 22 2021, 2:00 AM

ppelberg mentioned this in T268193: Verify A/B test buckets are balanced and events are being logged as expected. Jan 22 2021, 3:38 AM

DLynch moved this task from QA to Ready for Sign Off on the Editing-team (Kanban Board) board. Jan 30 2021, 6:45 PM

ppelberg closed this task as Resolved. Feb 1 2021, 5:24 AM

ppelberg updated the task description.

ppelberg mentioned this in T273554: Make config change to enable Reply Tool A/B test. Feb 1 2021, 11:05 PM

ppelberg mentioned this in T273406: Announce start of A/B test. Feb 1 2021, 11:30 PM

Stang unsubscribed. Nov 13 2021, 11:30 PM