Verify A/B test buckets are balanced and events are being logged as expected
Closed, Resolved (Public)

Assigned To

	DLynch

Authored By

	ppelberg
	Nov 18 2020, 11:27 PM

Description

This task is about verifying the software is logging the data necessary to conduct the analysis described in T252057.

Related Objects

Status	Assigned	Task
Open	None	T276497 Scale DiscussionTools to all projects
Open	None	T251207 [Epic] Scale DiscussionTools to all Wikipedias
Resolved	None	T262331 [Release ticket] Make Reply Tool available as opt-out preference at all Wikipedias
Resolved	None	T269062 Make the the Reply Tool available as an opt-out preference at all Phase 0, 1, 2 and 3 Wikipedias
Open	None	T233443 [Epic] Reply Tool
Resolved	MNeisler	T252057 Run A/B test to evaluate impact of Reply tool
Resolved	DLynch	T268193 Verify A/B test buckets are balanced and events are being logged as expected
Resolved	ppelberg	T273096 Add DiscussionTools a/b test bucket information to events from VisualEditor and WikiEditor.

Event Timeline

ppelberg created this task. Nov 18 2020, 11:27 PM

ppelberg moved this task from Backlog to Analytics on the Editing-team (Tracking) board.

ppelberg renamed this task from "Verify A/B test events are being logged and emitted as expected" to "Verify A/B test buckets are balanced and events are being logged as expected". Dec 2 2020, 8:46 PM

ppelberg mentioned this in T268191: Implement A/B test bucketing.

MNeisler added a project: Product-Analytics. Dec 3 2020, 2:20 PM

MNeisler moved this task from Triage to Current Quarter on the Product-Analytics board.

LGoto triaged this task as Medium priority. Dec 8 2020, 6:15 PM

LGoto moved this task from Current Quarter to Upcoming Quarter on the Product-Analytics board.

MNeisler moved this task from Upcoming Quarter to Current Quarter on the Product-Analytics board. Jan 5 2021, 6:19 PM

MNeisler edited projects, added Product-Analytics (Kanban); removed Product-Analytics. Jan 20 2021, 9:00 PM

Enterprisey unsubscribed. Jan 20 2021, 11:54 PM

Per T268191#6767601, the A/B test has been deployed to id.wiki, meaning this task can be worked on as soon as enough data has "accrued."

It has occurred to me that it's going to be difficult to check the test/control split by looking at the logging, even ignoring the "are enough people using it to overcome the sampling?" question, because people in the "control" group aren't interacting with DiscussionTools to cause logging events. Thus I'd expect to see test vastly over-represented, as "control" will only appear for people who went and manually enabled DiscussionTools.

However! The act of being assigned to a bucket does cause a user-preference to be set, so if we wanted to verify the split like that we could (presumably) query the database for it.
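To illustrate the concern above with made-up numbers: the sketch below assumes an evenly split population in which every test user produces at least one DiscussionTools event, while control users only produce events if they manually opted in. The function name, the population size and the 2% opt-in rate are all hypothetical and are not taken from the actual test.

```python
def logged_test_share(users_per_bucket: int, control_opt_in_rate: float) -> float:
    """Share of users seen in DiscussionTools logs who are in the test bucket.

    Simplifying assumptions (hypothetical, for illustration only):
    - both buckets contain the same number of users;
    - every test user triggers at least one logged event;
    - a control user is logged only after manually enabling the tool.
    """
    test_loggers = users_per_bucket
    control_loggers = users_per_bucket * control_opt_in_rate
    return test_loggers / (test_loggers + control_loggers)


# Made-up inputs: 1000 users per bucket, 2% of control users opt in.
share = logged_test_share(users_per_bucket=1000, control_opt_in_rate=0.02)
print(f"test share among logged users: {share:.1%}")
```

Under these assumptions a perfectly balanced assignment still looks almost entirely "test" in the event logs, which is why the preference table is a better place to check the split.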

MNeisler moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board. Jan 26 2021, 5:50 PM

However! The act of being assigned to a bucket does cause a user-preference to be set, so if we wanted to verify the split like that we could (presumably) query the database for it.

Good call. I ran a query on the mediawiki user_properties table and was able to confirm the test/control split on idwiki.

property	test_group	num_users
discussiontools-abtest	control	783
discussiontools-abtest	test	748

There are sufficient data to confirm that the buckets appear to be balanced for idwiki (the current differences are within the probable range of a random 50/50 split). I'm guessing this data also reflects any users that explicitly opted in to or out of the AB test, which may account for some small differences in numbers between these two groups. @DLynch - Can you confirm?

While I can verify bucketing using the user properties table, it's going to be difficult to do the analysis without a way to distinguish AB test users in EditAttemptStep logged events. Would it be possible to add a value to the existing event.bucket field in EditAttemptStep to distinguish users in the reply tool AB test (Similar to how we distinguished users in the Mobile VE as Default AB test)?

Data via:

-- Run on 'idwiki' database
SELECT
  up_property AS property,
  up_value AS test_group,
  COUNT(DISTINCT up_user) AS num_users
FROM user_properties
WHERE
  up_property = 'discussiontools-abtest'
GROUP BY
  up_value,
  up_property;
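
One way to make "within the probable range of a random 50/50 split" concrete is a normal approximation to the binomial distribution, applied to the idwiki counts reported above (783 control, 748 test). This is only a sketch of one possible check, not necessarily the method used for this task; the function name is hypothetical and no continuity correction is applied.

```python
from math import erfc, sqrt


def balance_check(control: int, test: int, p: float = 0.5) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the control count under a p/(1-p) split.

    Uses the normal approximation to the binomial, which is reasonable
    for samples of this size. Hypothetical helper, for illustration only.
    """
    n = control + test
    expected = n * p                    # expected control count
    sd = sqrt(n * p * (1 - p))          # binomial standard deviation
    z = (control - expected) / sd
    p_value = erfc(abs(z) / sqrt(2))    # two-sided tail probability
    return z, p_value


# Counts from the user_properties query on idwiki.
z, p_value = balance_check(control=783, test=748)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

For these counts z comes out at roughly 0.89 and the p-value at roughly 0.37, so a gap of 35 users out of 1531 is well within what a fair 50/50 assignment would produce.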

ppelberg reassigned this task from MNeisler to DLynch. Jan 27 2021, 1:03 AM

ppelberg edited projects, added Editing-team (Kanban Board); removed Editing-team (Tracking).

ppelberg moved this task from Incoming to Blocked / Needs More Work on the Editing-team (Kanban Board) board.

Also, I'm guessing this data reflects any users that explicitly opted in to or out of the AB test, which may account for some small differences in numbers between these two groups. @DLynch - Can you confirm?

Nope, once the group is assigned it sticks around and gets logged regardless -- it's not going to be cleared until we're done with the a/b test and I remove the registration of the preference. It's the only reason you'd ever see control events logged from DiscussionTools: they come from people who've deliberately gone and entered the beta.

While I can verify bucketing using the user properties table, it's going to be difficult to do the analysis without a way to distinguish AB test users in EditAttemptStep logged events. Would it be possible to add a value to the existing event.bucket field in EditAttemptStep to distinguish users in the reply tool AB test (Similar to how we distinguished users in the Mobile VE as Default AB test)?

It's already logging the bucket to EditAttemptStep and VisualEditorFeatureUse, but only from DiscussionTools. If you mean adding it to the logging from VisualEditor/WikiEditor, that could be done.

Nope, once the group is assigned it sticks around and gets logged regardless -- it's not going to be cleared until we're done with the a/b test and I remove the registration of the preference. It's the only reason you'd ever see control events logged from DiscussionTools: they come from people who've deliberately gone and entered the beta.

Got it. That makes sense. The difference between the two groups is still within the probable range of a 50/50 split so the bucketing looks good to me.

It's already logging the bucket to EditAttemptStep and VisualEditorFeatureUse, but only from DiscussionTools. If you mean adding it to the logging from VisualEditor/WikiEditor, that could be done.

Yes, I think for the analysis to be done correctly we'll also need to be able to identify any non-discussion tool logged events by users in the AB test.

In T268193#6781374, @MNeisler wrote:

It's already logging the bucket to EditAttemptStep and VisualEditorFeatureUse, but only from DiscussionTools. If you mean adding it to the logging from VisualEditor/WikiEditor, that could be done.

Yes, I think for the analysis to be done correctly we'll also need to be able to identify any non-discussion tool logged events by users in the AB test.

@MNeisler, here's a ticket for the work you and @DLynch were discussing: T273096.

Next steps

Megan, can you please fill in T273096's ===Requirements section?

ppelberg closed this task as Resolved. Jan 27 2021, 8:10 PM

ppelberg closed subtask "T273096: Add DiscussionTools a/b test bucket information to events from VisualEditor and WikiEditor." as Resolved. Mar 3 2021, 3:10 AM
