Verify A/B test buckets are balanced and events are being logged as expected
Closed, Resolved (Public)

Description

This task is about verifying the software is logging the data necessary to conduct the analysis described in T252057.

Event Timeline

ppelberg renamed this task from "Verify A/B test events are being logged and emitted as expected" to "Verify A/B test buckets are balanced and events are being logged as expected". Dec 2 2020, 8:46 PM
LGoto triaged this task as Medium priority. Dec 8 2020, 6:15 PM
LGoto moved this task from Current Quarter to Upcoming Quarter on the Product-Analytics board.

Per T268191#6767601, the A/B test has been deployed to id.wiki, meaning this task can be worked on as soon as enough data has "accrued."

It has occurred to me that it's going to be difficult to check the test/control split by looking at the logging, even ignoring the "are enough people using it to overcome the sampling?" question, because people in the "control" group aren't interacting with DiscussionTools to cause logging events. Thus I'd expect to see test vastly over-represented, as "control" will only appear for people who went and manually enabled DiscussionTools.

However! The act of being assigned to a bucket does cause a user-preference to be set, so if we wanted to verify the split like that we could (presumably) query the database for it.

Good call. I ran a query on the mediawiki user_properties table and was able to confirm the test/control split on idwiki.

property                test_group  num_users
discussiontools-abtest  control     783
discussiontools-abtest  test        748

There are sufficient data to confirm that the buckets appear to be balanced for idwiki (the current difference, 783 vs. 748, is within the probable range of a random 50/50 split). Also, I'm guessing this data also reflects any users that explicitly opted in or out of the AB test, which may account for some small differences in numbers between these two groups. @DLynch - Can you confirm?

While I can verify bucketing using the user properties table, it's going to be difficult to do the analysis without a way to distinguish AB test users in EditAttemptStep logged events. Would it be possible to add a value to the existing event.bucket field in EditAttemptStep to distinguish users in the reply tool AB test (similar to how we distinguished users in the Mobile VE as Default AB test)?

Data via:

-- Run on 'idwiki' database
SELECT
  up_property AS property,
  up_value AS test_group,
  COUNT(DISTINCT up_user) AS num_users
FROM user_properties
WHERE
  up_property = 'discussiontools-abtest'
GROUP BY
  up_value,
  up_property;
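
The "within the probable range of a random 50/50 split" claim can be sanity-checked from the two counts the query returned (783 control, 748 test). The sketch below is only an illustration, not the method used in this task: it assumes a two-sided z-test with a normal approximation to the binomial, and the function name `split_z_test` is hypothetical.

```python
import math


def split_z_test(control: int, test: int) -> tuple[float, float]:
    """Two-sided z-test of a 50/50 split (normal approximation to the binomial)."""
    n = control + test
    expected = n / 2            # expected size of each bucket under a fair split
    sd = math.sqrt(n) / 2       # sqrt(n * 0.5 * 0.5)
    z = (control - expected) / sd
    # Two-sided p-value for a standard normal statistic.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the query result above.
z, p_value = split_z_test(control=783, test=748)
print(f"z = {z:.2f}, p = {p_value:.2f}")  # roughly z = 0.89, p = 0.37
```

A p-value this large means a gap of 35 users out of 1,531 is well within what a fair coin-flip assignment would produce, which is consistent with the conclusion that the buckets look balanced.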

> Also, I'm guessing this data also reflects any users that explicitly opted in or out of the AB test, which may account for some small differences in numbers between these two groups. @DLynch - Can you confirm?

Nope, once the group is assigned it sticks around and gets logged regardless -- it's not going to be cleared until we're done with the a/b test and I remove the registration of the preference. That's the only reason you'd ever see control events logged from DiscussionTools: they come from people who've deliberately gone and entered the beta.

> While I can verify bucketing using the user properties table, it's going to be difficult to do the analysis without a way to distinguish AB test users in EditAttemptStep logged events. Would it be possible to add a value to the existing event.bucket field in EditAttemptStep to distinguish users in the reply tool AB test (similar to how we distinguished users in the Mobile VE as Default AB test)?

It's already logging the bucket to EditAttemptStep and VisualEditorFeatureUse, but only from DiscussionTools. If you mean adding it to the logging from VisualEditor/WikiEditor, that could be done.

> Nope, once the group is assigned it sticks around and gets logged regardless -- it's not going to be cleared until we're done with the a/b test and I remove the registration of the preference. That's the only reason you'd ever see control events logged from DiscussionTools: they come from people who've deliberately gone and entered the beta.

Got it. That makes sense. The difference between the two groups is still within the probable range of a 50/50 split, so the bucketing looks good to me.

> It's already logging the bucket to EditAttemptStep and VisualEditorFeatureUse, but only from DiscussionTools. If you mean adding it to the logging from VisualEditor/WikiEditor, that could be done.

Yes, I think for the analysis to be done correctly we'll also need to be able to identify any events logged outside DiscussionTools (for example, from VisualEditor/WikiEditor) by users in the AB test.

@MNeisler, here's a ticket for the work you and @DLynch were discussing: T273096.


Next steps

  • Megan, can you please fill in T273096's ===Requirements section?