Description
This task is about verifying the software is logging the data necessary to conduct the analysis described in T252057.
Status | Assigned | Task |
---|---|---|
Open | None | T276497 Scale DiscussionTools to all projects |
Open | None | T251207 [Epic] Scale DiscussionTools to all Wikipedias |
Resolved | None | T262331 [Release ticket] Make Reply Tool available as opt-out preference at all Wikipedias |
Resolved | None | T269062 Make the the Reply Tool available as an opt-out preference at all Phase 0, 1, 2 and 3 Wikipedias |
Open | None | T233443 [Epic] Reply Tool |
Resolved | MNeisler | T252057 Run A/B test to evaluate impact of Reply tool |
Resolved | DLynch | T268193 Verify A/B test buckets are balanced and events are being logged as expected |
Resolved | ppelberg | T273096 Add DiscussionTools a/b test bucket information to events from VisualEditor and WikiEditor. |
Event Timeline
Per T268191#6767601, the A/B test has been deployed to id.wiki meaning this task can be worked on as soon as enough data has "accrued."
It has occurred to me that it's going to be difficult to check the test/control split by looking at the logging, even ignoring the "are enough people using it to overcome the sampling?" question, because people in the "control" group aren't interacting with DiscussionTools to cause logging events. Thus I'd expect to see test vastly over-represented, as "control" will only appear for people who went and manually enabled DiscussionTools.
However! The act of being assigned to a bucket does cause a user-preference to be set, so if we wanted to verify the split like that we could (presumably) query the database for it.
Good call. I ran a query on the mediawiki user_properties table and was able to confirm the test/control split on idwiki.
property | test_group | num_users |
---|---|---|
discussiontools-abtest | control | 783 |
discussiontools-abtest | test | 748 |
There is sufficient data to confirm that the buckets appear to be balanced for idwiki (the current difference is within the probable range of a random 50/50 split). Also, I'm guessing this data also reflects any users that explicitly opted in/out of the AB test, which may account for some small differences in numbers between these two groups. @DLynch - Can you confirm?
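One way to make "within the probable range of a random 50/50 split" concrete is an exact two-sided binomial test on the counts above. This is only a minimal sketch of that check, not necessarily how the balance was assessed here; the helper name `two_sided_binomial_p` is made up for illustration.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n fair (p = 0.5) trials."""
    # Work with the larger of the two groups so we always sum the upper tail.
    larger = max(k, n - k)
    # Number of outcomes at least as lopsided as the observed one, on one side.
    tail = sum(comb(n, i) for i in range(larger, n + 1))
    # A fair coin is symmetric, so double the one-sided tail and cap at 1.
    return min(1.0, 2 * tail / 2 ** n)


# Counts from the user_properties query on idwiki.
control, test = 783, 748
p_value = two_sided_binomial_p(control, control + test)
print(f"p = {p_value:.3f}")
```

For these counts the p-value comes out well above the conventional 0.05 threshold, which is consistent with a balanced split.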
While I can verify bucketing using the user properties table, it's going to be difficult to do the analysis without a way to distinguish AB test users in EditAttemptStep logged events. Would it be possible to add a value to the existing event.bucket field in EditAttemptStep to distinguish users in the reply tool AB test (similar to how we distinguished users in the Mobile VE as Default AB test)?
Data via:
-- Run on 'idwiki' database
SELECT
    up_property AS property,
    up_value AS test_group,
    COUNT(DISTINCT up_user) AS num_users
FROM user_properties
WHERE up_property = 'discussiontools-abtest'
GROUP BY up_value, up_property;
Also, I'm guessing this data also reflects any users that explicitly opted in/out of the AB test, which may account for some small differences in numbers between these two groups. @DLynch - Can you confirm?
Nope, once the group is assigned it sticks around and gets logged regardless -- it's not going to be cleared until we're done with the a/b test and I remove the registration of the preference. That's the only reason you'd ever see any control events logged from DiscussionTools: they come from people who've deliberately gone and entered the beta.
While I can verify bucketing using the user properties table, it's going to be difficult to do the analysis without a way to distinguish AB test users in EditAttemptStep logged events. Would it be possible to add a value to the existing event.bucket field in EditAttemptStep to distinguish users in the reply tool AB test (similar to how we distinguished users in the Mobile VE as Default AB test)?
It's already logging the bucket to EditAttemptStep and VisualEditorFeatureUse, but only from DiscussionTools. If you mean adding it to the logging from VisualEditor/WikiEditor, that could be done.
Nope, once the group is assigned it sticks around and gets logged regardless -- it's not going to be cleared until we're done with the a/b test and I remove the registration of the preference. That's the only reason you'd ever see any control events logged from DiscussionTools: they come from people who've deliberately gone and entered the beta.
Got it. That makes sense. The difference between the two groups is still within the probable range of a 50/50 split so the bucketing looks good to me.
It's already logging the bucket to EditAttemptStep and VisualEditorFeatureUse, but only from DiscussionTools. If you mean adding it to the logging from VisualEditor/WikiEditor, that could be done.
Yes, I think for the analysis to be done correctly we'll also need to be able to identify any non-discussion tool logged events by users in the AB test.
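To illustrate what "identifying non-DiscussionTools events by users in the AB test" means for the analysis, here is a rough analysis-side sketch that fills in a missing bucket from the stored preference. It is only an illustration: T273096 covers adding the bucket at logging time instead, and the record layout (`user_id`, `integration`, `bucket` as plain dict keys), the sample data and the helper `attach_buckets` are all assumptions, not the real event schema or pipeline.

```python
from collections import Counter


def attach_buckets(events, bucket_by_user):
    """Return copies of events with 'bucket' filled in from the preference lookup.

    Events from users who were never assigned to a bucket are dropped.
    """
    tagged = []
    for event in events:
        # Prefer the bucket already logged on the event (DiscussionTools events),
        # otherwise fall back to the user's stored A/B test preference.
        bucket = event.get("bucket") or bucket_by_user.get(event["user_id"])
        if bucket is None:
            continue  # user is not part of the A/B test
        tagged.append({**event, "bucket": bucket})
    return tagged


# Hypothetical sample data: user -> bucket, as read from user_properties.
bucket_by_user = {1: "test", 2: "control"}
events = [
    {"user_id": 1, "integration": "discussiontools", "bucket": "test"},
    {"user_id": 2, "integration": "page", "bucket": None},  # full-page editor
    {"user_id": 3, "integration": "page", "bucket": None},  # not in the test
]

tagged = attach_buckets(events, bucket_by_user)
counts = Counter((e["bucket"], e["integration"]) for e in tagged)
print(counts)
```

With the bucket present on every event, control-group activity in the full-page editors can be compared against test-group activity in both the Reply Tool and the full-page editors.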