
mpopov (Mikhail Popov)
Manager, Data Science

User Details

User Since
Jul 27 2015, 4:15 PM (475 w, 1 d)
Availability
Available
IRC Nick
bearloga
LDAP User
Bearloga
MediaWiki User
MPopov (WMF) [ Global Accounts ]

Using statistical analysis, Bayesian inference, machine learning, and software/data engineering to solve problems and inform decisions in Product Analytics

Recent Activity

Fri, Aug 23

mpopov added a comment to T373194: Requesting access to airflow-analytics-product-admins for kcvelaga.

Approved

Fri, Aug 23, 2:20 PM · Patch-For-Review, SRE, SRE-Access-Requests

Mon, Aug 19

mpopov added a comment to T370170: Implement instrumentation for Community Wishlist.

@KSiebert

not applicable, data scientist will check if tracking works

Instrumentation QA is a two-step process. First, the engineer (whether the software engineer or a QTE) needs to QA the instrumentation as much as they can to make sure that it is producing the desired events and that there aren't any validation errors (see https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Validate_Events).

Mon, Aug 19, 2:31 PM · Community-Tech (Gray Fox (Aug 26 - Sept 6)), Community Wishlist
mpopov added a subtask for T372773: Reporting for Community Wishlist: T368674: Instrumentation for Community Wishlist.
Mon, Aug 19, 2:23 PM · Community Wishlist, Community-Tech, Product-Analytics
mpopov added a parent task for T368674: Instrumentation for Community Wishlist: T372773: Reporting for Community Wishlist.
Mon, Aug 19, 2:23 PM · Community-Tech (Gray Fox (Aug 26 - Sept 6)), Community Wishlist, Product-Analytics (Kanban)
mpopov triaged T372773: Reporting for Community Wishlist as Medium priority.
Mon, Aug 19, 2:23 PM · Community Wishlist, Community-Tech, Product-Analytics

Fri, Aug 16

mpopov moved T372108: Document desired properties of an enrollment sampling algorithm from Doing to Needs Review on the Product-Analytics (Kanban) board.
Fri, Aug 16, 7:40 PM · Data Products (Data products Sprint 18), Product-Analytics (Kanban), Metrics Platform
mpopov added a comment to T372108: Document desired properties of an enrollment sampling algorithm.

Will sample consistently if given the same starting value (e.g. if we're sampling on page ID, the same page ID will always return the same assignment).
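
One common way to get this consistency property is hash-based bucketing. The following is a minimal sketch under that assumption, not the algorithm the task settles on; the function name `in_sample`, the `salt` parameter, and its default value are hypothetical.

```python
import hashlib


def in_sample(unit_id: str, rate: float, salt: str = "example-experiment") -> bool:
    """Return True if unit_id falls inside the sampled fraction `rate` (0.0 to 1.0)."""
    # Hash the salted identifier so the decision depends only on the inputs.
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Compare against the threshold as integers to avoid float rounding at the edges.
    return bucket < int(rate * 2**64)


# The same page ID always yields the same decision.
print(in_sample("12345", 0.5) == in_sample("12345", 0.5))  # True
```

Changing the salt would reshuffle which identifiers are sampled, which is one way to keep separate experiments from always enrolling the same units.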

Fri, Aug 16, 7:35 PM · Data Products (Data products Sprint 18), Product-Analytics (Kanban), Metrics Platform

Thu, Aug 15

mpopov updated the task description for T368674: Instrumentation for Community Wishlist.
Thu, Aug 15, 6:40 PM · Community-Tech (Gray Fox (Aug 26 - Sept 6)), Community Wishlist, Product-Analytics (Kanban)

Mon, Aug 12

mpopov updated subscribers of T370117: [Epic] Recommend Articles in Search on Android App.

Questions for @JTannerWMF & @SNowick_WMF:

Mon, Aug 12, 7:04 PM · Design, Wikimedia-Design, Wikipedia-Android-App-Backlog (Android Release - FY2024-25), FY2024-25 KR 3.1 Content Discovery, Epic

Thu, Aug 8

mpopov moved T372108: Document desired properties of an enrollment sampling algorithm from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Thu, Aug 8, 9:07 PM · Data Products (Data products Sprint 18), Product-Analytics (Kanban), Metrics Platform
mpopov triaged T372108: Document desired properties of an enrollment sampling algorithm as Medium priority.
Thu, Aug 8, 9:07 PM · Data Products (Data products Sprint 18), Product-Analytics (Kanban), Metrics Platform
mpopov awarded T371373: airflow-dags: Mutualization of _IMPORTED flag sensors creations a Like token.
Thu, Aug 8, 7:40 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
mpopov updated the task description for T367057: [SPIKE] Document decision to use a single table per base schema.
Thu, Aug 8, 6:29 PM · Data Products (Data Products Sprint 17), Spike, Documentation, Metrics Platform
mpopov added a comment to T368303: REQUEST: Add Special:AllEvents to allowlist for campaigns-product pageview tracking.

Hi @ifried! @Iflorez and I discussed your question yesterday and it's going to come down to how the feature actually works. Either:

  • Scenario A: User can switch between the tabs seamlessly (powered by JS) without needing to reload the page.
  • Scenario B: When page loads it checks for ?tab=Events (default) or ?tab=Communities and switching between them causes the page to reload with different URI query parameter.
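
Under Scenario B, the active tab could be read straight from the request URI. A minimal sketch, assuming the `tab` parameter and the `Events` default described above; the function name and example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_TAB = "Events"  # default named in Scenario B


def active_tab(url: str) -> str:
    """Return the value of the ?tab= query parameter, or the default tab."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each parameter to a list of values; take the first one.
    return query.get("tab", [DEFAULT_TAB])[0]


print(active_tab("https://example.org/wiki/Special:AllEvents?tab=Communities"))  # Communities
print(active_tab("https://example.org/wiki/Special:AllEvents"))  # Events
```

Under Scenario A there is no new request when the user switches tabs, so nothing comparable would be available from the URI.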
Thu, Aug 8, 6:22 PM · Data Products (Data products Sprint 18), Event-Discovery, Data-Platform

Wed, Aug 7

mpopov added a comment to T369687: Develop a reusable Metrics Platform schema fragment for translation workflows.

@mforns: I just discussed this with @KCVelaga_WMF and confirmed that multiple pieces of information will be "all manifest at the same time, atomically"

Wed, Aug 7, 3:34 PM · Data Products (Data products Sprint 18), Patch-For-Review, Product-Analytics, LPL Analytics

Aug 1 2024

mpopov added a comment to T371404: Measuring Edits Rollbacks.

Apologies @Tchanders – I created the placeholder ticket after misinterpreting the term "rollback". Niharika's request for a rollback metric came in with no other details given, so I read "rollback" as rolling back Temp Accounts. I see now that a much different thing was intended.

Aug 1 2024, 6:05 PM · Product-Analytics (Kanban), Temporary accounts
mpopov updated the task description for T371404: Measuring Edits Rollbacks.
Aug 1 2024, 6:04 PM · Product-Analytics (Kanban), Temporary accounts
mpopov updated subscribers of T371560: REQUEST: A useful namespace_canonical_name column in wmf_raw.mediawiki_project_namespace_map.

@larissagaulia: Can you please check with your team if T52655: Allow adding canonical names for custom namespaces is indeed still a blocker for this or if it has been resolved, just not linked to the ticket?

Aug 1 2024, 2:02 PM · Analytics-Canonical-Data, Data-Platform

Jul 31 2024

mpopov created T371560: REQUEST: A useful namespace_canonical_name column in wmf_raw.mediawiki_project_namespace_map.
Jul 31 2024, 9:18 PM · Analytics-Canonical-Data, Data-Platform
mpopov added a comment to T367057: [SPIKE] Document decision to use a single table per base schema.

@cjming @Ottomata: Another negative to document (and think about): event sanitization. We can configure sanitization/retention policies on a per-instrument basis since they are different streams/tables, but with the monostream/monotable we would lose that flexibility. Without changing how the current sanitization pipeline works, we would have a single entry in the allowlist for the monotable. We would have to reconsider how we evaluate risk when it comes to retaining sanitized data longer than 90 days.

Jul 31 2024, 4:42 PM · Data Products (Data Products Sprint 17), Spike, Documentation, Metrics Platform

Jul 30 2024

mpopov added a comment to T366627: [MPIC] Analyse risk of potential performance issues with static approach to stream configuration.

@Ottomata @xcollazo: I can't review all the discussion on this ticket but Andrew pointed me here:

Jul 30 2024, 4:13 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Products, Metrics Platform
mpopov updated subscribers of T371404: Measuring Edits Rollbacks.

How do we measure this in an automated/pipeline-able way without requiring manual data input?

Jul 30 2024, 3:22 PM · Product-Analytics (Kanban), Temporary accounts
mpopov added a comment to T346466: Investigate how temporary accounts are logged and categorized.

@Niharika: Is this still needed for any decision making with regards to Temp Accounts implementation or can I decline this?

Jul 30 2024, 3:19 PM · Temporary accounts, Product-Analytics, Anti-Harassment
mpopov updated the task description for T371402: [WE4.4] Additional monitoring capabilities for rollout of Temporary Account.
Jul 30 2024, 3:14 PM · Product-Analytics (Kanban), Temporary accounts
mpopov created T371404: Measuring Edits Rollbacks.
Jul 30 2024, 3:13 PM · Product-Analytics (Kanban), Temporary accounts
mpopov renamed T371402: [WE4.4] Additional monitoring capabilities for rollout of Temporary Account from Additional monitoring capabilities for rollout of Temporary Account to [WE4.4] Additional monitoring capabilities for rollout of Temporary Account.
Jul 30 2024, 2:52 PM · Product-Analytics (Kanban), Temporary accounts
mpopov changed the status of T364406: Migrate IP masking dashboard data pipeline to airflow from Open to Stalled.

Pausing this work in favor of T371402: [WE4.4] Additional monitoring capabilities for rollout of Temporary Account

Jul 30 2024, 2:46 PM · Temporary accounts (Figure out Analytics/Instrumentation for Temp Accounts rollout), Product-Analytics (Kanban)
mpopov edited projects for T371402: [WE4.4] Additional monitoring capabilities for rollout of Temporary Account, added: Product-Analytics (Kanban); removed Product-Analytics.
Jul 30 2024, 2:44 PM · Product-Analytics (Kanban), Temporary accounts
mpopov moved T371402: [WE4.4] Additional monitoring capabilities for rollout of Temporary Account from Triage to Current Quarter on the Product-Analytics board.
Jul 30 2024, 2:44 PM · Product-Analytics (Kanban), Temporary accounts
mpopov triaged T371402: [WE4.4] Additional monitoring capabilities for rollout of Temporary Account as High priority.
Jul 30 2024, 2:44 PM · Product-Analytics (Kanban), Temporary accounts

Jul 29 2024

mpopov added a comment to T367057: [SPIKE] Document decision to use a single table per base schema.

Please add the following to Negative Consequences:

  • The vast majority of interaction data would be going into one massive table, which will significantly limit how much data we would be able to query with Presto: potentially only an hour at a time, as opposed to the multiple days, weeks, or even months that are possible now with the smaller, per-instrument tables. Depending on how powerful our Presto cluster is, we would likely have to switch to working with interaction data exclusively outside of Superset's SQL Lab, since Presto and Spark SQL differ substantially and require a high degree of effort to translate queries between those two SQL dialects.
  • This will also negatively affect our ability to create Superset dashboards with Presto based on the un-aggregated interaction data, which has become a common practice among Product Analysts. We accept this consequence because the metrics we measure and make available in those dashboards and other reports should be pre-computed with data pipelines (that have access to more powerful and robust Spark SQL) rather than calculated on-the-fly with Presto. We can still use Presto but mainly for working with pre-computed measurements of interaction metrics rather than with raw interaction data.
Jul 29 2024, 8:55 PM · Data Products (Data Products Sprint 17), Spike, Documentation, Metrics Platform

Jul 26 2024

mpopov updated the task description for T371141: Analyze impact of Magru data center on unique devices in South America.
Jul 26 2024, 7:21 PM · Product-Analytics
mpopov raised the priority of T371141: Analyze impact of Magru data center on unique devices in South America from Medium to High.
Jul 26 2024, 7:15 PM · Product-Analytics
mpopov triaged T371141: Analyze impact of Magru data center on unique devices in South America as Medium priority.
Jul 26 2024, 7:15 PM · Product-Analytics

Jul 25 2024

mpopov closed T365144: Application Security Review Request : Quarto as Declined.

Rescinding after checking in with @acooper and clarifying that this was created out of a misunderstanding.

Jul 25 2024, 4:15 PM · Product-Analytics, secscrum, Security, Application Security Reviews
mpopov awarded T363979: [EPIC] Create a standardised page lifecycle instrument mixin a Love token.
Jul 25 2024, 3:19 PM · Data Products (Epics Timeline), Epic, Metrics Platform

Jul 24 2024

mpopov added a comment to T369847: Setup basic send and receive wiring between a MW instance and a Statsig cloud instance.

(The file attachment – a diagram I suspect? – is missing.)

Jul 24 2024, 12:58 AM · Data Products (Data products Sprint 18), Patch-For-Review, Metrics Platform

Jul 19 2024

mpopov added a comment to T364872: Unique devices per country spikes on wikifunctions .
-- Count HTTP 301 requests for Special:GlobalUsage on 2024-06-28,
-- grouped by project class and IP, keeping the 1000 largest counts.
select normalized_host.project_class, ip, count(1) as view_count
from pageview_actor
where year = 2024 and month = 6 and day = 28
  and http_status = '301'
  and agent_type = 'user'
  and uri_path = '/w/index.php'
  and regexp_like(uri_query, 'title=Special:GlobalUsage')
  and (is_redirect_to_pageview or is_pageview)
group by 1, 2
order by view_count desc
limit 1000
Jul 19 2024, 2:21 PM · Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem, Data-Platform
mpopov added a comment to T364872: Unique devices per country spikes on wikifunctions .

It also means there's nobody to ask to fix the behavior. I believe this requires engineering help from DPE.

Jul 19 2024, 1:48 PM · Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem, Data-Platform
mpopov added a comment to T364872: Unique devices per country spikes on wikifunctions .

@Mayakp.wiki: Special:GlobalUsage comes from Extension:GlobalUsage (GlobalUsage), which is a volunteer-authored extension.

Jul 19 2024, 1:47 PM · Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem, Data-Platform

Jul 18 2024

mpopov added a comment to T367553: Cloud VPS "shiny-r" project Buster deprecation.

Thanks for checking in! No objection from me.

Jul 18 2024, 6:42 PM · Cloud-VPS (Debian Buster Deprecation)

Jul 17 2024

mpopov updated subscribers of T368326: Update Metrics Platform Client Libraries to accept experiment membership.

@phuedx @VirginiaPoundstone: Okie dokie, I updated the task description with the data model specification and updated the requirements based on a guess of what is involved.

Jul 17 2024, 1:34 AM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform
mpopov updated the task description for T368326: Update Metrics Platform Client Libraries to accept experiment membership.
Jul 17 2024, 1:33 AM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform
mpopov updated the task description for T368326: Update Metrics Platform Client Libraries to accept experiment membership.
Jul 17 2024, 1:27 AM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform

Jul 15 2024

mpopov updated subscribers of T365144: Application Security Review Request : Quarto.

@sbassett: I have folks on my team very eager to use this again for reporting. (I've asked them to pause until it's marked as Approved in ITS' Software Catalog.) Is there an ETA for this review?

Jul 15 2024, 5:32 PM · Product-Analytics, secscrum, Security, Application Security Reviews

Jul 12 2024

mpopov closed T362874: Gather baseline data for time-to-data-collection, a subtask of T354965: [Epic] SDS 2.5 Establish baselines for Metrics Platform & Experimentation success indicators , as Resolved.
Jul 12 2024, 3:36 PM · Epic, Data Products (Epics Timeline)
mpopov closed T362874: Gather baseline data for time-to-data-collection as Resolved.

Morten completed this as part of SDS 2.5.7. Here is his report: https://docs.google.com/document/d/1HzEuDzEkwBtBexTlL5b4JiRjlhOP-dCBfAbckwMj4vc/edit#

Jul 12 2024, 3:36 PM · Product-Analytics (Kanban)
mpopov moved T369908: Estimate Community Updates module experiment sample size from Triage to Current Quarter on the Product-Analytics board.
Jul 12 2024, 1:50 PM · Product-Analytics (Kanban)
mpopov triaged T369908: Estimate Community Updates module experiment sample size as Medium priority.
Jul 12 2024, 1:49 PM · Product-Analytics (Kanban)

Jul 11 2024

mpopov added a project to T366222: Investigation: how many event participants have been affected by IP Blocks: Product-Analytics.

Status update: Product Analytics is going to own a hypothesis under WE 4.2, likely in Q2 (since we are fully booked for Q1), to the effect of:

If we develop a metric for measuring how many event participants are prevented from participating in events due to IP blocks, we will be able to measure baselines across different dimensions (such as geographic regions and languages) to understand the severity and distribution of the problem.

(To be refined by the hypothesis owner (TBD) with @kostajh as KR owner)

Jul 11 2024, 7:37 PM · FY24-25 WE4.2, Product-Analytics, IP-Blocking-Impacts, CampaignEvents
mpopov updated the task description for T369823: Evaluate GrowthBook and Statsig from Product Analytics perspective.
Jul 11 2024, 2:52 PM · Data Products (Data products Sprint 18), Product-Analytics (Kanban)
mpopov triaged T369823: Evaluate GrowthBook and Statsig from Product Analytics perspective as High priority.
Jul 11 2024, 1:58 PM · Data Products (Data products Sprint 18), Product-Analytics (Kanban)

Jul 3 2024

mpopov closed T341745: [PLACE HOLDER] BASELINES FOR QUALITY METRIC as Resolved.
Jul 3 2024, 8:37 PM · Product-Analytics (Kanban), Campaign-Tools, User-Iflorez
mpopov added a comment to T368326: Update Metrics Platform Client Libraries to accept experiment membership.

Collected various discussions & points into this decision brief (viewable by public). Will follow up here when the decision is made.

Jul 3 2024, 3:43 PM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform
mpopov awarded T362212: Calculate baselines for WE2.1/KE1 related to increasing the coverage of quality content in key topic areas a Love token.
Jul 3 2024, 12:08 PM · Product-Analytics (Kanban), LPL Analytics, OKR-Work, LPL Hypothesis

Jul 2 2024

mpopov created T369118: Wikipedia Preview can't be loaded from Unpkg since v1.9.0.
Jul 2 2024, 8:56 PM · Inuka-Team, LPL Essential (LPL Essential 2024 Jul-Sep), Wikipedia-Preview

Jun 28 2024

mpopov added a comment to T368678: Deprecate use of desktop- and mobilewebuiactions in Event Platform.

I believe mediawiki.web_ui_actions succeeds eventlogging_DesktopWebUIActionsTracking and eventlogging_MobileWebUIActionsTracking
and mediawiki.web_ui_scroll_migrated succeeds mediawiki.web_ui_scroll

Jun 28 2024, 8:43 PM · Web Team Essential Work 2024, Data-Engineering, Event-Platform, Web-Team-Backlog
mpopov updated subscribers of T368678: Deprecate use of desktop- and mobilewebuiactions in Event Platform.

@KSarabia-WMF: The default is in the stream config (pasted above) and it's currently set to 20% of sessions on all projects. And what's happening with the migrated instrumentation is that it's being double-sampled – the instrument initializes on 20% of pageviews (1% on English projects) and when it produces events to the mediawiki.web_ui_actions stream, only 20% of sessions will actually submit an event.
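
To make the double-sampling effect concrete, here is a rough back-of-the-envelope sketch. It assumes the two sampling stages are independent and glosses over the fact that one is per-pageview and the other per-session, so the products are approximations only; the function name is hypothetical.

```python
from fractions import Fraction


def effective_rate(instrument_rate: Fraction, stream_rate: Fraction) -> Fraction:
    """Approximate share of traffic that ends up submitting events after both stages."""
    return instrument_rate * stream_rate


stream = Fraction(20, 100)  # stream config: 20% of sessions

# Most projects: instrument initializes on 20% of pageviews.
print(float(effective_rate(Fraction(20, 100), stream)))  # 0.04
# English projects: instrument initializes on 1% of pageviews.
print(float(effective_rate(Fraction(1, 100), stream)))  # 0.002
```

Under those assumptions, roughly 4% of traffic (0.2% on English projects) would produce events, rather than the 20% (or 1%) the instrument's own rate suggests.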

Jun 28 2024, 7:37 PM · Web Team Essential Work 2024, Data-Engineering, Event-Platform, Web-Team-Backlog
mpopov added a comment to T368678: Deprecate use of desktop- and mobilewebuiactions in Event Platform.

Ensure that the sampling for this instrument is per-session, not per-pageview.

Jun 28 2024, 2:19 PM · Web Team Essential Work 2024, Data-Engineering, Event-Platform, Web-Team-Backlog
mpopov updated the task description for T368678: Deprecate use of desktop- and mobilewebuiactions in Event Platform.
Jun 28 2024, 2:19 PM · Web Team Essential Work 2024, Data-Engineering, Event-Platform, Web-Team-Backlog
mpopov awarded T368678: Deprecate use of desktop- and mobilewebuiactions in Event Platform a Love token.
Jun 28 2024, 2:01 PM · Web Team Essential Work 2024, Data-Engineering, Event-Platform, Web-Team-Backlog
mpopov updated the task description for T368674: Instrumentation for Community Wishlist.
Jun 28 2024, 1:39 PM · Community-Tech (Gray Fox (Aug 26 - Sept 6)), Community Wishlist, Product-Analytics (Kanban)
mpopov updated the task description for T368674: Instrumentation for Community Wishlist.
Jun 28 2024, 1:04 PM · Community-Tech (Gray Fox (Aug 26 - Sept 6)), Community Wishlist, Product-Analytics (Kanban)

Jun 27 2024

mpopov updated subscribers of T368674: Instrumentation for Community Wishlist.
Jun 27 2024, 9:05 PM · Community-Tech (Gray Fox (Aug 26 - Sept 6)), Community Wishlist, Product-Analytics (Kanban)
mpopov triaged T368674: Instrumentation for Community Wishlist as High priority.
Jun 27 2024, 8:49 PM · Community-Tech (Gray Fox (Aug 26 - Sept 6)), Community Wishlist, Product-Analytics (Kanban)
mpopov added a comment to T368326: Update Metrics Platform Client Libraries to accept experiment membership.

@phuedx: I reviewed the modeling approaches in PA team sharing (notes w/ link to transcript & recording) and Morten brought up a great point: the computational cost of using virtual table generation (LATERAL VIEW EXPLODE() in Spark SQL and CROSS JOIN UNNEST() in Presto) for version 1, which reminded him of evaluating the custom data approach of monoschema MP.
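
As a rough illustration of what that virtual table generation does, the sketch below flattens an array of experiment memberships into one row per (event, membership) pair in plain Python. It is an analogy for the SQL operations named above, not a measurement of their cost; the field names `event_id`, `experiments`, `name`, and `group` are hypothetical.

```python
from typing import Iterator


def explode_experiments(events: list[dict]) -> Iterator[dict]:
    """Yield one output row per (event, experiment membership) pair."""
    for event in events:
        # Events with an empty array contribute no rows at all.
        for membership in event.get("experiments", []):
            yield {"event_id": event["event_id"], **membership}


events = [
    {"event_id": 1, "experiments": [{"name": "a", "group": "control"},
                                    {"name": "b", "group": "treatment"}]},
    {"event_id": 2, "experiments": []},
    {"event_id": 3, "experiments": [{"name": "a", "group": "treatment"}]},
]
rows = list(explode_experiments(events))
print(len(rows))  # 3
```

The output row count is the sum of the array lengths, so every query that needs per-experiment rows pays for this expansion before it can filter or aggregate.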

Jun 27 2024, 4:50 PM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform

Jun 25 2024

mpopov updated subscribers of T368326: Update Metrics Platform Client Libraries to accept experiment membership.

@phuedx: I see in the Figma designs that instruments/experiments will have user-provided unique names with machine-readable versions auto-generated. Will those instruments/experiments have numeric IDs in the database that also uniquely identify them?

Jun 25 2024, 10:04 PM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform
mpopov added a comment to T368326: Update Metrics Platform Client Libraries to accept experiment membership.

Andrew brought up an excellent point (in the same thread), which is that complex array element and map value type evolution is not well supported (https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Complex_array_element_and_map_value_type_evolution_is_not_well_supported), so if we did go the array-of-objects (version 1) route, we MAY want array<map<string,string>> not array<struct<id:int,name:string,group:string>> in case there is anything we would need to add later. Like, if we needed to include a new field for each experiment it'd be easy with the former and impossible with the latter.

Jun 25 2024, 9:38 PM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform
mpopov added a comment to T368326: Update Metrics Platform Client Libraries to accept experiment membership.

That's a great point!

Jun 25 2024, 5:30 PM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform

Jun 24 2024

mpopov added a comment to T366807: [EPIC] Update Metrics Platform Client Libraries to accept instrument name.

@phuedx @VirginiaPoundstone: T368326: Update Metrics Platform Client Libraries to accept experiment membership and just to confirm: yes I have bandwidth to consult on this

Jun 24 2024, 10:01 PM · Data Products (Epics Timeline), Epic, Metrics Platform
mpopov created T368326: Update Metrics Platform Client Libraries to accept experiment membership.
Jun 24 2024, 9:59 PM · Patch-For-Review, Data Products (Data products Sprint 18), Metrics Platform
mpopov closed T355182: Past edits increase in wmf.edit_hourly with every new snapshot as Resolved.

Sounds good, yeah! Thanks for closing the loop on this.

Jun 24 2024, 4:45 PM · Data Products, Data-Engineering

Jun 21 2024

mpopov added a comment to T366807: [EPIC] Update Metrics Platform Client Libraries to accept instrument name.

Oh cool! Yeah I'd love to squeeze that into scope. Should I make a separate task?

Jun 21 2024, 7:14 PM · Data Products (Epics Timeline), Epic, Metrics Platform
mpopov added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

@Milimetric Once this is finalized can you please document this change at https://wikitech.wikimedia.org/wiki/Data_Platform_Engineering/Decision_log?

Jun 21 2024, 4:59 PM · Data Products (Data Products Sprint 17), Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Jun 13 2024

mpopov added a comment to T366937: MPIC: Location and Sample Rate fields should be correlated.

As an Experiment Owner, when I select a location (wiki) I need to see the default sample rate for that location (wiki), so I know whether or not I need to modify it according to my experimentation plan.
As an Experiment Owner, I need to override the default sample rate at a per-wiki level, so I can collect data at different rates based on the various wikis that are part of the experiment.
As an Experiment Owner, I need to set a default sample rate, so that rate will be applied across all locations (wikis) except the ones that I need to override with a different sampling rate.

Should there be standard sampling rates per wiki? Or a low, conservative default sampling rate (0.1%) for all?

I think that providing defaults for these values is part of the product. I also think that it would be better for our users (and for the platform!) if we provided a conservative default value that the user can then increase (with relatively little friction because MPIC is gonna be _awesome_!).
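
The default-plus-override behavior described in the user stories above can be sketched as a simple lookup. This is only an illustration, not MPIC's data model; the function name, the wiki IDs, and the override values are hypothetical, and the 0.1% default is the conservative figure floated in the question.

```python
DEFAULT_SAMPLE_RATE = 0.001  # 0.1%, the conservative default suggested above


def sample_rate_for(wiki: str, overrides: dict[str, float],
                    default: float = DEFAULT_SAMPLE_RATE) -> float:
    """Return the per-wiki override if one is set, otherwise the default rate."""
    return overrides.get(wiki, default)


# Hypothetical overrides chosen by an experiment owner.
overrides = {"enwiki": 0.0001, "eswiki": 0.01}

print(sample_rate_for("eswiki", overrides))  # 0.01
print(sample_rate_for("frwiki", overrides))  # 0.001
```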

Jun 13 2024, 2:26 PM · Data Products (Data Products Sprint 15), Wikimedia-Design, Design, Metrics Platform

Jun 11 2024

mpopov added a comment to T366627: [MPIC] Analyse risk of potential performance issues with static approach to stream configuration.

@MNeisler just caught me up on this. I just want to share some thoughts about

Jun 11 2024, 7:31 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Products, Metrics Platform
mpopov triaged T364406: Migrate IP masking dashboard data pipeline to airflow as Medium priority.
Jun 11 2024, 1:00 PM · Temporary accounts (Figure out Analytics/Instrumentation for Temp Accounts rollout), Product-Analytics (Kanban)

Jun 5 2024

mpopov updated the task description for T365813: Develop a unified Content Translation (CX) monitoring dashboard.
Jun 5 2024, 3:42 PM · Epic, LPL Analytics, Language-analytics, Product-Analytics
mpopov reassigned T364547: Year-end report on FY2023-24 KR WE2.1 from jwang to Iflorez.
Jun 5 2024, 2:09 PM · Product-Analytics (Kanban), FY2023-24-WE 2.1 Typography and palette customizations, Web-Team-Backlog

May 31 2024

mpopov added a comment to T365860: Understand the usage of template filters.

Kind of? Categories are…a giant unstructured, manually-maintained mess and inconsistent across languages. (In general, not just specific to templates.) Categories can also have layers upon layers of sub-categories nested within each other, and there's not an easy way to get them all.

May 31 2024, 9:58 PM · Product-Analytics
mpopov closed T365188: Portuguese and Spanish Wikipedia access for Mikhail Popov and Connie Chen as Resolved.

All good! Thanks so much!

May 31 2024, 9:28 PM · Search-Console-access-request
mpopov closed T365860: Understand the usage of template filters as Declined.

First, I want to acknowledge that I appreciate wanting to incorporate data into the design stage of the product development cycle. Unfortunately we do not have capacity to take on this request (or any request, really) for data analysis for the remainder of Q4.

May 31 2024, 5:27 PM · Product-Analytics

May 30 2024

mpopov raised the priority of T365860: Understand the usage of template filters from Medium to Needs Triage.

@JWheeler-WMF Is this for FY24–25 PES 1.2.3?

May 30 2024, 7:42 PM · Product-Analytics

May 29 2024

mpopov closed T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate. as Resolved.

Thank you for this analysis, KC!

May 29 2024, 1:25 PM · Product-Analytics (Kanban), Language-analytics

May 24 2024

mpopov added a project to T365586: Measure user impact of using a central login / signup page: Product-Analytics.
May 24 2024, 4:05 PM · Product-Analytics, MediaWiki-Platform-Team, SUL3, MediaWiki-extensions-CentralAuth

May 21 2024

mpopov added a comment to T363685: MinT for Readers: Implement instrumentation for key events .

@ngkountas @KCVelaga_WMF: Perhaps this will be useful: https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_An_Instrument#In_JavaScript

May 21 2024, 3:19 PM · MW-1.43-notes (1.43.0-wmf.20; 2024-08-27), LPL Essential (LPL Essential 2024 Jul-Sep), MinT

May 17 2024

mpopov updated the task description for T365203: [Data Quality] Implement wiki completeness check for MediaWiki History.
May 17 2024, 1:30 PM · Data-Engineering (Q1 2024 July 1st - September 30th)

May 16 2024

mpopov renamed T365203: [Data Quality] Implement wiki completeness check for MediaWiki History from [Data Quality] Implement completeness check for MediaWiki History to [Data Quality] Implement wiki completeness check for MediaWiki History.
May 16 2024, 9:12 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
mpopov created T365203: [Data Quality] Implement wiki completeness check for MediaWiki History.
May 16 2024, 9:10 PM · Data-Engineering (Q1 2024 July 1st - September 30th)
mpopov added a comment to T365197: ISPDatabaseReader null pointer exception.

@CDanis: Can you please paste the Spark code / Spark SQL query you used for reproducibility?

May 16 2024, 8:58 PM · Data-Platform-SRE (2024.05.27 - 2024.06.16), Patch-For-Review, Data-Engineering
mpopov created T365188: Portuguese and Spanish Wikipedia access for Mikhail Popov and Connie Chen.
May 16 2024, 6:08 PM · Search-Console-access-request
mpopov changed the status of T304086: wmfdata-r v2 should mainly be a wrapper for wmfdata-py from Open to Stalled.
May 16 2024, 5:55 PM · Product-Analytics
mpopov created T365144: Application Security Review Request : Quarto.
May 16 2024, 1:14 PM · Product-Analytics, secscrum, Security, Application Security Reviews

May 15 2024

mpopov added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

@Krinkle: Thank you for sharing the results of your queries in a manner consistent with the Data Publication Guidelines.

May 15 2024, 2:32 PM · Data Products (Data Products Sprint 17), Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

May 14 2024

mpopov awarded T363616: Explore citations included with revisions by editor experience and revert rate a Love token.
May 14 2024, 7:31 PM · Wikimedia-Hackathon-2024
mpopov updated the task description for T364398: Add MW table 'cu_log' to data lake.
May 14 2024, 4:12 PM · Data-Engineering, Data-Platform

May 13 2024

mpopov updated the task description for T364547: Year-end report on FY2023-24 KR WE2.1.
May 13 2024, 8:24 PM · Product-Analytics (Kanban), FY2023-24-WE 2.1 Typography and palette customizations, Web-Team-Backlog

May 9 2024

mpopov triaged T363238: Create measurement plan and instrumentation spec for IP reputation instrumentation as Medium priority.
May 9 2024, 7:08 PM · Product-Analytics (Kanban)

May 7 2024

mpopov triaged T364398: Add MW table 'cu_log' to data lake as Medium priority.
May 7 2024, 3:34 PM · Data-Engineering, Data-Platform