Add proper email bounce handling to MediaWiki (with VERP)
Closed, ResolvedPublic

Description

Author: lwelling

Description:
It's likely that many Wikipedia accounts have a validated email address that once worked but is out of date. We do not currently unsubscribe users who trigger multiple non-transient failures and some addresses might be 10 years old.

We should not keep sending email that is just going to bounce. It's a waste of resources and might trigger spam heuristics.

I'd propose adding two API calls.

One to generate a VERP address to use when sending mail from MediaWiki.

One that records a non-transient failure. That API call would record the current incident and if there had been some threshold level met, eg at least 3 bounces with the oldest at least 7 days ago, then it would un-confirm the user's address so mail will stop going to it.

For at least the second call, authentication will be needed so fake bounces are not a DoS vector or a mechanism for hiding password reset requests.

The reason for the threshold is that some failure scenarios will resolve themselves, eg mailbox over quota, so we don't want to react to one bounce. We want a history of consecutive mails bouncing.
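
The threshold rule described above could be sketched roughly as follows. This is a Python illustration only, not MediaWiki code: the function name `should_unconfirm` and its parameters are hypothetical, and the defaults (3 bounces, oldest at least 7 days old) are just the example values from the description.

```python
from datetime import datetime, timedelta


def should_unconfirm(bounce_times, now, min_bounces=3, min_age_days=7):
    """Decide whether an address has bounced often enough to be un-confirmed.

    bounce_times: datetimes of recorded non-transient failures for one address.
    Rule (example values from the proposal): at least `min_bounces` bounces,
    with the oldest one at least `min_age_days` old, so that a short-lived
    problem such as a full mailbox does not trigger it.
    """
    if len(bounce_times) < min_bounces:
        return False
    oldest = min(bounce_times)
    return now - oldest >= timedelta(days=min_age_days)
```

Note that this sketch only counts recorded bounces. To require *consecutive* bounces, the history would also have to be cleared whenever a delivery is known to succeed, which the sketch does not attempt.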

There would be a MediaWiki development component to this task, to build the API and to add VERP request calls wherever email is sent, and an Ops component to route VERP bounces to a script (taking the mail on stdin, and optionally e.g. the e-mail address as an argument), which can then call the (authenticated) MediaWiki API method to remove the mail address.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=32770
https://bugzilla.wikimedia.org/show_bug.cgi?id=12767
https://bugzilla.wikimedia.org/show_bug.cgi?id=62838

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 1:24 AM)
bzimport set Reference to bz46640.

tchay wrote:

Luke, I think you just volunteered to create the API. :-) When it's ready, I can open a ticket on RT.

Luke, thanks for filing this (see also Brion's bug 12767 comment 2).
The issue makes the successful delivery rate of MediaWiki emails lower due to spam countermeasures and may block bug 56414, so I think it should be considered high priority now that we're sending more notifications compared to a few years ago.

(In reply to comment #2)

Luke, thanks for filing this (see also Brion's bug 12767 comment 2).
The issue makes the successful delivery rate of MediaWiki emails lower due to
spam countermeasures and may block bug 56414, so I think it should be
considered high priority now that we're sending more notifications compared
to
a few years ago.

Brion, do you think this bug could become a GSoC project? Who could mentor it?

Could be feasible, might want to check in with ops first -- I don't know what it would take to plug into the mail system and properly process bounces. (There's also the danger of fake bounces being used to disable someone's account, so that'll be fun to figure out. :D)

So, this bug report was filed right after a conversation about Echo (RT #4785) in which I was the one to propose VERP, so we've actually come full circle now :)

Yes, we need to do VERP and we'll make that happen. We're not ready yet for making that change (= making API calls from the mailservers) as the mail infrastructure is about to be rebuilt, but I think catching up again in, say, 1-2 months time would be the right time for this.

From the mailserver side, we can run a script (preferably one that doesn't need a full MediaWiki install on the system ;)) when emails to bounce-XXX addresses arrive.

The way to avoid fake bounces DoSing a user would be to use a bounce-<hash>@wikimedia.org return path address with <hash> either being a random, stored token or one that is the output of a symmetrical encryption function, encrypt(email, secret). I'm sure Chris Steipp will have multiple good ideas about that :)

(In reply to Faidon Liambotis from comment #5)

So, this bug report was filed right after a conversation about Echo (RT
#4785) in which I was the one to propose VERP, so we've actually come full
circle now :)

Hah! Good thing it was still on someone's radar then. Thanks for the reply.

Yes, we need to do VERP and we'll make that happen. We're not ready yet for
making that change (= making API calls from the mailservers) as the mail
infrastructure is about to be rebuilt, but I think catching up again in,
say, 1-2 months time would be the right time for this.

1-2 months from now would be great for GSoC (it's the deadline for student applications) if someone thinks this suits GSoC work, unless you want this bug to be fixed *before* the summer by ops/platform.

From the mailserver side, we can run a script (preferably one that doesn't
need a full MediaWiki install on the system ;)) when emails to bounce-XXX
addresses arrive.

The way to avoid fake bounces DoSing a user would be to use a
bounce-<hash>@wikimedia.org return path address with <hash> either being a
random, stored token or one that is the output of a symmetrical encryption
function, encrypt(email, secret). I'm sure Chris Steipp will have multiple
good ideas about that :)

Sounds like an endorsement/proposal for Chris to be a (co-)mentor? ;-) Chris, are you interested in soliciting student work in this area? If yes, who could be interested in mentoring?

tchay wrote:

Considering it's been almost a year since we've worked on it, I'm okay with this waiting until a summer GSoC. VERP is one of those reasonable but weird co-ordination issues that falls between the cracks in our organizational structure here.

Should we mention this to Quim and co?

If this project requires coding and has mentors, and you want to see it moving forward with a GSoC student, then the time to push for a GSoC proposal is now. Please create a separate section for this proposal at

https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Featured_project_ideas

Thank you!

I have gone through the comments and understood that the mailserver needs to implement VERP (http://en.wikipedia.org/wiki/Variable_envelope_return_path).

I would like to work on this feature as a GSoC project, if Nemo, Chris, Faidon or anyone is ready to mentor me.

I have fixed a few bugs in MW core (https://github.com/wikimedia/mediawiki-core/commits?author=tonythomas01) and am willing to dedicate my time to getting this done.

(In reply to Faidon Liambotis from comment #5)

The way to avoid fake bounces DoSing a user would be to use a
bounce-<hash>@wikimedia.org return path address with <hash> either being a
random, stored token or one that is the output of a symmetrical encryption
function, encrypt(email, secret). I'm sure Chris Steipp will have multiple
good ideas about that :)

You would need a random IV, nonce/timestamp (prevent replay), and some sort of checksum (prevent tampering), but yeah, it's doable.

(In reply to Nemo from comment #6)

Sounds like an endorsement/proposal for Chris to be a (co-)mentor? ;-)
Chris, are you interested in soliciting student work in this area? If yes,
who could be interested in mentoring?

Sadly, I probably don't have time to co-mentor this year. I'm fine advising on the security design, but I've got too many things going on right now.

(In reply to Terry Chay from comment #7)

Considering it's been almost a year since we've worked on it, I'm okay with
this waiting until a summer GSoC.

You're "okay with this waiting"? Literally nobody was awaiting your approval. As I understand it, your only involvement with this issue is having hired the guy who filed this bug report (and who rather quickly thereafter left the Wikimedia Foundation...).

Good grief.

Please let's focus on what matters here and now:

(In reply to Tony Thomas from comment #9)

I have gone through the comments and understood that the mailserver needs to
implement VERP (http://en.wikipedia.org/wiki/Variable_envelope_return_path).

I would like to work on this feature as a GSoC project, if Nemo, Chris,
Faidon or anyone is ready to mentor me.

I have fixed a few bugs
(https://github.com/wikimedia/mediawiki-core/commits?author=tonythomas01),
for MW core and am willing to dedicate my time to get this done.

Thank you Tony for your interest in fixing this problem. Tony needs two mentors, one familiar with development and one familiar with ops. Terry, Faidon, if you want this to become a GSoC project proposal, could you please find the right names in your teams? When it comes to outreach programs, it is now or in 6 months (at the soonest). Thank you.

Thanks Quim. Legoktm has told me he is ready to mentor me on the development part, and yesterday Faidon agreed to help me with the sysops part. That would mean a green light to go forward, right?

(In reply to Tony Thomas from comment #13)

Thanks Quim, Legoktm have told me he is ready to mentor me in the
development part and last day Faidon agreed to help me with the Sys-Ops
part. That would mean green to forward right ?

Just update the "possible projects" page (if you want; it's not mandatory) and submit your application when it's time. As of now WMF is not even officially part of GSoC yet, as far as I remember, but you'll have your official answers in due time.

Also see https://rt.wikimedia.org/Ticket/Display.html?id=6933 . Jeff Green volunteered to work with Tony on the ops part of this.

MZMcBride: talked to Tony, reset RT pass for him, made sure he can read his own ticket and login, added Bugzilla ticket link, added Jeff Green, added Quim Gil, commented on BZ, hope that helped:)

(In reply to Daniel Zahn from comment #16)
Thanks Daniel, MZMcBride. I will talk with Jeff about moving the proposal forward. Since Wikimedia is currently migrating the mail server to the new data center, this would be the right time to get VERP implemented.

A discussion about the VERP scheme started in an email thread; it seems to make sense to move that discussion here. So here is the gist of the discussion so far.

Given a VERP address generally looks something like this:

bounce-{$key}@wikimedia.org

The prefix /^bounce-/ is used by the incoming MTA as a hook to route messages to the bounce processor, and $key is used by the bounce processor to figure out which wiki user is having delivery issues.

We need to prevent an attacker from spoofing bounce messages and causing mass unsubscribes. We can accomplish this by making $key secret, and not a simple hash that can be reversed or guessed.

"something like an HMAC, with a secret key"
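
As a rough illustration of the HMAC idea, here is a Python sketch that builds and checks a `bounce-{$key}` address. Everything in it is an assumption made for illustration: the key layout (`<user id>-<timestamp>-<tag>`), the `SECRET` value, the truncation of the tag and the 30-day expiry are hypothetical, and this is not the format BounceHandler ended up using. The timestamp limits replay of old addresses and the tag prevents tampering. Unlike the encryption variant suggested earlier in the thread, this layout leaves the user id readable, so only forgery is prevented.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical shared secret


def _tag(payload):
    # HMAC-SHA256 over the payload, truncated to 10 bytes and base32-encoded
    # (16 characters, no padding, no hyphens).
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).digest()
    return base64.b32encode(mac[:10]).decode().lower()


def make_verp_address(user_id, domain="wikimedia.org", now=None):
    """Build bounce-<user id>-<timestamp>-<tag>@<domain>."""
    ts = int(now if now is not None else time.time())
    payload = f"{user_id}-{ts}"
    return f"bounce-{payload}-{_tag(payload)}@{domain}"


def parse_verp_address(address, max_age=30 * 86400, now=None):
    """Return the user id if the address is authentic and fresh, else None."""
    local = address.partition("@")[0]
    parts = local.split("-")
    if len(parts) != 4 or parts[0] != "bounce":
        return None
    _, user_id, ts, tag = parts
    if not all(p.isascii() and p.isdigit() for p in (user_id, ts)):
        return None
    expected = _tag(f"{user_id}-{ts}")
    # Constant-time comparison, so the tag cannot be guessed byte by byte.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    current = now if now is not None else time.time()
    if current - int(ts) > max_age:
        return None
    return int(user_id)
```

A bounce processor would call `parse_verp_address()` on the envelope recipient and ignore the message whenever it returns `None`.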

Tony, your proposal is still missing in Google Melange. Please submit it there as a draft linking to your wiki page. In any case, we will evaluate your proposal in mediawiki.org. Thank you!

(In reply to Quim Gil from comment #19)

Tony, your proposal is still missing in Google Melange. Please submit it
there as a draft linking to your wiki page.

Done. Public link: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/tonythomas01/5629499534213120

Labs project mediawiki-verp was created.

Change 138655 had a related patch set uploaded by 01tonythomas:
Implementing VERP functionality to alter Return Path

https://gerrit.wikimedia.org/r/138655

Change 138655 had a related patch set uploaded by 01tonythomas:
Added VERP functionality hook to core

https://gerrit.wikimedia.org/r/138655

Change 138655 merged by jenkins-bot:
Added VERP functionality hook to core

https://gerrit.wikimedia.org/r/138655

No more open patches associated to this bug; resetting status.

It seems that most of the work has moved to the BounceHandler extension: https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/BounceHandler -owner:l10n,n,z
If that's where you now want to get this done, please move this to "MediaWiki extensions" product. Otherwise, split from this report to a new bug what's not going to happen in core.

Updates:
*) http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:EmailUser has been sending VERPed emails since https://gerrit.wikimedia.org/r/#/c/141287/.
*) A BounceHandler router and transport were added to beta/prod to handle incoming bounces in patch https://gerrit.wikimedia.org/r/#/c/155753/. The new router picks out VERP bounces using two checks (a regex for the VERP pattern, and a check on the domain) and POSTs them to the 'bouncehandler' API in beta.
*) Currently, the configuration is suspended in prod, as we need to test for at least 2 weeks, watching the 'bounce_records' table in beta, where the extension is installed, to confirm that bounces are handled properly. We plan to take it to loginwiki as the next step, which means deploying into prod.
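
The two router checks mentioned above (VERP pattern, then domain) can be illustrated with a small Python sketch. The real checks live in the mail server's router configuration, not in Python; the pattern, the `ALLOWED_DOMAINS` set and the function name here are assumptions made for illustration.

```python
import re

# Hypothetical pattern for the local part: "bounce-" followed by a key.
VERP_LOCAL_PART = re.compile(r"bounce-[A-Za-z0-9-]+")

# Assumed set of domains whose bounces we are willing to process.
ALLOWED_DOMAINS = {"wikimedia.org"}


def is_verp_bounce(recipient):
    """Return True if the envelope recipient passes both router checks."""
    local, sep, domain = recipient.rpartition("@")
    if not sep:
        return False  # not an address at all
    # Check 1: the local part matches the VERP pattern.
    if not VERP_LOCAL_PART.fullmatch(local):
        return False
    # Check 2: the domain is one we handle bounces for.
    return domain.lower() in ALLOWED_DOMAINS
```

Only messages that pass both checks would be handed on to the API. Everything else falls through to normal mail routing.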

  1. Beta : deployment-mx routing mails from beta, and bounces from beta successfully being handled.
  2. Production : bouncehandler installed in group0 wikis and bounces from group0 routed to API of test2wiki and successfully being recorded in bounce_records.

To Do:

  • We need to hit further large wikis!
  • UnConfirmEmail is not yet enabled

So the only decision to make is whether to first expand and then enable email unconfirmation; or first enable that and then expand to more wikis. At any rate I'd at least start unconfirming emails on group0.

gerritbot subscribed.

Change 189316 had a related patch set uploaded (by 01tonythomas):
Un-subscribe frequently failing recipients

https://gerrit.wikimedia.org/r/189316

Patch-For-Review

Change 189316 merged by jenkins-bot:
Un-subscribe frequently failing recipients

https://gerrit.wikimedia.org/r/189316

Un-subscribe feature just got enabled in group0 and we could successfully test the same, thanks to our TestVerp account in mediawiki.org. The current failure threshold is kept to 5 bounces in 7 days, to match mailman configs.
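
For illustration, a "5 bounces in 7 days" rule can be read as a sliding-window count, sketched below in Python. The function name and parameters are hypothetical, and whether the real check uses "at least 5" or "more than 5" is an assumption here.

```python
from datetime import datetime, timedelta


def exceeds_bounce_threshold(bounce_times, now, max_bounces=5, window_days=7):
    """Return True if `max_bounces` or more bounces fall in the last window.

    bounce_times: datetimes of recorded bounces for one address.
    """
    window_start = now - timedelta(days=window_days)
    recent = [t for t in bounce_times if window_start <= t <= now]
    return len(recent) >= max_bounces
```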

Next step:
Which wiki group next ?

Next step:
Which wiki group next ?

I propose to do all multilingual/backstage wikis (Meta-Wiki, mediawiki.org, Commons, Wikidata etc. etc.).

Next step:
Which wiki group next ?

I propose to do all multilingual/backstage wikis (Meta-Wiki, mediawiki.org, Commons, Wikidata etc. etc.).

Which is basically "group1" as defined in our deploy process: https://wikitech.wikimedia.org/wiki/Deployments/One_week#Three_wiki_groups

I'd like some confirmation that all is going well/no hiccups before we proceed, but otherwise I think that means we should be able to do this the week of March 2nd.

Change 191937 had a related patch set uploaded (by 01tonythomas):
Added BounceHandler extension to group1 wikis

https://gerrit.wikimedia.org/r/191937

Change 191937 merged by jenkins-bot:
Added BounceHandler extension to group1 wikis

https://gerrit.wikimedia.org/r/191937

Some updates after installing BounceHandler to group1 :
11:07 PM <Jeff_Green> there are 348 rows in bounce_records
11:08 PM <Jeff_Green> 6 with br_timestamp > 20150313000000
11:12 PM <Jeff_Green> there have been unsubscribes
11:12 PM <Jeff_Green> yeah 22 so far

Looks like things are going well. How about planning a deployment to the Wikipedias?

Looks like things are going well. How about planning a deployment to the Wikipedias?

Make a task describing that next step (group2 wikis, I presume) and we'll put it on the #-roadmap project to schedule the rollout (I didn't want to associate that project here).

Some more interesting updates, after installing it to group1, as of March 20:
6:33 PM <hoo> mysql:wikiadmin@db1029 [wikishared]> SELECT COUNT(*) FROM bounce_records;
6:33 PM <hoo> 36823

https://gerrit.wikimedia.org/r/#/c/198220/ will install the extension everywhere.

Installed on all Wikipedias as of now, after https://gerrit.wikimedia.org/r/198220. We have roughly 37k entries in 'bounce_records' already!
Can be closed, I think!

Excellent work! Congrats and thank you for all the hard work, Tony. :-)

In 1.23 branch, the $wgPasswordSender setting seems to be ignored. In 1.24, it is honored, and this change is the one that fixes it.

Should this change be back-ported into the LTS branch?

In 1.23 branch, the $wgPasswordSender setting seems to be ignored. In 1.24, it is honored, and this change is the one that fixes it.

I couldn't find any recent changes in https://github.com/wikimedia/mediawiki-extensions-BounceHandler/commits/master that touch $wgPasswordSender :( Can you specify the patch?

The patch is this one, https://gerrit.wikimedia.org/r/#/c/138655/

Sorry, I didn't realize there was more than one patch in this series.

I don't really understand why it fixes the issue, I think it is due to:

$extraParams .= ' -f ' . $returnPath;

And that the UserMailerChangeReturnPath hook itself is not relevant.

I'm using the built-in php mail and it talks to a generic postfix service on localhost, with CentOS 6.5.

The patch is this one, https://gerrit.wikimedia.org/r/#/c/138655/
Sorry, I didn't realize there was more than one patch in this series.

Actually, before the patch, the 'Return-Path' header was set to the value of $wgPasswordSender for emails from the wiki, and this patch becomes relevant only if you have installed the BounceHandler extension (mediawiki.org/wiki/Extension:BounceHandler) to generate a custom VERPed 'Return-Path' address on sent emails.

I don't really understand why it fixes the issue, I think it is due to:

$extraParams .= ' -f ' . $returnPath;

That would set our new VERP address as the 'Return-Path' address of outgoing emails sent through the PHP mailer [1].

And that the UserMailerChangeReturnPath hook itself is not relevant.

It's used by the BounceHandler extension.

I'm using the built-in php mail and it talks to a generic postfix service on localhost, with CentOS 6.5.

If you want to have a custom return-path in your sent emails, maybe you should pass that address as the 5th param (using -f) to PHP's mail() call (as above).

[1] http://stackoverflow.com/questions/179014/how-to-change-envelope-from-address-using-php-mail
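
To make the `-f` point concrete outside PHP: `-f` sets the envelope sender, which is the address that receives bounces. The Python sketch below only builds the argument list for a sendmail-compatible binary and does not send anything. The helper name, the binary path and the address pattern are assumptions. The address is validated first because an unchecked value placed after `-f` could carry extra options.

```python
import re

# Deliberately conservative pattern: no whitespace or shell metacharacters.
_SAFE_ADDRESS = re.compile(r"[A-Za-z0-9._+=-]+@[A-Za-z0-9.-]+")


def build_sendmail_argv(return_path, sendmail="/usr/sbin/sendmail"):
    """Return the argv for a sendmail-compatible binary with a custom envelope sender.

    -t  take recipients from the message headers
    -i  do not treat a line with a single dot as end of input
    -f  set the envelope sender (what ends up as Return-Path)
    """
    if not _SAFE_ADDRESS.fullmatch(return_path):
        raise ValueError(f"unsafe return path: {return_path!r}")
    return [sendmail, "-t", "-i", "-f", return_path]
```

The resulting list could be passed to `subprocess.run(argv, input=message_bytes)`, which avoids a shell entirely.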

To paste some stats:

More stats, and it looks like our bounce rates are slowing down

Here is a graph of the unsubscribe rates/day https://docs.google.com/spreadsheets/d/1rL73QE0NIZfrD83sNEStV-U3vhFeVVLIYl8m5Tbo-JM/pubchart?oid=2021924574&format=interactive

A few spikes from https://phabricator.wikimedia.org/P573 were removed so that the graph shows the latest data.

A few statistics for unsubscribes from 1 Jan 2016 to 23 March 2016 are at https://phabricator.wikimedia.org/P2805