Wikipedia talk:Flagged revisions/Trial/Proposed trials/Archive

A Proposal for Two Wikis and a Print Edition

I just posted a proposal at Wikipedia:Village pump (policy)#Two Wikis: An Alternate Proposal to Flagged Revisions. This proposal suggests two concurrent Wikis, one as is, one with select quality protected entries and a print edition. Tim Foyle (talk) 10:26, 27 January 2009 (UTC)[reply]

Flagged Revisions?

What exactly are 'flagged revisions?' I realize that I'm virtually inviting flames for displaying my ignorance, but having read the entire page, as well as the related Flagged Revisions/Trial page, I still am not sure. Gathering from what people have been discussing (and reading between the lines a bit) it seems like 'flagged revisions' would be edits that are not immediately displayed, but rather need manual (i.e. human) approval before appearing on the page. (I assume this is the "sight" that is being referred to...strange verbalization though.) Anyway, it might not hurt to explain a bit of the "what" before you go so deeply and complexly into the "how." RlndGunslinger (talk) 12:49, 6 January 2009 (UTC)[reply]

Your conclusion is absolutely right. Edits will be "delayed" and displayed to non-registered users only after being approved (sighted) by a reviewer. You can read the main page of FR: Wikipedia:Flagged revisions. It contains links to software manuals and to other implementation proposals. Ruslik (talk) 13:22, 6 January 2009 (UTC)[reply]

Thank you for the link; I didn't see one on the trial page, so I didn't think there was one. (For the record I did try searching first, but I got a redirect to the mainpage for the Wiki project...) Thanks again! RlndGunslinger (talk) 13:48, 6 January 2009 (UTC)[reply]

Just as a clarification, non-registered users will by default see the approved (sighted) version, but are only one click away from viewing the current draft copy and if they decide to edit the page, it will be the most current draft copy that will be presented for editing, not the sighted copy. Dbiel ^(Talk) 04:37, 20 February 2009 (UTC)[reply]

New Trial?

Can we add a trial of the top 500 most vandalized pages? Especially ones that are not already semi-protected? I cite pages for high-schools, etc... -- Mjquin_id (talk) 04:57, 8 January 2009 (UTC)[reply]

How do you plan to rank pages according to "most vandalized"? Just a question. Kushal (talk) 14:07, 2 March 2009 (UTC)[reply]

Reviewers

Where would I propose that "reviewers" should be registered members of projects? -- Mjquin_id (talk) 05:06, 8 January 2009 (UTC)[reply]

I agree that more people should automatically be given the right to be reviewers. I boldly started a discussion on reviewers here, you might want to complement what I wrote, or write a separate paragraph. Nicolas1981 (talk) 05:30, 9 January 2009 (UTC)[reply]

Spelling

Is it my imagination or does the page use the word "sight" in many places where "cite" is correct? —Preceding unsigned comment added by Prescod (talk • contribs) 20:59, 9 January 2009 (UTC)[reply]

No, that's just a poor choice of terminology. A "sighted revision" is a revision that has been seen. A better word would be "flagged revision", I think, just like we call the system. But somehow we've adopted the weird terminology of "sighting" and "sighted revisions". Ozob (talk) 04:58, 10 January 2009 (UTC)[reply]

Proposal for first two trials

This seems like the place to be discussing this, but if we are going to do it somewhere else that's fine with me and this comment can be moved wherever the discussion ends up.

Jimbo has asked that flagged revs be turned on for a trial run and that seems to be the way we will go, so now the next step is to figure out what kind of trial or trials we will run. I'm going to recommend two from the list on this project page and hope we can discuss that.

Most of the discussion around flagged revs has revolved around two possible applications: as a replacement for semi-protection, and as a way to deal with our BLP problem. I propose that the first two trials address those possible applications.

Let's start small and begin with something akin to Trial 10, flagged protection. According to Category:Semi-protected there are nearly 700 pages that are semi-protected, though presumably there are others which were not added to the category. Let's enable flagged revs on half of those (I don't think it's necessary to ask the protecting admin if we make a community decision) and leave the other half semi-protected. We'll set up some criteria for evaluating the experiment vs. the control group, and after a couple of months we'll end the trial and decide if FR is better than semi-protection. Because we're dealing with a small set of articles this is a nice way to get our feet wet with the whole flagged revs process.

Barring a major disaster with that trial, we'll also assume we'll run another one with BLPs per the description in Trial 3. We could wait until the semi-protection/flagged protection trial completes, or if that is going fairly smoothly we could start the BLP trial one month into the first one to save some time. We would enable a small subset (several thousand) of BLPs with FP. It would probably be good to add our most problematic BLPs (ones with OTRS tickets etc.) to this first subset. We run the trial for two months and then evaluate. We would probably want to compare it with a control subset of BLPs, also established in advance, and see if flagged revs resulted in less BLP harm; we'd obviously need to develop criteria for doing that.

After these trials conclude we would discuss further trials, the possibility of permanently enabling flagged revs on certain articles, etc. These trials would very much be a first step.

I hope we can discuss this and other proposals here or on some other centralized community page.--Bigtimepeace | talk | contribs 12:14, 23 January 2009 (UTC)[reply]

I'm okay with trial 10 because I feel it will actually help IP editors. Oppose trial 3 as the selected pages were arbitrarily selected. Sceptre ^(talk) 13:16, 23 January 2009 (UTC)[reply]

I should point out: any trial must have a specific end date, so the community can evaluate the success of the trial. I do not want implementation to become a fait accompli like restricting anons from article creation. And if it passes or not, the community must show a strong consensus (at least two thirds) to implement any changes. These are all standard procedures anyway, but given how damaging this can be in the worst case scenario (participation to Wikipedia declines sharply because they're all chilled off), we need to make that abundantly clear. Sceptre ^(talk) 19:11, 23 January 2009 (UTC)[reply]

Can I suggest that our metric is some comparison between a metric obtainable from the edit history (IP contributions, vandalism etc) for the selected articles, and the same metric during the trial? The reason is to prevent selection bias: how can we compare articles with very different levels of traffic? It's doable, but probably more appropriate if we make comparisons on the same articles being trialled. Fritzpoll (talk) 13:36, 23 January 2009 (UTC)[reply]

The proposed trials as written are not fully satisfying in view of what we intend to use Flaggedrevs for, I'll comment later on this. The real discussion just begins. As for semi-protected pages, count also Category:Wikipedia semi-protected pages (1227 pages, most articles), Category:Wikipedia pages semi-protected against vandalism (499), Category:Wikipedia indefinitely semi-protected pages (516), etc. The categories are divided because the job queue is so damned long that category updates from templates have been delayed since the end of December. A Category:All semi-protected pages would be useful, but it wouldn't be populated for a long time. Cenarium (Talk) 14:19, 23 January 2009 (UTC)[reply]

Thanks Cenarium for links to the other semi-protect categories (I knew there had to be more pages than that). Also I like Fritzpoll's suggestion to run a comparison of, say, 2 months of editing on articles X, Y, and Z before flagged revs and 2 months of editing on the same articles after flagged revs were enabled. There might be other variables we want to compare which are not in the edit history but I'm not sure what those would be offhand. If we were going to do some kind of BLP test I think we'd want to figure out how much more harmful BLP stuff flagged revs kept out, and that might be difficult to quantify. Anyhow as Cenarium says this is just the beginning of a discussion on how we go about doing these trials. We might need to do something to ensure that this becomes a wider community discussion, though I'm not sure what the best way to go about doing that is. --Bigtimepeace | talk | contribs 18:00, 23 January 2009 (UTC)[reply]

Special:ProtectedPages is a dynamic list of pages with various levels of protection, which also catches those not tagged with pp templates. I agree that prior edit history would be a useful comparison, although it introduces new systemic biases in how edit patterns might have changed over the entire encyclopedia over time. For that reason, we would probably be best in taking four samples of edits: from a control group and a live group, from before the trial and from during the trial. The differences between the old-flagged and old-unflagged correlate to any bias between new-flagged and new-unflagged, and the differences between old-unflagged and new-unflagged correlate to any time-dependency. Downside, of course, is that there is vastly more data to analyse. Happy‑melon 18:11, 23 January 2009 (UTC)[reply]
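
The four-sample comparison described above amounts to a difference-in-differences estimate. A minimal sketch in Python follows, assuming a hypothetical per-article metric (say, vandal edits per month); the function name `diff_in_diff` and all the numbers are made up for illustration and are not trial data:

```python
from statistics import mean

def diff_in_diff(flagged_before, flagged_during, control_before, control_during):
    """Estimate the effect of flagging, net of site-wide drift over time.

    Each argument is a list of per-article values of some metric
    (hypothetical here: vandal edits per month).
    """
    # Change over time in the group that was flagged during the trial.
    flagged_change = mean(flagged_during) - mean(flagged_before)
    # Change over time in the control group: captures time-dependency only.
    control_change = mean(control_during) - mean(control_before)
    # Subtracting removes the drift shared by both groups.
    return flagged_change - control_change

# Made-up illustrative numbers, not real measurements.
effect = diff_in_diff(
    flagged_before=[10, 12, 8],
    flagged_during=[4, 5, 3],
    control_before=[9, 11, 10],
    control_during=[8, 10, 9],
)
print(effect)
```

A negative result would suggest the metric fell further on the flagged articles than on the control articles over the same period. Whether that difference is meaningful would still need a proper statistical test, which this sketch does not attempt.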

Before we commit to collecting certain kinds of data, it would be good to get the input of some of the people at Wikipedia:WikiProject Statistics. I'm sure there's someone there who would be willing to volunteer their professional expertise. Ozob (talk) 22:13, 23 January 2009 (UTC)[reply]

Yeah, we'll need some folks with the ability to run fancy smancy programs and all that, but I think we have a fairly large number of those and their input will be key as to what is feasible in terms of gathering and processing data. Personally that kind of thing is totally outside of my realm. --Bigtimepeace | talk | contribs 01:07, 24 January 2009 (UTC)[reply]

A slightly different approach for a trial might be to 'protect and feature' quality entries that are well researched and excellently referenced. It would be interesting to see what happens when entries are selected on a basis of quality, rather than because they are prone to vandalism. If the aim is to discourage vandalism of unprotected pages by queuing changes for approval, then maybe vandalism could be discouraged by rewarding good entries instead of penalising new contributors. Protected pages that are prone to vandalism could be featured if serious contributors have worked very well under awkward circumstances to present accurate work. Tim Foyle (talk) 11:06, 27 January 2009 (UTC)[reply]

So how are we measuring whether FR is "more welcoming"?

I personally have not got a clue how anybody proposes to test whether FR is seen by IPs as making Wikipedia more or less welcoming, especially if as looks likely it will be applied to more than just semi protected articles (if not, the recent Kennedy fake death 5 minute vandalism/libel that has seemingly triggered the approval for FR trials would not have been "100% preventable"). The only thing I can think of is some kind of feedback survey. I'm certainly not going to be supporting FR on any frankly POV assertions that it "just is/will be". So, suggestions? MickMacNee (talk) 15:22, 23 January 2009 (UTC)[reply]

FlaggedRevs incorporates a "comments" functionality, allowing readers to leave quick comments on articles. We could easily commandeer this system to comment on the process as a whole. Happy‑melon 18:12, 23 January 2009 (UTC)[reply]

I don't understand what you are suggesting with "commandeer this system to comment on the process as a whole". Can you elaborate? --Bigtimepeace | talk | contribs 18:56, 23 January 2009 (UTC)[reply]

Well one "obvious" measure would be the number of IP edits (probably minus vandalism) on the tested articles. If they go significantly down, then obviously IPs consider it as unwelcoming; if they stay in the same ballpark, all the worries might have been much ado about nothing.--Kmhkmh (talk) 15:47, 27 January 2009 (UTC)[reply]

Pre-Trial

I believe that we should actually start with a Pre-Trial. That is, set up a few test pages so that we can develop the necessary templates and customize the software. After that is done, then move on to the first live test. Until the software option is turned on, none of the necessary customization can even be started. Dbiel ^(Talk) 16:19, 23 January 2009 (UTC)[reply]

This is an excellent idea. Several messages in the mediawiki namespace will need to be customised for clarity and consistency. —AlanBarrett (talk) 18:55, 23 January 2009 (UTC)[reply]

It would be good to have a sandbox set up for this purpose. Something like Wikipedia:Flagged revisions sandbox? I don't know whether this is technically desirable, since it wouldn't be in the article namespace; but on the other hand we don't really want to set up test pages in the article namespace, either. Ozob (talk) 22:16, 23 January 2009 (UTC)[reply]

The pre-trial could involve one (or a small number of, say five) actual articles, fairly unimportant ones, though not extremely unimportant. ☺Coppertwig (talk) 00:49, 24 January 2009 (UTC)[reply]

Flagging but not hiding: at least in first trial

I believe we can turn on flagged revisions but still allow everyone to see the most recent version as the default, as they do now. I suggest doing it this way at least for the first trial, to minimize possible negative effects of a trial.

What use is flagged revisions if people still see the most recent version, one might ask? First of all, it may make it easier to check for vandalism, similarly to the way the yellow highlighting in new page patrol helps guide the patrollers to avoid duplicated effort. (I'd really like my watchlist to have little symbols showing which of the changes have been sighted.) Secondly, a link would be available to allow people to look at the "sighted" version if they choose to, and people could set their personal default to display sighted versions.

If nothing else, doing it this way may make the first trial more palatable to the community, while still allowing information to be collected about how it works in practice. ☺Coppertwig (talk) 00:06, 24 January 2009 (UTC)[reply]

Oppose this as just adding confusion. The positives to this idea are pretty minor and give little incentive for anyone to do any flagging. All the "final implementations" that have been seriously discussed here use flagging to control the default "public" view, so I don't see the point of a trial that doesn't do that. (de:wp did try this "transparent" configuration for a month or two but decided to revert to the more obvious implementation in their final vote; I can't read German, so I don't know what the arguments were). PaddyLeahy (talk) 01:13, 25 January 2009 (UTC)[reply]

Not all the "final implementations" are currently being discussed, only Flagged protection. But it is an interesting point; I'll try to find the discussion on de.wiki for the arguments. Mion (talk) 01:24, 25 January 2009 (UTC)[reply]

The incentive to flag (when it doesn't affect default display) is as a means of communicating with other editors to save duplicated effort. Essentially, flagging is a message meaning "don't worry: I've checked this edit; it's OK." Other RC-patrollers and watchlisters can then (if they wish) focus their attention on other edits that have not been looked at. It's just like the yellow bars on new page patrol. ☺Coppertwig (talk) 18:03, 25 January 2009 (UTC)[reply]

I see this is a good idea. The "live" version should be default. Unregistered users should be given a brief message where they can choose which version they want to see or that they can create an account which will make this choice permanent (as opposed to length of cookie). Then again I always favored the quality articles proposal over sighted articles one. I think FlaggedRevs is best for marking community consensus on good articles and making sure they don't deteriorate. gren グレン 07:11, 26 January 2009 (UTC)[reply]

Excuse me, Gren, but I don't understand this "... I always favored the quality articles proposal over sighted articles one." What does that mean? Thanks, - Hordaland (talk) 14:33, 26 January 2009 (UTC)[reply]

I was referring to this proposal, quality versions, as opposed to the sighted versions proposal being discussed here. The main difference is I see only minimal use from sighting vandalism free articles but I see a lot of use making sure our Featured, class A, and good articles don't degrade. The dirty hands of hundreds of editors can change the wonderful prose of a featured article into a 4th grader's paper. The idea in my proposal would be every X days (depending on the project, the attention, how much it changes) you have editors come to a consensus on whether the changes within the last X days have been an improvement to the article. Anything that improves the article is kept and the flag is updated... anything that makes it worse is removed. Sighted would be the worst level of that and Featured Article would be the best level... and obviously there would need to be a stronger consensus to update the flagged revision that made a featured article better than there would be for other things. gren グレン 14:51, 26 January 2009 (UTC)[reply]

Why is this proceeding?

The poll closed with 59% support. That is not a consensus, so why are you still proceeding with the implementation of this trial? --G W… 22:16, 24 January 2009 (UTC)[reply]

Why does it not represent a consensus? By that I am implying nothing more than the obvious: given that you present absolutely no support for that bare assertion, it means precisely nothing. While the reasons why 59% does not represent a consensus might be clear in your mind, that's no use to anyone else. Happy‑melon 22:25, 24 January 2009 (UTC)[reply]

Because our great leader deemed it so. A compromise poll is up at WT:Flagged Protection, which is at 96% at this time. NuclearWarfare _(Talk) 22:33, 24 January 2009 (UTC)[reply]

Consensus is not a majority vote. Most of the votes -- on both sides -- didn't present their arguments as if they actually read what they were voting on, if they said anything at all. The ones that did tended to be in the support side. ♫ Melodia Chaconne ♫ (talk) 23:15, 24 January 2009 (UTC)[reply]

wow, i hope this doesn't ever get implemented. i know i'll never donate to the wiki again. the fact this change was even discussed seriously depresses me. --Wongba (talk) 01:58, 25 January 2009 (UTC)

It was a straw poll, which means that no result is binding; it is an expression of thought. If the "vote" had gone the other way the fact that it was a straw poll would have been pointed out many times. Sorry but I made the time to read the proposals (like many) and expressed my thoughts on them knowing that the result was not binding. It is also interesting to note that the straw poll has become a poll (and that includes you Jimbo Wales) just by dropping a word. But what is a word amongst friends. Edmund Patrick – confer 19:53, 26 January 2009 (UTC)[reply]

I agree, I'll never donate to a regular encyclopedia (e.g. the proposed new Wikipedia), after having experienced a free-content one (e.g. the existing Wikipedia). I read that Jim Wales has invited alternate suggestions to replace his intention to implement flagged revisions. My suggestion is to apply for a job at Britannica. The concept of a freely edited encyclopedia was fine until it got big, and now it needs to change to be more like "real" encyclopedias? I never encountered a college professor who accepted Wikipedia as a citation, so I don't see what harm there is in the *continued* (i.e. not new) potential for misinformation. If people want to be sure of things, they can follow references. If they don't, they're no worse off than they would be reading any other website. If revisions have to be approved, it might as well be any other website.

--Badmuthahubbard (talk) 20:05, 26 January 2009 (UTC)[reply]

59% might not be consensus, but I think such a majority is enough to go forward with a trial to get a better understanding of how it plays out. The final decision after the trial however would require a real consensus (i.e. >= 66% of the vote).--Kmhkmh (talk) 15:51, 27 January 2009 (UTC)[reply]

A consensus is not based on headcount, but on arguments and their relative strengths. We have to stop trying to link arbitrary percentages to gaining consensus. If you think we need a supermajority vote, say so! :) But remember that consensus isn't voting! Fritzpoll (talk) 15:56, 27 January 2009 (UTC)[reply]

How can this operate without stopping Wikipedia from working

If it takes three weeks to approve an edit, how will anyone know what they are editing? Suppose someone writes an article on day one, and others edit it on day 2, 3, 4 and so on: by the time day 21 is reached, someone will be editing an article that no longer exists. Is the person doing the approval going to consider all outstanding edits at once? How will they deal with conflicts, especially if they are less expert than the edit writers? This only seems to be a problem in BLP, and only a few of those. Why not identify some categories, say politicians, bankers, persons accused of crimes (how many people will be in all three!) that need monitoring and leave it at that? Then there may be some hope of keeping up in a timely manner (say 1 hour). No-one will want to check my obscure edits of railroad pages.--Wickifrank (talk) 14:06, 27 January 2009 (UTC)[reply]

Because when someone else goes to edit, they will be editing the latest version no matter what. They will also see the diff. The only edit conflicts will be exactly the same as the system is now. ♫ Melodia Chaconne ♫ (talk) 14:10, 27 January 2009 (UTC)[reply]

Understood, but won't it waste a lot of time? You see the page, go to edit it and find someone has already done it. The time might not cost Wikipedia anything but that does not mean it is not valuable to the writer.--Wickifrank (talk) 17:15, 27 January 2009 (UTC)[reply]

There's also no guarantee that it will take as long as three weeks to flag an edit. The Germans have a handful of articles in the queue that long, but most get flagged very quickly. What's also worth bearing in mind is that there are no plans at present to enable FlaggedRevs over all articles, just limited subsets, so the backlogs should be more manageable. For example, de-wp has 800,000 flagged articles and a smaller userbase; if we enabled FlaggedRevs for BLPs (of which there are 330,000) with our larger userbase, the backlogs should be less of a problem. We simply won't know until we've tried. Fritzpoll (talk) 14:22, 27 January 2009 (UTC)[reply]

Comments on specific proposed trials

Trial 1: Featured articles/portals

I think this is a right-minded approach. Featured articles are both highly visible and outstanding: perfect targets for vandalism without much room for improvement. But I don't support the requested privileges clause. I strongly feel that all user accounts x days old should have sighting privileges; in fact, I don't support any trial that doesn't provide a very large pool of sighters. Estemi (talk) 17:31, 26 January 2009 (UTC)[reply]

I think this would be the best way to start a test of it. If we introduce it as a blanket across all articles, the articles that are little noticed, when edited by an anon/newbie, will not get checked for quite a while. I also don't understand how users will be alerted to revisions needing to be checked. Of course for watched articles this won't be a problem, but for unwatched articles? Harland1 (^t/_c) 18:06, 26 January 2009 (UTC)[reply]

There is a special page, visible to "reviewers", which lists all pages with an unflagged revision pending. So any small-scale trial would be trivial to monitor: in fact the trick is not to make it too easy: we need to have a small number of trial reviewers if there are a small number of trial pages. PaddyLeahy (talk) 00:29, 27 January 2009 (UTC)[reply]

Indeed, we would need to have a number of trial reviewers proportional to the number of pages. Presumably, if it were introduced across all articles, one would become a reviewer automatically after a certain landmark, as any manual checking process would take too long for all the thousands of users? But vandals can often amass significant numbers of edits. Harland1 (^t/_c) 17:57, 27 January 2009 (UTC)[reply]

FAs, particularly of BLPs, may still track topics that change frequently enough that this seems like a bad idea to do universally. Perhaps it could be limited to those FAs and GAs already under a semi-protect status where that status still doesn't prevent all vandalism. This would result in a small enough pool that it would minimize the overall damage done to the project by implementing this proposal, although nothing short of just simply not-doing-it will completely mitigate said damage. MrZaius^talk 05:17, 29 January 2009 (UTC)[reply]

Trial 2: Semi-protected articles

Trial 3: Subsection of Biographies of Living Persons

Trial 4: Per-article opt-in

Trial 8: Half of Wikipedia

There is clearly a consensus against enabling FlaggedRevs on all of wikipedia at present, hence there is no point in this trial. PaddyLeahy (talk) 08:58, 27 January 2009 (UTC)[reply]

Agreed Fritzpoll (talk) 09:00, 27 January 2009 (UTC)[reply]

Trial 9: Logistical test

This is a fairly sensible way of going about starting any of the other trials, but it is ridiculous to suggest stopping and resetting everything at the end. The average user would find it Not Funny to have all those site notices and then at the end be told "oh, no actually we were only testing how to turn it on". BTW no supporters of FlaggedRevs that I've heard of are suggesting any kind of formal process like RfA to get reviewership: too slow even if we could get people to be much less fussy. For the trials, where admins hand it out manually, it should work like getting rollback (i.e. just ask). This would already give far fewer reviewers than most of the proposed automatic criteria (e.g. 1 month/300 edits). If we want to limit the numbers further, we could hand it out only to usernames beginning with certain letters (e.g. A-E). PaddyLeahy (talk) 09:37, 27 January 2009 (UTC)[reply]

Trial 10: Flagged protection

This requires a different configuration of FlaggedRevs than the one favoured by the recent straw poll. It involves very few articles (0.1% of Wikipedia) and the most reviewers (all autoconfirmed users) so it is at one extreme of the protection vs openness range. I propose we put this on one side until we've done some of the other trials. PaddyLeahy (talk) 09:09, 27 January 2009 (UTC)[reply]

And WP:Flagged protection may be sufficiently popular to be instituted without trial. But I would argue the other way; we should do any necessary trials of FP first, and then the others. If this proves overwhelmingly popular, it will help the others; if this fails, all the others are hopeless.

Er, as I read the straw poll, FP was favored by it; a large number of users said it was the only or most acceptable variant. Septentrionalis PMAnderson 16:53, 27 January 2009 (UTC)[reply]

No, it would make more sense to do this one first. It makes sense to do trials involving a small number of articles before doing things that might affect (possibly adversely) large parts of the wiki. I think the "flagged protection" idea is looked on favourably by a larger part of the community than some other schemas.

It says, "The control group is another list of 300 semiprotected articles; again, notify the protecting admins..." To make the test statistically valid, the way to do it is to ask permission from the admins for 600 articles, then afterwards randomly choose which of the articles (among those for which we have permission) will be in the flagged trial and which will be in the control group. This could be done in a pseudorandom manner: for example, after making a list of articles for which permission was obtained, even-numbered items in the list could be flagged and odd-numbered ones could be control; I think this would probably be good enough and possibly easier to implement conveniently and transparently on a wiki, although real randomization would be preferable. ☺Coppertwig (talk) 17:35, 22 February 2009 (UTC)[reply]
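
One way the random split described above could be made reproducible is to shuffle the permitted list with a seed published in advance. A minimal sketch in Python, assuming hypothetical placeholder titles and a hypothetical helper `assign_groups`; a seeded generator is still pseudorandom, but it avoids any pattern tied to list order, which even/odd alternation does not:

```python
import random

def assign_groups(articles, seed):
    """Split a list of article titles into flagged and control halves at random.

    `seed` would be announced beforehand so anyone can reproduce the split.
    """
    rng = random.Random(seed)   # seeded generator, independent of global state
    shuffled = list(articles)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical placeholder titles standing in for the 600 permitted articles.
permitted = [f"Article {i}" for i in range(600)]
flagged, control = assign_groups(permitted, seed=2009)
print(len(flagged), len(control))
```

Running the function again with the same list and seed yields the same two groups, so the assignment can be checked by anyone after the fact.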

Trial 13: Three month trial of all BLPs flagged protection

My suggestion. Comments? --Apoc2400 (talk) 19:00, 26 January 2009 (UTC)[reply]

As a pro-flagging BLP "hawk," I think this is an acceptable proposal for the trial. The three month criteria/100 edits default for sighting is sufficient, and I think the ability for admins to add or remove the privilege is crucial. I've thought a lot about auto-timeout proposals, and I think I can live with a one week limit. It gives BLP hawks like me a strong incentive to never let the backlog exceed one week. Some people have objected to the privilege being tied with rollback, but I don't think that's a big deal—as long as admins can easily revoke it. Cool Hand Luke 19:45, 26 January 2009 (UTC)[reply]

Support, but only as a trial. Sceptre ^(talk) 22:40, 26 January 2009 (UTC)[reply]

Absolutely oppose. This is not a trial; there is no control group, and the sample is too large ever to be evaluated. Septentrionalis PMAnderson 23:42, 26 January 2009 (UTC)[reply]

How about if it is on BLPs beginning with A-M and admins are only allowed to flag-protect articles from A-M? Articles N-Z is the control group. --Apoc2400 (talk) 23:54, 26 January 2009 (UTC)[reply]

Still much too large. Septentrionalis PMAnderson 00:33, 27 January 2009 (UTC)[reply]

For once I agree with PMAnderson. Start small (trial 3). Bear in mind that "turning FR on" for some set of articles does not mean that the current versions are all flagged good—that would defeat the object. It means we can start checking each one for minimal compliance with WP:BLP. Reviewers can start with articles they already know well, but there will be plenty of BLPs where the main authors are unwilling or unable to review and those will need at minimum a careful read of (one recent revision of) the whole article, preferably done in a good library so you can look stuff up. Once an article contains a recent flagged revision things get much easier, because you can just look at diffs. PaddyLeahy (talk) 00:52, 27 January 2009 (UTC)[reply]

A very small trial before this might make sense. I am not sure what it is supposed to show, but at least we get to try out the feature on real articles. --Apoc2400 (talk) 01:01, 27 January 2009 (UTC)[reply]

In a trial with a small subset of articles we would have much less sighting work, but with just as many sighters, leading to artificially short sighting queues. To me, edits not getting sighted reasonably quickly is the main concern with flagged revisions. --Apoc2400 (talk) 01:01, 27 January 2009 (UTC)[reply]

The point of a trial is to consider the effects of FR and see whether they are better than what we do now. To find that out, somebody has to evaluate the changes on the trial sample and the control sample, which means somebody has to read the diffs quite carefully. That may get done for hundreds of articles; it won't get done for thousands. Septentrionalis PMAnderson 02:59, 27 January 2009 (UTC)[reply]

Considering that the leading objection to FR is the supposed backlogs, I think the trial should be in the same ballpark of what we're thinking about possibly implementing. Half of BLPs is a good trial. Cool Hand Luke 03:47, 27 January 2009 (UTC)[reply]

This sounds remarkably like "quality doesn't matter, as long as we don't cause a traffic jam". I should prefer a demonstration that Flagged Revisions actually has an effect on vandalism justifying the trouble it would put us to. Septentrionalis PMAnderson 04:15, 27 January 2009 (UTC)[reply]

Even with a control group it is still not a blind experiment. With only a few hundred articles they can be tampered with, intentionally or not. A small group of pro-flagging people could easily make it run perfectly. --Apoc2400 (talk) 11:19, 27 January 2009 (UTC)[reply]

The obvious control group is the articles that are flagged - just review their edits before and after they are flagged. That's the advantage of maintaining edit histories! And if we limit the reviewer pool, we could trial any sample size. Fritzpoll (talk) 11:26, 27 January 2009 (UTC)[reply]

Unless, as some suggest, we flag them only after a thorough verification. That may be the only advantage of flagging; the way to check that is to verify twice the number of articles tested, and then flag half. Otherwise Fritzpoll's suggestion works. It is better to have an imperfect control group than none. Septentrionalis PMAnderson 16:57, 27 January 2009 (UTC)[reply]

He's proposed half of all BLPs as a control group. I think this dramatically reduces the likelihood of tampering, as he suggests. We would study a random subset of those afterwards. It would demonstrate both our capabilities and the possible gains. Cool Hand Luke 22:42, 27 January 2009 (UTC)[reply]

Note: I suggest holding another poll after deciding which specific trial is best. The last poll can hardly be claimed to show consensus for this suggested trial. --Apoc2400 (talk) 01:01, 27 January 2009 (UTC)[reply]

Support as a trial - with all these trials, I think we also need to decide how to evaluate their success and when. Do we switch FlaggedRevs off, then analyse and discuss? Do we analyse in-flight and then have a poll after turning it off? Do we have a poll before the end of the trial to see if there is support for keeping it on? (very specifically for "keeping it on" - the default has to be the status quo) Fritzpoll (talk) 09:03, 27 January 2009 (UTC)[reply]

~~Support~~ if we limit the trial to a subset of blps, as trials must be limited in size as voted in the poll, and we won't have the resources to handle a mass of articles initially. The expiration time should also be !voted on. But it would be nice to be able to have several choices: 1 day, 3 days, 1 week for example, so that we can use 3 days for blps, 1 days for others, 1 week when needed. Clearly, we don't want this.
Technical implementation: it uses flag protection, so edits by autoconfirmed users are autoreviewed, we should be able to disable autoreview for non-reviewers on a particular page, for a more classic Flaggedrevs, so we need an implementation handling both, this is what I proposed (I was actually going to suggest a trial similar to this one) at Wikipedia_talk:Flagged_protection#Possible compromise proposal
Terminology: why not use reviewer instead of sighter? Also, the edit this page link won't be replaced. Cenarium (Talk) 12:17, 27 January 2009 (UTC)[reply]
- That would be WP:USEENGLISH. Endorse, although FA may object. Septentrionalis PMAnderson 16:57, 27 January 2009 (UTC)[reply]
- Change to oppose as I don't support enabling FLRs on blps preemptively. Cenarium (Talk) 19:27, 31 January 2009 (UTC)[reply]
Oppose anything including the concept that we should trust some users more than others on content. Thehalfone (talk) 16:12, 27 January 2009 (UTC)[reply]
I would be fine with this trial personally but to get wider support it probably would be necessary to have the trial on a smaller number of articles. I also like the draft of a trial produced here - User:Mr.Z-man/yet another FlaggedRevs proposal. One other possibility would be to have a trial on a smaller number of articles for the first month followed by an expansion for the second month. The first month would let people get used to flagged revisions, see what to do and work out any bugs. The second month would let us see how well things work over a wider number of articles. Davewild (talk) 18:33, 27 January 2009 (UTC)[reply]

Might I suggest an initial sample comprising the top 1000 articles by quantity of vandalism over the past 3 months, excluding those articles of "Top" importance as evaluated by appropriate WikiProjects? The reason for this latter exclusion is Scott MacDonald's argument that vandalism on articles such as George W. Bush will be quickly reverted and disbelieved. I also suggest excluding semi-protected articles from the BLP portion of this trial, as they will be covered by the Flagged Protection aspect. If we can come up with a quantifiable means of assessing the state of the backlog, the trial can be extended to a more significant percentage of the articles as the trial proceeds. Thoughts? Fritzpoll (talk) 09:25, 27 January 2009 (UTC)[reply]

Just wondering: why should BLPs have a more stringent level of flagging than semi-protection? For a government, it doesn't make any sense to have more "Top Secret" documents than those with just a "Confidential" marking. Admiral Norton ^(talk) 13:22, 28 January 2009 (UTC)[reply]

Always an interesting question, worth an answer. A good place to start is some of the essays in Category:User essays on BLP, but it is probably best summarised by one wikilink: WP:BLP - read the Rationale section and you'll see. We have an ethical obligation to protect the subjects of our articles from harmful misleading statements. Fritzpoll (talk) 14:55, 28 January 2009 (UTC)[reply]

Total and unquestionable oppose. Let's not jump the gun; with all due respect, this is probably the best flagging supporters are ever going to get and I don't feel like setting any compromises of this liking. Admiral Norton ^(talk) 13:18, 28 January 2009 (UTC)[reply]
Would support flagged revisions for a thousand highly vandalized BLPs. Nicolas1981 (talk) 14:45, 28 January 2009 (UTC)[reply]
Can I have a few more eyes at User:Fritzpoll/BLPFlaggedRevs, which is this proposal with a number of modifications to configuration and trial based on comments here. Feel free to edit/comment on the talk page Fritzpoll (talk)
Support But as a test only, there must be another full discussion with poll after the trial to assess its successfulness. Harland1 (^t/_c) 19:40, 31 January 2009 (UTC)[reply]

Split

I'd like to work on this trial in as much detail as Flagged Protection has been worked on, possibly incorporating some aspects of MrZMan's proposal. Wikipedia:Flagged permissions seems an appropriate title (expansive in the realm of permissions to replace protection, more restrictive in the case of BLPs). With agreement from people here, I'd like to start work on this page. Thoughts? Fritzpoll (talk) 08:29, 28 January 2009 (UTC)[reply]

Looks good. I don't know about the name though. "Flagged permissions" doesn't really tell me anything. --Apoc2400 (talk) 11:03, 28 January 2009 (UTC)[reply]

I'm trying to get a short name that encapsulates the idea of Flagged protection and BLP without calling it WP:Flagged protection and BLP! To me, Flagged permissions was about changing the permissions of two sets of articles. Clearly only in my mind though. Any suggestions? Fritzpoll (talk) 11:35, 28 January 2009 (UTC)[reply]

Begun work on a proposal based on these ideas in my userspace here - when we have a suitable project-space name, I'll move it there Fritzpoll (talk) 13:03, 28 January 2009 (UTC)[reply]

Trial 14: Flagged Protection and a sample of BLPs

Based on the above Trial, but with certain other restrictions. Fritzpoll (talk) 08:12, 2 February 2009 (UTC)[reply]

Trial 15: Cite it or lose it on BLP subrange

Trial 16: BLP Protection, User Vandalism Protection

Trial 17: Variant of flag protection

Trial 18: Shadow flagging

The trials as a group

Can we have more information on each of these proposed trials please? Things like how many articles will be involved, who will be able to flag them, who will be able to sight them, how long the trial will last, how on earth it will be possible to know whether the trial has been successful or not? As far as I know, the only trial that is clearly defined in this way is [WP: Flagged Protection]. Without such information, support for a particular trial title is meaningless. Riversider2008 (talk) 15:53, 26 January 2009 (UTC)[reply]

Much more information on each trial has been posted subsequent to this request, which is therefore no longer relevant Riversider (talk) 22:37, 27 January 2009 (UTC)[reply]

Yes, could we move the "trials" that aren't actually specific trials but general comments somewhere else? --Apoc2400 (talk) 17:30, 26 January 2009 (UTC)[reply]

Commentary "Trial" sections moved from project page

The following 5 sections (5,6,7,11,12) were originally added to the project page instead of the talk page, and have been moved here since they seem to be more discussion than proposals for a specific kind of trial. R. S. Shaw (talk) 01:23, 27 January 2009 (UTC)[reply]

Trial 5: New pages

A trial needs to establish:

How Flagged Revisions affects the number and first version compliance of new pages created by New Users who are registered but not autoconfirmed, in comparison to not using Flagged Revisions.
How Flagged Revisions affects the number and first version compliance of new pages created by registered users who are autoconfirmed but do not yet have Reviewer status, in comparison to not using Flagged Revisions.
The first version compliance of 'self-sighted' new pages created by users with Reviewer status, in comparison to not using Flagged Revisions.

Trial 6: Special:UnwatchedPages

A number of pages on the project are unwatched, (See Special:UnwatchedPages), a number of which would not pass a review under Flagged Revisions. A trial needs to establish how edits to unwatched pages are handled under particular configurations of Flagged Revisions.

Trial 7: Low-watched pages

An assessment needs to be made of how many 'little watched' pages Wikipedia has (i.e. fewer than 5 users, fewer than 10 users etc. have Page XYZ on their watchlist). A trial needs to establish if there is any basis in the idea that only unreviewed edits to these little watched pages would need to be placed in a pool of pages with unsighted revisions, stopping that pool being unnecessarily cluttered with highly watched pages that will be more likely to be reviewed by appearing on a reviewer's watchlist anyway. MickMacNee (talk) 19:46, 3 January 2009 (UTC)[reply]

I guess most pages have less than 2 watchers. But I might be wrong, figures would be nice. Nicolas1981 (talk) 09:01, 28 January 2009 (UTC)[reply]

Trial 11: Reviewer status

Any trial should establish by analysis of a portion or all instances of a Reviewer having his reviewer status removed during that trial, whether or not:

the removal of Reviewer status was used in instances where a block would have normally been issued to the Reviewer instead
the removal of Reviewer status was used in instances where a page would have been protected instead

I feel like these should go under either metrics or restrictions? Lot 49a^talk 08:06, 8 January 2009 (UTC)[reply]

Trial 12: BLP effects

Any trial focussing on / including BLP articles should make a comparative analysis on both flagged and unflagged biographies of the following incidents being taken regarding en.wikipedia articles:

Any material analysis of the above should identify whether the actors involved possessed or declared having access to a registered Wikipedia user account able to view unsighted revisions. MickMacNee (talk) 19:08, 8 January 2009 (UTC)[reply]

Conditions

Requested Changes

Rule/condition #2 - Any WikiProject may take a poll to add specific content to the trial. -- Mjquin_id (talk) 23:04, 26 January 2009 (UTC)[reply]

Possible metrics to measure success of trials

The following is a list of possible metrics with which to judge the results of trials. Editors are encouraged to add to this list if they think of a new metric. Ozob (talk) 23:23, 4 January 2009 (UTC)[reply]

(see Wikipedia:Flagged revisions/Trial/Proposed trials#Possible metrics to measure success of trials)

I am debating making this a WikiTable with the following columns: "Number", "Name", "How Measured", "What it should tell us", "notes" - Probably split off the page? Other ideas? -- Mjquin_id (talk) 17:00, 11 January 2009 (UTC)[reply]

I also think a number of these items are "data points" that are needed for higher metrics; should that be identified? -- Mjquin_id (talk) 17:00, 11 January 2009 (UTC)[reply]

It would be good to get these more organized and more detailed. I hope it's clear what I intended each metric to capture? If not I can help you fill in the descriptions of the table. Oh, and go ahead and annotate these with as much information as you can ("data points" etc.) because that will make it clearer to people reading it for the first time. (BTW, I'm not sure that a table is the right thing. It would get very very wide. But if you think it would work, go ahead and try it.) Ozob (talk) 01:43, 13 January 2009 (UTC)[reply]

This is stupid. Of course there will be less vandalism to pages with flagged revisions enabled. But there will also be less improvement to pages with flagged revisions enabled, and the average time taken to deal with vandalism across all pages will increase significantly because patrollers will have to spend most of their time sighting perfectly good revisions -- Gurch (talk) 21:47, 24 January 2009 (UTC)[reply]

[citation needed]. I'm sorry, but it had to be said: the whole point of a trial period is to determine objectively whether that entirely unsupported axiomatic assumption is in any way justified. If you genuinely think it is "stupid" to want to base assertions on more than academic speculation, then you're in the wrong place. I think (hope, at least) we're all quite prepared to be proved wrong on our personal opinion of FlaggedRevs. But few of us will be swayed at this juncture by unsupported statements. Happy‑melon 22:01, 24 January 2009 (UTC)[reply]

The point is not that people may or may not be proved wrong, it's that people will be proved right and then blindly conclude from that that flagged revisions is a good idea. Sooner or later some sort of trial will happen, people will look at the editing history and say "yes, there was less vandalism while it was flagged". It doesn't matter what sort of analysis they use to come to that conclusion or whether they previously believed it to be true or not, it will obviously be true. That doesn't mean it's pointless to measure it to figure out precisely how much less vandalism it results in. But you're measuring completely the wrong thing, and your conclusions will therefore be fallacious. Think about it. If you decided to have a trial where you fully protected every page, and then measured the amount of vandalism before and after, you'd also see a drop in vandalism (a much bigger one, too). But it would be dumb to conclude from that that full protection of every page is a good idea. You simply can't measure the effect of people being put off from contributing in any quantifiable way, so I can hardly blame anyone for not proposing to measure that. But it is possible to measure the amount of extra work for patrollers that would be generated out of thin air. Indeed, it is possible to measure it without a trial, just by looking at recent changes and seeing how many edits would have to be sighted were flagged revisions enabled. What it isn't possible to measure is the effect that this extra work has on other things. A trial does not help with this, because the amount of work will be only trial-sized too, and so not affect the rest of the project. And this is the most dangerous part of the current attitude towards flagged revisions -- that there should be a trial, and if it goes well, it should be enabled for the whole project. Suppose a trial is run on a sample of 1 in every 1000 articles. 
Someone will compare vandalism on those articles to vandalism elsewhere and find lower levels. All kinds of other analysis will probably be run too. None of this analysis, however, will reveal the huge amount of extra work, because with a 1 in 1000 trial, it won't be that huge. Nevertheless, when flagged revisions is then enabled across the whole project, one of two things will happen: a huge backlog of unsighted revisions builds up, and Wikipedia stagnates because most of the pages are never updated, despite regular editing; or people start flagging revisions without even looking at them in an effort to clear the backlog, defeating the whole point of the system and thus putting us back where we were except with a massive increase in the amount of extra resources tied up in something pointless -- Gurch (talk) 22:57, 24 January 2009 (UTC)[reply]

Your argument functions on the premise that this will be enabled over the whole project, which is not the intention of the majority of editors (not even Jimmy!). Perhaps the very valid point you are making is that the metrics need to be trial-dependent? For instance, if we ran with Flagged Protection as a trial, we can measure certain quantities that would be different to running a half-wiki trial? I agree that this section appears (not saying it is trying to) to be imposing metrics on all trials, when, as Gurch's analysis makes clear, this would be inappropriate. Fair comment, Gurch? Fritzpoll (talk) 23:02, 24 January 2009 (UTC)[reply]

Sort of. Perhaps it's better to explain it like this: they would indeed need to be "trial-dependent", if by that you mean that if you were conducting a trial on a scale one thousandth that of the actual proposal, to get accurate results about time taken you'd have to prohibit all but one-thousandth of contributors from participating in the proposal. But that's not realistically going to happen. I'll give a numerical example, obviously these numbers are just made up but if you take the best approximations we have of the real figures I don't doubt that the result will be the same.

Let's assume that if flagged revisions is implemented, 100 people are willing to invest a significant amount of time sighting revisions on a regular basis. That's more than the number of regular, highly active RC patrollers we currently have, so I'm being generous here. Let's assume that each of them can keep up with the changes to 1000 articles. Of course that's nothing if all you're doing is watching for vandalism -- 2-3 people can do the entire project on a quiet day. But this will involve looking at all edits, not just possible vandalism, and will also involve reading through each edit in full to make sure it doesn't contain copyright violations or libel, not just the split-second glance to tell if it's vandalism or not. With that in mind, I think 1000 articles per patroller is an optimistic estimate. Now let's suppose a trial goes ahead of one-thousandth of Wikipedia's content, or about 2500 pages (about the scale I'm hearing proposed most frequently). Even if only 3 of the active patrollers decide to participate in the trial, they'll have no trouble at all sighting everything that needs to be sighted between them. So at the end of the trial, the mean time to sighting or whatever will be worked out and found to be pretty low. Now assume it is deployed on a much wider scale. It doesn't have to be all articles. Even the ~330,000 biographies of living persons would be well beyond the capacity of our hypothetical 100 patrollers. At least another 200 contributors would have to take up patrolling as actively as the existing patrollers to compensate. If this happens, we lose out in whatever areas those contributors were previously spending their time, and if it doesn't, the number of sighted revisions doesn't keep pace with the number of revisions that need sighting and, given enough time, the backlog grows enormous. 
If we did decide to use flagged revisions on all 2.7 million articles, the problem would be an order of magnitude worse -- Gurch (talk) 23:18, 24 January 2009 (UTC)[reply]

I think we'll be ok avoiding the problem of using the methodology over all articles. I'm pretty certain that has always been pragmatically off the table. For some trials, the sample size should be sufficient. For example, there are around 2-3000 semi-protected pages. In your scenario, the trial of FlaggedRevs in the form of Flagged Protection would be acceptable, as the sample size represents the final implementation size. When our "subset size" becomes significantly larger than our trial, then we may indeed have a problem. I think one way might be to look at the scale of the backlog accrued in such trials. Thus, say we chose 1/X of articles in a particular subgroup to flag for a trial. If we get a backlog of Y over some predetermined time, then a crude measure would be Y * X of the estimated backlog on full implementation. This figure would be liberal (a good thing for this kind of comparison since we want to be cautious), but would be dependent on picking a sufficiently high sample of the decided subset, otherwise the task might be trivial for our 100 reviewers. If the estimated backlog proves too large following a trial, that can be a metric for dismissing the implementation. Not sure the maths is appropriate here (too simplistic and I'd need to put my stats hat on) but how does the principle strike you? Fritzpoll (talk) 23:45, 24 January 2009 (UTC)[reply]

Indeed there is a subtle error in your maths there, one that I think is probably present in most people's concept of the flagging process. I'll illustrate it with a really, really simplistic example, which is a gross oversimplification of the whole process, but explains why your Y * X figure is not a good estimate:

Suppose there is just 1 editor who can sight 1000 revisions a day, and also does nothing but sit waiting for revisions to sight. Suppose that there are 2010 revisions made per day in total, and that half of them are to be sighted for a trial. That means 1005 revisions to be sighted each day, so each day the backlog of unsighted revisions increases by 5. After say a month, it's 150 revisions. Not that bad. Now by your reasoning, we would expect a full deployment to produce a backlog of 150 * 2 = 300 revisions after a month. But it doesn't! Double the number of revisions to be sighted to 2010 and now the backlog increases by 1010 revisions each day. After a month, it has grown to 30300. Quite a lot more than the 300 that you would have predicted. :) Now obviously I've picked these numbers for effect, but I foresee a similar thing happening in practice -- Gurch (talk) 00:14, 25 January 2009 (UTC)[reply]
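The arithmetic in the example above can be checked with a short sketch. It assumes, as the example does, a fixed daily sighting capacity, a constant number of incoming revisions, and a 30-day month; the function name `backlog_after` is made up for illustration and is not part of any FlaggedRevs tooling.

```python
def backlog_after(days, incoming_per_day, capacity_per_day):
    """Return the number of unsighted revisions left after `days` days.

    Assumes a constant arrival rate and a constant sighting capacity,
    which is the deliberately simplistic model used in the comment above.
    """
    backlog = 0
    for _ in range(days):
        # Revisions arrive, then as many as possible are sighted.
        backlog = max(0, backlog + incoming_per_day - capacity_per_day)
    return backlog


DAYS = 30          # "say a month"
CAPACITY = 1000    # one editor sighting 1000 revisions a day

trial_backlog = backlog_after(DAYS, 1005, CAPACITY)   # half of 2010 revisions
full_backlog = backlog_after(DAYS, 2010, CAPACITY)    # all 2010 revisions
naive_estimate = trial_backlog * 2                    # the "Y * X" extrapolation

print(trial_backlog, naive_estimate, full_backlog)
```

Under these assumptions the backlog grows with the excess over capacity rather than with the number of revisions, which is why scaling the trial backlog linearly understates the full-deployment figure.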

Yes, my Y value was broadly construed. I think any backlog at a low level would be indicative of a problem in expanding scope. Fritzpoll (talk) 00:34, 25 January 2009 (UTC)[reply]

(od) Couple of points for Gurch:

You give a good argument here that, for a limited initial trial, we should artificially limit the number of reviewers (e.g. by handing it out manually as per the proposed trial config). This is obviously possible, e.g. if we do a first test on 1% of BLPs, we can give out 1% of the estimated final number of reviewerships.
I'm puzzled by your claim that the FlaggedRevs would be more labour intensive than the current RCP. Presumably patrollers at present look at the diff for every edit they check. I've not used Huggle, but I assume that they then either press a button which says "next" if they think the edit is OK, or a button which says "revert and go to the next". Under flagged revisions, a similar piece of software would have a button saying "flag as good and go to next" and a button saying "revert and go to next". From the user's pov, either way it should be one click with appropriate software. Flagged revs has the additional advantage that "next" can avoid any pages already flagged, i.e. there is no danger of different patrollers reviewing the same good edit; it also allows you automatically to check the diff with the last flagged revision, avoiding the problem of reverting only the latest in a sequence of vandal edits (in fact this is automatic for editing with the standard interface). Of course, right now we miss some fraction of vandalism by anons & new users, whereas with FlaggedRevs all these edits have to be looked at. But I think only a minority of vandalism is missed, so I don't think the number of diffs to check would increase by a large factor.

PaddyLeahy (talk) 00:58, 25 January 2009 (UTC)[reply]

At the moment, patrollers generally (except when checking new pages) only have to decide if something is vandalism or not, and in many cases a glance at the diff is all that is needed. With flagged revisions, at the very least Huggle would need to provide three options -- revert, sight or ignore. And it's deciding between those last two that would take the time. You can imagine the drama someone would get themselves into if they flagged a revision containing libel, copyright violations or personal information and subsequently that revision had to be deleted or oversighted. Anyone who starts sighting everything that isn't obviously vandalism will quickly fall foul of this, so patrollers end up with two options: revert all vandalism and ignore everything else, or spend significantly longer looking at stuff in order that they only sight stuff that's definitely good.

In short, the big change is that people are now required to affirm that a revision is good, rather than just failing to affirm that it is bad, something that is much more than just a re-labelling of buttons -- Gurch (talk) 01:43, 25 January 2009 (UTC)[reply]

This is a fair point, but not insoluble. In the German implementation "sighting" simply indicates lack of obvious vandalism and is emphatically not any positive indicator of quality or even lack of legally actionable material. This can be stated loudly and clearly on the pages describing the flagging system, which are widely linked to. On the other hand, the idea of flagging specifically for BLPs seems to assume a check at least a bit more careful, for instance at a minimum, edits containing unsourced negative statements should be reverted. Since only 11% of our articles are BLPs, it should be possible to spend a bit more time checking edits to them. One approach based on Huggle, for instance, would be to divide RCP into two streams, one as now except avoiding category BLP, and one that only looked at BLPs and was sensitive to flags as I suggested above; maybe also with an ignore button (but FlaggedRevs allows us to do better than that: it could mark the revision as "questionable" (say), to show the page has been glanced at and requires further checking, e.g. by looking up an alleged source). What is clearly needed are carefully-defined rules for flagging which

Can reasonably be followed by volunteers without too much effort.
Do not involve the flagger taking responsibility for article accuracy (otherwise only libel lawyers should be reviewing BLP changes).

Legal advice from WMF lawyers would be useful here! PaddyLeahy (talk) 17:58, 25 January 2009 (UTC)[reply]

Well that would be a start, but at the moment I don't think that's what people have in mind. Jimbo is constantly going on about how [insert latest bit of libel here] wouldn't have happened if only we'd had flagged revisions. If the system is only intended to keep out vandalism and not libel / copyright violations / personal information, then it doesn't actually help with that at all -- Gurch (talk) 20:03, 25 January 2009 (UTC)[reply]

Also, your suggestion would require an efficient way to determine which articles were BLPs and which weren't. Right now I'm not aware of any. Generating a list of them is out of the question, because there are so many. Even loading a pre-generated list at startup would be unworkable because of the size of such a list. So somehow the BLPness or otherwise of an article has to be determined in real time. The easiest way to do this is to ask the API if the article is in Category:Living people. But doing this on every edit, even if you exclude bots and other trusted users, is too inefficient, and the developers would probably kill me -- Gurch (talk) 20:07, 25 January 2009 (UTC)[reply]
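For reference, the category check described above could be sketched roughly as follows. This is only an illustration: it builds a MediaWiki API query using `prop=categories` with the `clcategories` filter and batches several titles into one request, but it makes no network call, and the sample response and the titles in it are hypothetical. Whether batching would make the load acceptable is not something this sketch settles.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
LIVING_PEOPLE = "Category:Living people"


def build_query(titles):
    """Build one API request URL asking which of `titles` are in the category."""
    params = {
        "action": "query",
        "prop": "categories",
        "clcategories": LIVING_PEOPLE,  # only report this one category
        "cllimit": "max",
        "titles": "|".join(titles),     # several titles in a single request
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def living_titles(response):
    """Return the set of page titles that the response lists in the category."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]
        for page in pages.values()
        if any(c["title"] == LIVING_PEOPLE for c in page.get("categories", []))
    }


# Hypothetical response, shaped like the JSON returned for prop=categories.
sample_response = {
    "query": {
        "pages": {
            "101": {
                "pageid": 101,
                "title": "Example Person",
                "categories": [{"ns": 14, "title": LIVING_PEOPLE}],
            },
            "102": {"pageid": 102, "title": "Example Town"},
        }
    }
}

url = build_query(["Example Person", "Example Town"])
blps = living_titles(sample_response)
```

A client along these lines would still issue one request per batch of edited pages, so the efficiency concern raised above remains the deciding factor.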

These metrics all gather Quantitative information, which will no doubt be interpreted in various ways depending on the Point of View of the person using the information. They will be pretty useless in determining the success or otherwise of the actual trials. There's a big need for QUALITATIVE information. It's how the editing process feels that determines whether editors stay and continue to edit, or decide to take their knowledge and talents elsewhere. Riversider (talk) 11:32, 17 February 2009 (UTC)[reply]

Do you have any suggestions on how such information could be gathered? Ozob (talk) 22:11, 17 February 2009 (UTC)[reply]

General revisions

Any useful test will be made against a comparable control sample, so that we can tell whether FR are working better or worse than the system we use now; any practical test will limit itself to a number of articles we can conceivably examine when it is over, so that we can actually see what the effects have been, and fix them if necessary. Trial 3 at least approaches this; but a hundred articles would be better. Large samples, beyond the minimum needed for statistical significance, do not improve trials; see the implosion of Literary Digest. Septentrionalis PMAnderson 20:35, 3 January 2009 (UTC)[reply]

A control group is an absolute necessity, which means that we should include at most half a particular set of articles in a trial. That means the FA trial covers about 1,200 articles and the semi-protected trial would cover about 1,700 articles. Of course, there's no requirement that the whole set be included in a trial; I'd personally prefer to see trials of about 500 articles at a time (that is, 500 in the 'active' group and 500 in the control), randomly assigned. Happy‑melon 20:44, 3 January 2009 (UTC)[reply]
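A minimal sketch of the random assignment suggested above, assuming the candidate set is available as a plain list of article titles. The function name `assign_groups`, the placeholder titles, and the seed are illustrative assumptions, not part of any proposed trial configuration.

```python
import random


def assign_groups(articles, group_size, seed=None):
    """Randomly pick two disjoint groups of `group_size` articles each.

    Returns (active, control). A seed can be supplied so that the
    assignment can be reproduced and audited after the trial.
    """
    if 2 * group_size > len(articles):
        raise ValueError("not enough articles for two groups of this size")
    rng = random.Random(seed)
    chosen = rng.sample(articles, 2 * group_size)  # sampling without replacement
    return chosen[:group_size], chosen[group_size:]


# Hypothetical candidate set, e.g. roughly the size of the FA half-set.
candidates = [f"Article {n}" for n in range(1200)]
active, control = assign_groups(candidates, 500, seed=2009)
```

Publishing the seed and the candidate list in advance would let anyone verify that neither group was hand-picked.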

Is it possible to have an article "listed" as protected, but not "actually" protected to see how many "more" edits it receives? -- Mjquin_id (talk) 16:46, 11 January 2009 (UTC)[reply]

Trial Time Frame

There is no need for a trial time of two months, unless we expect the median sighting time during the trial to be the three weeks the German Wikipedia is struggling to maintain - I would consider that to be a failure anyway. For articles involving current events in any way, it is laughable. Try 24 or 72 hours. Septentrionalis PMAnderson 20:35, 3 January 2009 (UTC)[reply]

You expect editors to get the hang of FlaggedRevisions within a day? How many revisions, never mind sightings, do you expect the articles to notch up within that timeframe? I was expecting people to be more concerned that two months was too short :D Happy‑melon 20:44, 3 January 2009 (UTC)[reply]

If they don't get the hang of it in a day, it's too complicated to become routine, when we will expect fairly new editors to pick it up as quickly as they pick up page moves.

How many depends on the article. One fairly prominent article has had a banker's dozen (including the successful sightings). If median sighting times are much over three days, the test has failed. Septentrionalis PMAnderson

While I agree that we're looking for sighting times of IMO no more than 48 hours, that doesn't escape the fact that in a 3-day trial only the first tranche of edits will actually make it through the full process. Such a tiny trial window leaves it wide open to random fluctuations in all manner of variables: Wikipedia edit patterns tend to roll on weekly and monthly cycles, and are heavily affected by school and national holidays around the world. To have any chance of smoothing these fluctuations out, we'd need on the order of months; five or six weeks at an absolute minimum.

Again I agree with you that if they don't get the hang of it in a day, it's disruptively complicated. What about the people who are only editing for one day out of those three? Some users can only edit on weekdays, others only on weekends; you can't cater for both with such a short trial. Happy‑melon 22:00, 3 January 2009 (UTC)[reply]

And after three days, we can discuss seven. It's only a three day wait. Septentrionalis PMAnderson 22:35, 3 January 2009 (UTC)[reply]

Lol. I certainly take your point that short trials should precede longer ones. I maintain that no usable data would be recovered from a trial shorter than a few weeks, and that for 'good' data we're looking at a few months. Perhaps four weeks for an 'initial' trial, followed by a longer (3 months?) one if the results look promising (by which I mean the trial is likely to recover any usable data, not just that positive data is being received). Thoughts? Happy‑melon 17:15, 4 January 2009 (UTC)[reply]

I don't see why a few weeks should be necessary. What mean time between sightings do you expect?

On the other hand, I foresee at least a possibility that we will want to call off the first trial almost immediately because sighting is working incredibly badly. One test on a set of fairly active articles would answer this. Septentrionalis PMAnderson 17:33, 4 January 2009 (UTC)[reply]

I consider a median time of about two days to be a safely conservative estimate. One reason for a longer trial is to collect a statistically-significant number of edits without having to include an unmanageably large number of articles in the trial. More important, though, is the need to avoid random error. This could be caused by external factors (LP in the trial has a high-profile legal incident? Expect an unnatural edit spike); while this would be a good test of FLR and I'd thank the gods of chance if it happened, it is not a natural situation for that unnatural level of activity to be present throughout the trial period. A one-week spike during a three-month trial is a much better model of how such an event would affect the encyclopedia. Secondly, there is the certainty that the editing patterns will be affected, at least initially, by the implementation of FLR. No matter how easily people adapt (and analysis of how well they adapt is an important thing to derive from such trials) they will take time to do so; new habits have to emerge, permissions must be granted, there is the interface itself to get the hang of. Such factors will affect editing patterns in the first week of a trial, possibly in the second; if there is still disruption after three weeks then there are questions to answer. But realistically, data gathered in those first two weeks is really only useful for analysing how the transition is handled; an article will not, IMO, begin to behave exactly as it would if it had been sighted since 2001 until that time has passed. Not allowing that time to pass and for editors to become acquainted with the new features would give a misleading impression of the amount of 'chaos' associated with an implementation.
Some disruption is certainly inevitable; and it was chaos on de.wiki because of the abrupt and poorly-organised rollout, but there is an important distinction between issues raised by the existence of FlaggedRevs on articles, and issues raised by the implementation of FlaggedRevs on articles; capturing and quantifying that distinction will be a key aim of a successful trial proposal. Happy‑melon 20:14, 4 January 2009 (UTC)[reply]

One of the problems with all these recently proposed tests is that they are giving too short test times. Flagged Revisions is a new toy for most English Wikipedians, if this passes. Like all new toys, people will be interested in trying it out. But for how long? Perhaps some will be interested for a week, maybe even slightly longer. I doubt that most people will continuously dedicate themselves to going through the backlogs for weeks and months. At least two months of a FR trial have to be used as the minimum, otherwise we will not be able to get an accurate assessment of how many people will be willing to go through the backlog on a regular basis. NuclearWarfare ^{contact me}_{My work} 23:21, 4 January 2009 (UTC)[reply]
- OK, that's a reason for a three month trial. Thinking about it in those terms, I expect, however, that even the initial burst of enthusiasm will not be enough, and therefore propose short trials first. If I'm right, and we never get enough active sighters, we can stop there. Septentrionalis PMAnderson 18:47, 5 January 2009 (UTC)[reply]
  - I believe we will never get enough sighters too, which is why I'm proposing a longer trial at first. A long initial trial that shows people how awful FR will get might be a good idea. NuclearWarfare ^{contact me}_{My work} 20:13, 5 January 2009 (UTC)[reply]
    - I see the point; but I don't want to have to clean up the mess. Gradually increasing times will get us to the point of breakdown without doing too much harm to Wikipedia in the process. Septentrionalis PMAnderson 21:06, 5 January 2009 (UTC)[reply]

(outdent) Approve. A 90-day timeframe should give us solid metrics. The time for the trial should be based on the list of initial reviewers (based on criteria in that section) and how often they log in. Some experienced and valuable editors only get on every weekend. Any test that did not cover at least 2 or 3 weekends would exclude everyone but the daily people. I think we need a period of time that covers a period where at least 3-5 'incidents' would normally occur. Perhaps three times the mean period between edits (vandalism) on FA articles? -- Mjquin_id (talk) 16:46, 11 January 2009 (UTC)[reply]

It took three weeks just to get the poll done, and there are still people complaining that they didn't get a chance to vote! A trial of a few days would tell us nothing, and would just necessitate a longer trial (which would have to be discussed, voted on, etc... as well). If we're going to do this, we might as well do it right. I still think it's a bad idea to begin with, but I realise that the 400 who voted in favour weren't calling for a 48 hour sample, they were voting for a legitimate test. Anything less than a month really wouldn't qualify. LSD (talk) 18:42, 28 January 2009 (UTC)[reply]

The median and mean 'review time' is NOT the same as the median and mean 'pending lag' time. You can get stats here. Others have compiled these. The median time for anons waiting on their edits is 1 hour, with an average of 33 (some pages do take forever). Aaron Schulz 04:26, 7 March 2009 (UTC)[reply]
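To illustrate how a median of 1 hour and an average of 33 can coexist, here is a small sketch with invented waiting times. The numbers are not real statistics, and it assumes the average is also measured in hours; the data was chosen only so that a single long-waiting edit pulls the mean up while the median stays put.

```python
from statistics import mean, median

# Invented waiting times in hours for six pending edits (not real data).
# Five are reviewed quickly; one sits in the queue for about eight days.
pending_lag_hours = [0.5, 1.0, 1.0, 1.0, 2.0, 192.5]

median_lag = median(pending_lag_hours)
mean_lag = mean(pending_lag_hours)
print(median_lag, mean_lag)
```

The median describes what a typical anonymous editor experiences, while the mean is dominated by the long tail of pages that "take forever".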

Comments on Criteria for Success

The points raised about criteria for the ALL WIKIPEDIA trial (trial 8) apply to all of the trials. We need to know what the criteria are for determining success, no matter what trial is undertaken.

Some general points:

It seems like the main questions that we should be seeking to answer in the trial are:

How well does the flagged revisions system reduce the number of live bad edits compared to unflagged articles (how much better)?
What are the relative costs in terms of absolute hours for maintaining flagged and unflagged articles (how long does it take)?
What are the relative costs in terms of person hours for maintaining flagged and unflagged articles (how much effort)?

I feel like the current list of criteria doesn't yet succeed in capturing what we ought to care about.

The problem that flagged edits is trying to solve involves reducing the amount of time that bad edits spend 'live' on the site. We need a way of tracking the median time that unflagged articles spend with bad edits visible to the public (in the flagged articles, that # should be zero).

Easily done for straight vandalism (compare the timestamps between edits and their undoing and compare that to total time elapsed during the trial) but harder for edits that get fixed when a second editor takes material added and NPOVs it or whatever. Not sure how to signal to the test what counts as a bad edit in these cases.
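For the straight-vandalism case, the timestamp comparison could look something like the sketch below. The function names and the sample timestamps are made up for illustration, and the second function assumes the bad edits on an article never overlap in time.

```python
from datetime import datetime
from statistics import median

def median_live_hours(bad_edits):
    """Median hours between a bad edit and the edit that undid it.

    bad_edits is a list of (made_at, reverted_at) datetime pairs.
    """
    return median(
        (reverted_at - made_at).total_seconds() / 3600
        for made_at, reverted_at in bad_edits
    )

def fraction_of_trial_live(bad_edits, trial_start, trial_end):
    """Share of the trial period during which a bad edit was visible.

    Assumes the bad-edit intervals do not overlap one another.
    """
    live = sum((r - m).total_seconds() for m, r in bad_edits)
    return live / (trial_end - trial_start).total_seconds()

# Invented example: three vandal edits on one article in a one-week trial.
edits = [
    (datetime(2009, 1, 4, 10, 0), datetime(2009, 1, 4, 10, 30)),
    (datetime(2009, 1, 5, 8, 0), datetime(2009, 1, 5, 12, 0)),
    (datetime(2009, 1, 6, 0, 0), datetime(2009, 1, 6, 1, 0)),
]
start, end = datetime(2009, 1, 4), datetime(2009, 1, 11)
print(median_live_hours(edits), fraction_of_trial_live(edits, start, end))
```

This does nothing for the harder case of edits that are gradually fixed rather than reverted; those would still need a human judgement about what counts as "bad".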

For the second two questions, we need some way of determining whether flagged or unflagged articles improve more quickly. If it takes 2 weeks for edits to go live in flagged articles but at the end of 2 weeks you have a better quality article than you'd have had with 2 weeks of the current editing scheme, then we should switch over to flagging. Likewise, if it takes 300 person-hours of editing to get an article to a certain level of quality in the flagged scheme but 150 hours to get it to that level of quality in the unflagged scheme, we might want to keep the status quo.

We need a way to figure out how many extra person-hours of work monitoring flagged articles consumes as well as how many absolute hours it takes for improvements to go live.

How to determine if article quality has improved? We could take groups of articles that transitioned from, say, "stub" to a regular article or from "good" to "featured" over the course of the test, and use this data to begin to guess at whether flagged status sped up or slowed down the improvement process in absolute time.

I am less certain about how we could test for person-hours invested. Maybe number of transactions (edits and reviews)?

By comparing all of these things, it seems to me that we'd get a better picture of whether flagging articles was solving the problem it was meant to solve and quantifying how much (if any) cost there was in terms of both timeliness and effort.

Lot 49a^talk 06:56, 4 January 2009 (UTC)[reply]

The time bad edits spend live on the site under flagged revisions will certainly not be zero, although it may be small. There are other bad edits than obvious vandalism; for example, a trusted editor makes an edit which is an unsourced and negative assertion on a living person and sights it because the editor believes it true (those who think this can't happen should review the edit history of Sarah Palin); an anon catches it and edits to remove it. The fix must wait until someone sights the article. Septentrionalis PMAnderson 16:45, 4 January 2009 (UTC)[reply]

This is a good point. We'll definitely want the test to be able to detect how long bad edits stay live under the flagged revisions system. The challenge is that it's hard to tell when a non-reverted edit is a 'bad' edit vs an edit that was OK but benefited from future further improvement. This challenge happens for both kinds of test. Lot 49a^talk 23:09, 4 January 2009 (UTC)[reply]

For some edits, this is something that can only be judged by a human. Computers can't detect POV, and not every edit fixing POV issues will necessarily contain "POV" or "NPOV" in the summary.

Fortunately, these are not the sorts of issues that FR is intended to address. FR is supposed to be for obvious vandalism, like when someone takes your favorite article and replaces it with "Richard is a GIANT PENIS." Drive-by penis vandalism is usually reverted by rollback, an anti-vandalism tool, or by an editor who puts "rvv" or "rv vandalism" or something similar in the summary. So it should be easy to measure whether FR has the intended effect or not. That's the purpose of a lot of the metrics I listed above. (I've added some based on your comments above.) These should contain enough data to tell us whether FR has the intended effect.
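A rough sketch of how vandalism reverts might be picked out by edit summary is below. The pattern only covers the summaries mentioned above ("rvv", "rv vandalism" and close variants); it is an assumption about what editors type, not a list taken from any tool, and it would miss reverts with other summaries or none.

```python
import re

# Hypothetical pattern covering "rvv", "rv vandalism", "reverted vandalism"...
VANDALISM_REVERT = re.compile(
    r"\brvv\b|\brv\s+vandal|\brevert(?:ed|ing)?\s+vandal",
    re.IGNORECASE,
)

def is_vandalism_revert(summary):
    """True if the edit summary looks like a vandalism revert."""
    return bool(VANDALISM_REVERT.search(summary))

def count_vandalism_reverts(summaries):
    """Count the summaries that look like vandalism reverts."""
    return sum(is_vandalism_revert(s) for s in summaries)

# Invented edit summaries for illustration.
sample = ["rvv", "rv vandalism by IP", "copyedit", "Reverted vandalism", ""]
print(count_vandalism_reverts(sample))
```

Counting these in the flagged and control groups would give one of the simpler metrics, though it measures how often vandalism is reverted, not how long it was visible.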

Of course, FR could have unintended effects. I'm hoping to measure the effect on productivity by the number of edits made by various kinds of users. Tracking the number of editors of various types should measure whether FRs discourage IP editing and encourage registration. Measuring the number of protected and semi-protected pages should tell us whether it's a good replacement for semi-protection or protection and whether it calms people down during an edit war. And there's a bunch of measures of how large the backlog is. If you can think of more metrics, please add them! It's better to have all of these written down before any trial happens so we know we can collect the necessary data. Or so that we know we're not collecting the necessary data. Ozob (talk) 23:38, 4 January 2009 (UTC)[reply]

Comment on reviewer privilege

Proposition 1 states that reviewer privilege will be asked by the user and granted by an administrator. I believe this is a hassle. I think all long-term contributors (more than X edits and active for Y months) should be granted the privilege automatically. Otherwise Flagged Revisions will only be known by technocrats, and sighting will be undermanned. Any user could revoke his own privilege via the settings page. Who do you think should be granted the reviewer privilege? Nicolas1981 (talk) 07:15, 7 January 2009 (UTC)[reply]

I wonder if there is a bot (or report) that could show all people with "real" accounts, more than a year old (or more than 1000? edits), belonging to at least one WikiProject, and have never had a rvv? Would that be a good start on a list of reviewers? Perhaps more stringent. I dislike the "expert" concept, but "reputable" should be a solid requirement; though difficult to objectify. There do need to be at least 3-5 per WikiProject initially. -- Mjquin_id (talk) 16:20, 11 January 2009 (UTC)[reply]

Yes, we need a criterion to tell who can review. I don't think belonging to a WikiProject should be a requirement, though. Nicolas1981 (talk) 07:00, 13 January 2009 (UTC)[reply]

I'll speak up for the large body of casual but responsible editors. I think your criteria are way too strict. Myself, I've been on WP 3-4 years with 250 edits, and I even have a revert or two (from inadequate summaries on removing info). I care about WP, but don't have the time to be on here every night. I think something more along the lines of an account 1 week old and 20 edits would be enough. Someone like me might only come in and sight a dozen articles once a week, but that would still reduce the burden on the more dedicated editors. --Walt (talk) 19:53, 16 January 2009 (UTC)[reply]

I agree with Walt. Checking an article for obvious vandalism is an easy task; someone with 1 week and 20 edits can do it. Nicolas1981 (talk) 16:08, 17 January 2009 (UTC)[reply]

Discussed ad nauseam on WT:Flagged revisions/Sighted versions/Archive 4. The aim of a restriction is to force determined vandals to do something useful before they get the chance to sight their own rubbish (and get blocked). I think it needs a bit more of a record than Wroscel suggests but should certainly include editors like him. PaddyLeahy (talk) 18:37, 17 January 2009 (UTC)[reply]

I've always expected that it would be an "if you know enough to ask, you know enough to get it" situation. Requiring a proactive application process has advantages beyond just allowing human review of the quality of the editor. Think about the situation we currently have - we have a huge number of anonymous constructive editors, probably more than we have active registered users, but they are largely transitory. People come, people go, people make one or two edits and then drift away. We get a lot of useful edits out of these people, but (with a few exceptions of long-term IP editors that only prove the rule) they haven't got the 'bug' - they haven't committed to becoming 'part' of Wikipedia. The extra features that an account provides are 'force multipliers' such as I keep harking on about: everything the interface provides registered users - watchlists, signatures, user rights, undo, permissions, the works - is designed to make people more productive editors. People are right to say that enabling FlaggedRevs will discourage some IP editors - fewer, I hope, than is claimed, but some nonetheless. But we are also creating more gradation on the road from 'casual IP editor' to 'fully fledged registered editor', and that IMO can only be a good thing. An IP comes along, makes an edit, doesn't see it immediately. Option 1: be discouraged and go away - that's a loss to us, as everyone notes. But Option 2, which most opponents ignore, is to be encouraged to register, to make that first step when they might not otherwise have done so. That's a gain to us. And once they're there, they'll get the bug more strongly; they watch a favourite page, soon they're reviewing it every time they log on, etc etc - it's the same story that's been repeated ten thousand times. Now, the road from there to a 'regular' editor is still quite long and bumpy, and Wikipedia is not getting the best from them if they never penetrate into the 'backstage'.
I know we have innumerable editors who work almost entirely in the mainspace and wouldn't consider themselves 'backstage' workers, but they've still been there to read the policies, guidelines, etc. The more editors know about how we work and what we do, the less likely they are to drift away, or get bitten for good-faith mistakes. What better way to encourage newcomers to get involved than to say "ok, all we want you to do is read a few of these policy pages, then come and ask an admin (oh, here's what an admin is, btw) and you can have a shiny mark of trust and the associated warm glow". And bingo, you now have one more editor who A) knows WP:BLP, WP:V, WP:NPOV and WP:OR, B) knows what an admin is and where to find one if they need adminy help, and C) now feels a 'part' of the project in a way they didn't before. Win-win situation, IMO. So I think we should be offering it to anyone who is competent enough to ask for it, within reason of course. There should have to be a reason to not grant it, rather than a reason to do so. Happy‑melon 19:29, 17 January 2009 (UTC)[reply]

FR is not what I would choose to use as a way of persuading (rather than forcing) registration of transient good faith IPs. Greater uptake of accounts will probably occur (purely speculating), but this can hardly be considered a core benefit of FR, it's more like the advantage of not having to take your dog for a walk anymore if it is crippled in a collision with a car. MickMacNee (talk) 02:44, 27 January 2009 (UTC)[reply]

It may, however, turn into something along the lines of the current rollback situation. Administrators can become stingy in giving this ridiculous "privilege" out. Furthermore, sysop A may have a stricter lending policy than sysop B, which presents some issues that one must pay attention to; for example, newly-registered user A has 25 edits after one week of use and asks sysop B for the ability to sight reviews, for which they are declined because sysop B is highly conservative and does not trust new users (by nature, I'm not implying any sort of maliciousness here). What incentive does user A have to get the sighting ability now? This whole process is just one conundrum after the other. I feel it is going to turn out to become so incredibly complicated that the whole of Wikipedia will quite probably implode upon itself. NSR77 ^T 01:44, 25 January 2009 (UTC)[reply]

Having to ask permission is a huge barrier. When I see a protected page that I could improve, I usually just mumble a curse and move on, because writing up an {{editprotected}} request is too much trouble. I haven't asked for rollbacker permission. I probably won't ask for sighter permission. I strongly suggest that the permission should be granted automatically to editors in good standing (users with no recent vandalism warnings or blocks, and some reasonable number of edits and days since registering an account). It would also be a good idea to have a way to request permission if you don't meet the automated criteria. —AlanBarrett (talk) 18:49, 25 January 2009 (UTC)[reply]
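An automatic "good standing" check along these lines might be sketched as follows. Everything here is hypothetical: the `Editor` record, its field names, and the default thresholds (which just borrow the "1 week and 20 edits" figure suggested earlier in this thread) are placeholders for whatever criteria are eventually agreed.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Editor:
    """Hypothetical summary of an account; not a real MediaWiki structure."""
    name: str
    registered: date
    edit_count: int
    recent_warnings_or_blocks: int

def qualifies_automatically(editor, today, min_days=7, min_edits=20):
    """True if the editor meets the (placeholder) automatic criteria."""
    account_age_days = (today - editor.registered).days
    return (
        account_age_days >= min_days
        and editor.edit_count >= min_edits
        and editor.recent_warnings_or_blocks == 0
    )

today = date(2009, 1, 25)
casual = Editor("CasualEditor", date(2005, 6, 1), 250, 0)
newcomer = Editor("Newcomer", date(2009, 1, 22), 25, 0)
warned = Editor("WarnedEditor", date(2008, 1, 1), 500, 1)
print([qualifies_automatically(e, today) for e in (casual, newcomer, warned)])
```

Editors who fail the automatic check could still ask an admin, which keeps the manual route as a fallback rather than the only way in.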

Even if it is going to be given to only a subset of users, I don't understand why it's being proposed as a separate user group. Why can't the right just be added to the rollbacker user group? It's the same set of people either way -- Gurch (talk) 20:12, 25 January 2009 (UTC)[reply]

I like the idea of having some automatic message on my talk page saying: "Congratulations, you have been granted 'reviewer' privilege for: {...your 1000th edit (or edited page) or ... your one (1) year anniversary}. Our sincere thanks for being a Wikipedia Editor!" I would even like to track this for WikiProject members...to "automatically" thank them for a certain number of edits... (provided they have no vandal reverts, etc, etc...) -- Mjquin_id (talk) 00:00, 27 January 2009 (UTC)[reply]

As a halfway house between a requirement to request reviewer status and automatic promotion, what about a semi-automated system where editors with a certain number of edits over a length of time would be automatically nominated? It would then work a bit like speedy deletion. Editors could place a "hang on" if they felt an editor shouldn't gain reviewer status. An admin would review the editor's contributions and the objection and grant reviewer status unless he or she thought there was a good reason not to. — Blue-Haired Lawyer 11:16, 5 March 2009 (UTC)[reply]

Usability issues and evaluation

While I'm a supporter of testing FR, I think my biggest concern at this point - beyond backlogs, and philosophy, which I believe there are adequate mechanisms for dealing with - is usability. Most of our content comes from anonymous contributors with very little knowledge of how Wikipedia works - they know that if they click "edit this page," they see something that corresponds roughly to the page content, and then they can edit it and click save and it'll appear. With FR on, things are more complex: they click edit, and they see the wikitext of the draft revision, which isn't necessarily the same as the page they were intending to edit. They click save, and the page looks just like the original page.

To help deal with these issues, I have a proposition: let's remove the "edit this page" link from the sighted page and replace it with "View draft". Only the draft version will have an edit link. That way, whenever they click edit, they will be editing the page they're currently looking at, and they will see their changes immediately after they click save.

Also, I think there's been painfully little usability evaluation on Wikipedia; we can't anticipate what kind of problems novice users will have. Someone needs to do a real usability study: sit down with some real novice users and watch them using these systems, record what they do, and interview them about the experience. That's the only way to get real data here. Dcoetzee 23:34, 25 January 2009 (UTC)[reply]

Concur. One of the greatest problems with FR is that reviewers do not see the text the world sees. WYSIWYG would be better. Septentrionalis PMAnderson 00:49, 26 January 2009 (UTC)[reply]

Good idea. --Apoc2400 (talk) 13:25, 26 January 2009 (UTC)[reply]

I totally support this. While I actually can set the clock on my VCR (which I realize dates me...); it took quite a while to understand "where" to get information on making a "proper" edit...and then Huggle...and Winkle. Great, great tools, but serious usability issues. (I reverted my own vandal reversion at LEAST 10 times, before finally getting the process down). This process will have to have something like colored highlighting that corresponds to the editor or something... I am not seeing this in the test lab... -- Mjquin_id (talk) 00:20, 27 January 2009 (UTC)[reply]

Email

Will WikiProject people be able to place a list of pages somewhere such that if any of those pages has a pending revision (or more than X pending), it sends me (or a list of reviewers) an email?

Article xxxyyy has 99 revisions pending

Full example: WikiProject Bob sets up a list of pages that are under FR. Reviewers are also listed on a page. Reviewers might have a "preference" based "min rev number"; that could be set to say 10... Any page on the list that has more than 10 revisions pending would spam the list of reviewers. Ideally, the project could set a central "days-old" flag; also potentially, it would only email once per day...(or some time period parameter from "preferences") -- Mjquin_id (talk) 00:20, 27 January 2009 (UTC)[reply]
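The notification rule in the example could be sketched roughly like this. It is not a description of anything the FlaggedRevs software does; the function name, the reviewer names, and the page titles are invented, and actually sending the mail (and the once-per-day limit) is left out.

```python
def build_notifications(pending_counts, reviewer_thresholds):
    """Work out which pages each reviewer should be told about.

    pending_counts: page title -> number of pending revisions.
    reviewer_thresholds: reviewer -> their "min rev number" preference.
    A page is reported to a reviewer only if it has MORE pending
    revisions than that reviewer's threshold.
    """
    notifications = {}
    for reviewer, min_pending in reviewer_thresholds.items():
        lines = [
            f"Article {title} has {count} revisions pending"
            for title, count in sorted(pending_counts.items())
            if count > min_pending
        ]
        if lines:  # reviewers with nothing to report get no email
            notifications[reviewer] = lines
    return notifications

# Invented watch list for "WikiProject Bob".
pending = {"Foo": 12, "Bar": 3, "Baz": 99}
thresholds = {"ReviewerA": 10, "ReviewerB": 50, "ReviewerC": 100}
mail = build_notifications(pending, thresholds)
print(mail)
```

A daily job could run this once and send each reviewer a single digest, which would cover the "only email once per day" part of the idea.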

Cross reference

The sequence of edits on Kennedy's collapse can be followed, and I have analysed them at WT:Flagged revisions#the Kennedy test. There were seven edits inserting Kennedy's death; it appears, on reasonable assumptions who would sight edits, that three of them would have been sighted. Comments there, please. Septentrionalis PMAnderson 01:03, 26 January 2009 (UTC)[reply]

Yes, this and my argument at the bottom of #Possible metrics to measure success of trials show that the most we can expect flagged revisions to do is reduce the visibility of vandalism, yet people are still treating it as though it will catch all content issues when it clearly won't -- Gurch (talk) 08:52, 26 January 2009 (UTC)[reply]

Specifications

Happy melon's persistent refusal to give out specifications for FR has convinced me that either he doesn't know them, and his assurances are hot air; or that they become whatever is necessary to assure editors that "nothing can possibly go wrong". On these grounds, I will oppose any further test until specifications are published. Septentrionalis PMAnderson 18:54, 26 January 2009 (UTC)[reply]

Pmanderson's persistent refusal to try seriously to understand what's going on here bothers me more than a bit. I've had several times to correct radical misunderstandings perpetrated by him on various FR pages including one point where he changed the proposal page to reverse the original sense of a statement in an attempt to "clarify" it. Funnily enough, all his mistakes make the proposed implementation of FR sound like a disaster.

For the perplexed: keep in mind that there are three separate things:

the software FlaggedRevs, which is very flexible and could change the behaviour of the wiki in many different ways, including some which would make no apparent difference and some which would destroy wikipedia. No-one has thought through all these possibilities and no-one needs to; let's focus on the usable ones.
the particular implementation of FlaggedRevs under discussion, i.e. one choice of settings for all the switches on the machine, if you will. (Only the WMF devs can throw these switches for this site). It is absurd to suggest that HappyMelon doesn't know the technical specs, since he has set up a test wiki to demonstrate it, which Pmanderson won't try out. There are quite a lot of nice features which are much easier to understand if you try it out rather than trying to read a long set of "specifications".
the social reality of how our implementation is used in practice. This includes the specs of each trial set out above (which in most cases need firming up). But also, each trial calls for at least a slightly different way of running the system, and we can't tell until we run it whether the community will follow these suggestions (or work out something better).

The final implementation would be different again, since we need to agree

What type of pages should be flagged (agreed: a small minority of all pages)
How much checking is required before flagging (agreed: at least for "obvious vandalism")
Who gets permission to flag (agreed, I think: many times more than just admins)
How permission is given out (ask for it, or automatically: the latter can be done in lots of different ways).

We might answer these differently for BLPs and for other pages where errors are less worrying. We might decide to use more features of the FR software than have been widely discussed so far (e.g. flagging does not have to be just a binary on/off). These would require some changes to the switches by the devs, but no new coding. The reason for doing the trials is to give us some empirical data to use in making these decisions (the answer might be "don't use it at all"). I can't understand why a community largely composed of geeks seems to contain so many who believe that armchair theorising is more reliable than experiment! PaddyLeahy (talk) 23:34, 26 January 2009 (UTC)[reply]

Then for Heaven's sake, put the meanings (as the software sets them) or the choice among meanings, out in Wikipedia space where we can all see them. Septentrionalis PMAnderson 23:40, 26 January 2009 (UTC)[reply]

I also see that Paddy Leahy is repeating Happy melon's insolent demand that I should reverse engineer the test model to get the basic description which any honest vendor would have supplied in the first place. Shame on both of you. Septentrionalis PMAnderson 00:48, 27 January 2009 (UTC)[reply]

Possible abuse

Perhaps I'm missing something here, but it seems to me that one thing that any trial will likely not catch is abuse of the system by biased "reviewers" that try to impose their POV by reverting edits with a POV that they disagree with. Potential abusers will likely wait until after the trial is over to start abusing the system. And then what? It doesn't look like there are any safeguards in this proposal to deal with such abuse. What am I missing? -- noosph e re 20:56, 26 January 2009 (UTC)[reply]

What you're missing is that people can do exactly that now. Why would flagged revisions change that? ♫ Melodia Chaconne ♫ (talk) 21:53, 26 January 2009 (UTC)[reply]

Because with flagged revisions anyone who's not a "reviewer" will not be able to revert the reverter. Of course, the ability for anyone to revert reverters under the current system can lead to edit wars, which won't happen with flagged revisions (not so they're visible to non-logged-in users, anyway). Still, I am uncomfortable with the creation of yet another anointed class of users with more power than the rest of us have to shape the encyclopedia. -- noosph e re 22:09, 26 January 2009 (UTC)[reply]

Of course they can revert the reviewer. It'll be unflagged, but it's still possible, and WP:3RR and other such would still apply. ♫ Melodia Chaconne ♫ (talk) 23:20, 26 January 2009 (UTC)[reply]

And since the correction is unsighted, no one except reviewers will see it, and (since they are by hypothesis editing from a PoV) they will decline to sight the correction as erroneous. Septentrionalis PMAnderson 23:38, 26 January 2009 (UTC)[reply]

If non-reviewers can revert reviewers then I withdraw that particular objection. However, it still seems to me that per the proposal no information will make it into the encyclopedia (the version that non-logged-in users can see) without the approval of the reviewers.

This is the crucial difference between how the system works now and the proposed solution. Right now anyone's edits will be immediately visible, while the proposal will create an appointed subset of users who will act as a filter. And I don't see anything in the proposal that addresses possible abuses of this new power by this new class of users (or the abuse of the power to appoint reviewers). Please correct me if I'm wrong on these points. -- noosph e re 23:52, 26 January 2009 (UTC)[reply]

Nothing new here. Abuse of power by reviewers (people like you & me) can be dealt with by admins, by blocking or, more mildly, by revoking permission to review. Abuse by admins (e.g. biased reviewer appointment) is dealt with by WP:ARBCOM. Abuse by Arbcom can be dealt with by Jimbo (but we vote for them so we get what we deserve). Abuse by Jimbo can be dealt with by the WMF board. Abuse by the WMF board? Tough, they own the site. PaddyLeahy (talk) 00:20, 27 January 2009 (UTC)[reply]

Reviewers will not necessarily be "people like you and me". Not every editor will be a reviewer. Only a subset of editors that admins deem worthy will be given reviewer powers. Admins, arbcom, the WMF and Jimbo are no more immune from abusing their power than any other user, and with this change in policy would be given the power to appoint (and support) reviewers that further their own POV. While this proposal may have the effect of reducing vandalism and the like, it will also further concentrate power and make abuse of that power easier than under the present system. -- noosph e re 00:37, 27 January 2009 (UTC)[reply]

And this is a new power which a set of article owners can abuse; that's a cost of FR which must be considered. (Especially since the effect will be in the direction of making WP less accurate.) Septentrionalis PMAnderson 00:40, 27 January 2009 (UTC)[reply]

Noosphere, my point is that you can live with Wikipedia at present despite the potential for abuse. That doesn't really change. No-one is suggesting that reviewer rights be harder to get than rollback or qualification to vote for the WMF board or Arbcom; most proposed thresholds are lower. That would include you and me and users with much less experience than even me. PaddyLeahy (talk) 09:48, 27 January 2009 (UTC)[reply]

I did not see anything in this proposal that mentioned which editors would be made reviewers. If I could be assured that all legitimate Wikipedia editors would be made reviewers without discrimination then I'd feel a lot better about this proposal. Until then I'm afraid I'll have to voice my objection. -- noosph e re 17:04, 27 January 2009 (UTC)[reply]

True, the rule is TBD (although there is a lot of discussion in the FR archives) but you can get some idea from the proposed trials at the top of this page. Actually, for trials on a small sample of the intended final pages (e.g. trial 3), we should have a correspondingly smaller number of reviewers. For the final implementation, there is a strong case for giving the permission automatically to moderately-active users, as the software allows.

Also, a small clique couldn't dominate Wikipedia in the way you fear, because there wouldn't be enough of them to control more than a small number of pages. For reference, the Germans have 5800 "reviewers" ("sichters") and are barely keeping up with the backlog on 850,000 articles. Their wait time for reviewing edits by IPs is a few hours on average, with a long tail, which is not really good enough. PaddyLeahy (talk) 19:57, 27 January 2009 (UTC)[reply]

Remember the Wizard of Oz, who turned out to be just a little old guy hiding behind a curtain? This will be the FR reviewer - vetting things behind the scenes before they become visible on the main article page. Non-reviewers and IPs could theoretically edit unsighted edits, but the errors in these edits will not 'jump out' at them as they do currently - they'd have to seek them out, and most will not bother. This definitely changes the position that editors have equal rights to edit, and equal responsibility for what they do: it creates a class of people who are permitted behind the curtain, who become more powerful and less accountable. Riversider2008 (talk) 12:32, 27 January 2009 (UTC)[reply]

Yes this is a significant change. Yes, by definition reviewers would be more powerful. More so than rollbackers, much less so than admins. Then again, we seem to be talking about using this on less than one in seven pages, so my guess is that most IP editors won't even notice. But why would the reviewers be "less accountable"? I don't think admins would hesitate to revoke the right if they saw misuse—they are happy enough to block, which is much more aggressive. And just as for blocks, if an admin makes a mistake on this they can easily undo it. PaddyLeahy (talk) 19:57, 27 January 2009 (UTC)[reply]

Still just as accountable to admins, but less accountable to general readers, who see less of what they do behind that curtain. The strength, as well as the weakness, of WP lies in the openness of its process. Riversider (talk) 21:24, 27 January 2009 (UTC)[reply]

Well if it's only "one in seven pages" all my concerns over FR vanish then. I don't care if it's 1 in 7 or 1 in 70, if you can't show me it doesn't have a damaging effect, I don't want it. If you want to excuse it on this basis, I am not interested at all. MickMacNee (talk) 23:16, 27 January 2009 (UTC)[reply]

Trial 17: Variant of flag protection

Wikipedia:Flagged revisions/Trial/Proposed trials#Trial 17: Variant of flag protection

All users should see (and be able to edit) the live version of articles by default

Although I've followed stories regarding flagged revisions since the idea came about x years ago, I've not been following the on-wiki developments. This may not be the best place to put my thoughts down, but given the discussion sprawl, I'm not sure where they should go.

I can see the benefits of flagged revisions; I think having articles certified for quality is a good defense against attacks on Wikipedia's credibility and its worth. But it should not come at the expense of access. I first started editing Wikipedia in 2005; that my edits were instantly live was a big part of why I joined the project. This must be the same for many other users. I didn't need credentials, I didn't even have to log in. As Wikipedia grows, and there is still a lot of growing left to do, it must continue to attract new users. We must not raise a barrier to access.

The media have long used Wikipedia scapegoat stories when they have nothing else to write about. But it's changing: they, and the public, understand the concept of Wikipedia. They're beginning to understand that it's not entirely accurate, that anyone could have authored it, and by and large, they're OK with it, and I'm OK with it. We should not be trying to turn this around by saying, "from now on, it will be accurate", because it won't. And when a sighted article inevitably does fall, it's going to look even uglier for us.

We must continue to attract new users, and we should become comfortable with how Wikipedia is perceived by the outside world. We shouldn't change that by introducing a delay between edit and publication. All users, anonymous IPs included, should be able to see and edit the live version of articles by default. This article should be marked as 'draft', or 'live'. But behind the scenes, we should allow trusted editors to flag articles as reviewed; there may be a review process where sources are verified and positions assessed (a WP:FAC-lite). When an article is reviewed, a 'reviewed' tab will appear alongside the 'edit this page' tab at the top of the page.

Readers should be aware that the article they're looking at by default is fluid, just as they should be now, and if they aren't - we must continue to educate them. But having flagged revisions this way will also let them know that if they want something slightly more reliable, we can provide that as well.

This is such a simple suggestion, it meets our goals, and I'm sure it has been made before - but looking at the trial proposals, they all seem to hide current revisions from the outside world. They raise the barrier for access, they say, "we don't trust you, you'll have to wait for someone on the inside"; this is the wrong message to send out. And as we have to keep the turnaround on new edits as short as possible, it also introduces a lot of backend effort that would be better spent elsewhere. Flagging older versions of articles as 'reviewed' lessens the pressure on reviewers to assess every edit as it is made and decreases the backlog.

There are further thoughts in the press at The Guardian, The Spectator and The Independent. I apologise if this is the wrong place for ideas, if there's a better place for them to go, then let me know - discussion is welcome. - hahnch e n 03:05, 20 February 2009 (UTC)[reply]

Basically the points that you are making are all part of the standard functionality of flagged revisions. As far as the ability to edit, it actually expands it by reducing the need to protect articles. There are links at the top of the article to jump between the stable version and the draft version. It provides an easier way for all users to view a stable version of an article, rather than having to sort through the history to manually determine what might be questionable edits. Dbiel ^(Talk) 04:31, 20 February 2009 (UTC)[reply]

Flag protection, patrolled revisions and deferred revisions

I have made three proposals related to Flaggedrevs. The proposals are a variant of Flag protection, Patrolled revisions and Deferred revisions. They may be considered as trials, (see trial 17 above). Comments would be appreciated to see if there is support for some of them and what to do next. Cenarium (talk) 15:22, 22 February 2009 (UTC)[reply]

Getting this process back on track

Discussion on FlaggedRevisions has almost ground to a halt and is in any case getting nowhere. In order to get the process back on track, I propose we prepare a survey style straw poll. We would prepare a series of multiple choice questions addressing how we might implement a trial, including such issues as:

Whether or not we should have a trial at all?
If so, how long it should be?
What proportion and what kind of pages should be covered?
Who should be able to sight revisions?
If the answer to the last question is a new group of editors, how should they be selected?

(NB: These are just suggestions.)

Editors would then be able to express preferences on the possible answers given. This would help the achievement of a compromise as editors could express second and third preferred options in addition to the one they prefer the most. And just because an editor answered no to question no. 1, wouldn't mean they couldn't answer the other questions as well.

Whatever your opinion on FlaggedRevisions this survey would allow us to gain a clearer picture of where the community stand, and prompt more structured discussion. — Blue-Haired Lawyer 22:25, 7 March 2009 (UTC)[reply]

Rather than another, complex survey, I think the best step now would be to take what we've learned from all the discussions and the previous poll to try to hash out a single, conservative, well-defined first trial, and run a straw poll for that. This would probably look something like "Flagged protection", described in Trial 10. Replacing semi-protection with the more open system of flagging seems to be one use for the extension that almost all editors see as a good one; many editors don't want it to be limited to that ultimately, and some don't want it to go much beyond that, but a large super-majority, it seems, would find Flagged protection acceptable.

The last straw poll suffered from lack of clarity; it was unclear whether it was about Flagged Revs generally, or about merely a trial, and no precise trial protocol was specified. This one would have to be clear about the parameters of the proposed trial, about the fact that it is only a trial, and that it does not preclude either using Flagged Revisions in other ways in the future (if consensus forms for such uses) or turning the extension off if, at the conclusion, there is not a consensus to conduct other trials and/or continue with the type of flagging specified by the trial.--ragesoss (talk) 23:20, 8 March 2009 (UTC)[reply]

Yes, I agree fully with Ragesoss. The previous straw poll was vague, and at the end, while opinion was clearly weighted towards a trial, I don't think one could say that there was consensus: A lot of the people who voted seemed to interpret "flagged revisions" as either "My fantasy flagged revisions system" or "My nightmare flagged revisions system" depending on whether one was for or against. Taking a survey will be more of the same. It will be noise, not signal, and heat, not light. The right way to proceed is to make a specific and detailed proposal. It should include every circumstance and every contingency one can think of, and it should spell out in detail what would be done. It should include detailed information about what kinds of tests should be made, what sorts of data would be gathered, how that data would be interpreted, and what the terms of success are. Then people will know exactly what they are voting on, and the result will actually mean something. Ozob (talk) 18:43, 9 March 2009 (UTC)[reply]

Actually the previous poll was quite specific. The question (here) included a link to the proposal page (here) that even included what the PHP configuration would be if the poll succeeded. If you guys think you can skip a survey and find a solution which will find consensus, by all means do it! — Blue-Haired Lawyer 22:12, 9 March 2009 (UTC)[reply]

No, that was the only thing that it was specific on. It didn't specify who would receive reviewer status, how that status would be granted and revoked, what the criteria were for sighting revisions, who would receive surveyor status, how that status would be granted and revoked, what the criteria were for turning on flagged revisions, any trials to determine the effectiveness of the system, any metrics to determine the outcome of the trials, any criteria for evaluating those metrics and determining whether the trials were a success or a failure, or any process for determining whether to keep the proposed configuration in place or turn it off. That's why you could see some people endorsing the proposed configuration for BLPs only (like supports #159 and 161) or with specific conditions (supports #85 and #269), whereas other people endorsed a much tougher flagging system like de has (supports #72 and #152) or some other scenario where flagged revisions would be widely used (like support #50), and still others endorsed it as a substitute for turning off anon editing (support #251). It's that kind of variation that makes me say the previous proposal was vague—even though the proposed PHP configuration was not vague at all. Ozob (talk) 01:04, 10 March 2009 (UTC)[reply]

Wikipedia:Flagged protection, I think, might be a good conservative starting point...basically all it would need is another page (e.g., Wikipedia:Flagged protection/Trial) to specify the trial-specific details such as how long it would last and what would happen at the end of the trial, and describing the background as to why this particular implementation is being put forward for trial (i.e., the history of the earlier poll and Jimbo's intervention).

I actually started writing that page, but cleared it when I saw this. Jimbo is apparently going to moot a proposal soon. Which means waiting for that and starting from there might be the more productive way to go (or not).--ragesoss (talk) 17:37, 10 March 2009 (UTC)[reply]

(unindent) We need to show something very specific to the community, otherwise it'll fail like last time and the community would become tired of FlaggedRevs (already true to some extent). I'm about to propose Wikipedia:Flag protection and patrolled revisions for consideration to the community very soon. The original proposal of flagged protection has been quite opposed and thus this is a reworked version, plus a passive process that allows all BLPs to be monitored, among others. Cenarium (talk) 17:53, 10 March 2009 (UTC)[reply]

There has been opposition to having flagged protection instead of other uses of flagged revisions, but I think an appropriately worded trial proposal for flagged protection that specified that this trial does not preclude other (possibly overlapping) trials of other additional uses of flagged revisions would be an acceptable start. I'm just worried that your combined proposal might be too complicated, and it probably still doesn't go far enough to appease many people who see flagged protection as doing too little. But it's a reasonable proposal and one I would be quite happy with if it can bring in enough support to actually move forward with a trial.--ragesoss (talk) 04:05, 11 March 2009 (UTC)[reply]

I've significantly modified the original Flagged protection proposal (see for example this discussion), and I think the concerns of those people will be sharply reduced. When we should use flagged protection is a policy matter that is outside the scope of my proposal. Initially, I say that the requirements should be identical to current semi-protection, but this can evolve. Combining patrolled revisions and flagged protection is a necessity, as one is global, to allow monitoring of all BLPs, and the other local, to protect certain BLPs regularly vandalized or subject to BLP violations. The issue had to be broken in two (arbitrary BLP/problematic BLP); a single passive or active flag for both wouldn't find consensus and would be, respectively, too moderate or too extreme. Cenarium (talk) 18:12, 11 March 2009 (UTC)[reply]

Wikipedia:Flagged protection and patrolled revisions

It is proposed to run a trial of Flagged Revisions at Wikipedia:Flagged protection and patrolled revisions. The proposal is divided into two parts:

Flagged protection: an article can be 'protected' by an administrator so that the version viewed by readers by default is the latest flagged version. This is a modified version of the original flagged protection proposal.
Patrolled revisions: a 'passive' flag used to monitor articles, especially BLPs, for vandalism, BLP violations, POV pushing, etc., that can be used for all articles, but has no effect on the version viewed by readers.

The proposals are independent but supplement each other. They involve the creation of a 'reviewer' usergroup. This implementation can support secondary trials. The main trial should run for two months, then a community discussion should decide the future of the implementation. Cenarium (talk) 22:43, 15 March 2009 (UTC)[reply]

A poll has started at Wikipedia talk:Flagged protection and patrolled revisions/Poll. Cenarium (talk) 18:33, 17 March 2009 (UTC)[reply]