Make it easy to fork, branch, and merge pages (or more)
Open, Needs TriagePublic

Assigned To
None
Authored By
cscott
Sep 17 2015, 11:35 PM

Description

It used to be conventional wisdom that forking was the death of an open source project. We all remembered the emacs -vs- xemacs -vs- Lucid emacs wars, and nobody wanted to repeat that. So we took great care to keep our repos centralized and singular.

Then git arrived, and shortly after, GitHub. Suddenly, forking wasn't evil! Instead, creating a fork was the *very first* thing that you did when you wanted to contribute.

There are a lot of benefits to the forking model. Nobody has to ask permission! Instead of "they reverted my edit!" new editors would instead just see "they didn't immediately merge my edit" -- which is less immediately offputting, and allows for additional refinement of the contribution before merge. And further, groups can potentially build a collection of articles over a long period of time, without the need to make their initial work immediately public. (The "diff" and "merge" steps are just as important as the "fork", in order to make this model work well!)

There are a number of ways to experiment with "fork-and-merge" models for editing Wikimedia projects. Some concrete suggestions are fleshed out below. Add your own, let's discuss, and we can figure out the best way to start experimenting with fork-and-merge.

Related tasks (thanks, @Tgr & @awight): T108664: Provide an interactive edit conflict resolution tool; T26617: Implement diff on sentence level (instead of per paragraph block); T91137: RFC: Support a branching content history model; T40795: History should support branches (at this revision there was a merge/split with that revision").

This is also mentioned on the 2015 Community Wishlist Survey: Support for version branching for pages (where the idea met considerable opposition).

SUMMIT GOALS

  • The primary goal of the summit is to agree on an actionable "next step" (or "steps", if we're ambitious), so that work on improved revision models doesn't continue to stagnate. We should leave the summit with at least one implementable feature or proof-of-concept which will advance or inform the broad goal and can be implemented before the next dev summit. Some concrete suggestions raised in this thread include:
  • A UX roadmap. "How should users experience branches/forks/merges?" Since this impacts the community, I expect this to be a set of *experiments* rather than a fixed set of UX decisions. For example:
    • Prototype lightweight branches with a JavaScript gadget that forks the page into user space, deploy it on mediawiki.org, and get user feedback.
    • Have our design team mock up user-facing branches from T91137 (see wireframes at https://www.mediawiki.org/wiki/Requests_for_comment/Branching) into a form suitable for public comment.
    • Re-envision UX for a "saved drafts" feature, which might use branching support semi-invisibly as an implementation mechanism. Perhaps mobile could be the guinea pig here, letting mobile users save edits-in-progress and continue them on their desktop devices.
  • A final goal should be broad agreement on a technical roadmap: how branched revisions are going to be stored in core, how that interacts with RESTBase, etc., and how to represent a branching revision history. This roadmap shouldn't dive into UX considerations or overly-specific implementation details (those are part of the "concrete next step" planning, if necessary) but we should have an envelope-sketch plan that we all agree makes sense, and which should help guide future RFC discussions on specific implementation proposals.
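To make "how to represent a branching revision history" a little more concrete, here is a minimal sketch of one possible representation: each revision records its parent revision(s), and a branch or fork is just a name pointing at a head revision. The `Revision` and `History` classes and all of their methods are hypothetical names made up for illustration; none of this is existing MediaWiki or RESTBase code, and real storage would look quite different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    # 0 parents: first revision; 1: ordinary edit; 2 or more: a merge.
    parents: tuple = ()


@dataclass
class History:
    revisions: dict = field(default_factory=dict)  # rev_id -> Revision
    branches: dict = field(default_factory=dict)   # branch name -> head rev_id

    def commit(self, branch: str, text: str, extra_parents: tuple = ()) -> int:
        """Append a revision to a branch; extra_parents records a merge."""
        rev_id = len(self.revisions) + 1
        head = self.branches.get(branch)
        parents = (() if head is None else (head,)) + tuple(extra_parents)
        self.revisions[rev_id] = Revision(rev_id, text, parents)
        self.branches[branch] = rev_id
        return rev_id

    def fork(self, source: str, new_branch: str) -> None:
        """A fork is a new name pointing at the same head revision."""
        self.branches[new_branch] = self.branches[source]

    def ancestors(self, rev_id: int) -> set:
        """All revisions reachable from rev_id, including rev_id itself."""
        seen = set()
        stack = [rev_id]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.revisions[current].parents)
        return seen

    def merge_base(self, a: str, b: str) -> Optional[int]:
        """A newest revision shared by both branches (ids only ever grow)."""
        common = self.ancestors(self.branches[a]) & self.ancestors(self.branches[b])
        return max(common) if common else None


history = History()
history.commit("master", "Java 7 book")              # revision 1
history.fork("master", "draft")
history.commit("draft", "Java 8 book")               # revision 2
history.commit("master", "Java 7 book, typo fixed")  # revision 3
print(history.merge_base("master", "draft"))         # the shared ancestor
```

Because the fork keeps its parent links, the history and the linkage back to the original page survive, and a later merge is simply a revision with two parents.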

Note that the technical and UX roadmaps may be initially somewhat at odds, as @Pginer-WMF rightly points out in the comments below. The goal of this summit is just to write a blind first draft for both, so that we can then start a dialog between them. Otherwise both design and implementation get hung up waiting for the other. We don't expect to actually implement the drafts as-is, but we *will* publish and talk about them with the community and start working on reconciliation.

This card tracks a proposal from the 2015 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey

This proposal received 1 support vote, and was ranked #99 out of 107 proposals. https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Miscellaneous#Support_for_version_branching_for_pages

Related Objects

Mentioned In
T71445: Implement a proper code-review process for MediaWiki JS/CSS pages on Wikimedia sites
T238383: Wikimedia Technical Conference 2019 Unconference: Federated MediaWiki
T216112: Support data sharing in complex networks of MediaWiki wikis
T20493: RFC: Unify the various deletion systems
T149664: Next steps for edit review
T120462: Reduce edit conflicts by treating different parts of the page as separate
T120500: Support for version branching for pages
T119593: Define the list of "must have" sessions for WikiDev '16
T119030: WikiDev 16 working area: Collaboration
T119162: WikiDev 16 working area: User interface presentation
T119018: Working groups/areas for macro-organization of RfCs for the summit
T119029: WikiDev 16 working area: Content access and APIs
T40795: History should support branches (at this revision there was a merge/split with that revision")
T106898: Offline editing to support people with intermittent or no internet access (e.g. Kiwix, mobile)
T96903: Identify and prioritize architectural challenges
Mentioned Here
T228575: Decrease number of open tickets with assignee field set for more than two years (aka cookie licking) (March-June 2020 edition)
T105173: HTML diffs of edits for everything
T107595: [RFC] Multi-Content Revisions
T91137: RFC: Support a branching content history model
T106898: Offline editing to support people with intermittent or no internet access (e.g. Kiwix, mobile)
T26617: Implement diff on sentence level (instead of per paragraph block)
T108664: Provide an interactive edit conflict resolution tool
T40642: Add configuration setting to add a preference checkbox for sending users copies of pages on their watchlists that are deleted
T40795: History should support branches (at this revision there was a merge/split with that revision")

Event Timeline

a working area for each page with pieces that editors can easily move into the page (merge) or back to the working area

Like a wikitext talk page? :)

Yes, but...

A talk page is associated with the whole content page. Users participating in them need to describe which piece of content they are referring to when proposing an improvement. This indirection gap requires an effort from those creating proposals and those trying to understand how those proposals map to the article content. For example, collecting all proposals ever made in a talk page about a given article section is not an easy task at the current level of granularity.

So I agree that talk pages are currently used for this purpose (and many more due to the flexibility a blank page provides), but the fact that people using them refer to pieces of content also suggests that providing tools for operating at that level may be useful for some use cases. Having said that, this is not intended to replace a global view for the article which is still relevant in many ways. So we'll need to consider how those views should work together when exploring these possibilities.

The "shouldn't dive into UX" is *only* for the tertiary goal, and is just to try to unblock it and determine if we can achieve technical consensus on broad-stroke implementation details.

Ok. What I was trying to convey is that the technical direction will be affected by the answer of "How should users experience branches/forks/merges?" since it may lead to many different possibilities (with many different technical implications).

But (to play devil's advocate) when git was designed Linus specifically *rejected* fine-grained tracking of this sort, claiming that any need for such information can just as efficiently be generated after-the-fact.

My main concern is which building blocks (i.e., concepts, operations, etc.) are the most adequate for users in our context.

Let's imagine that facilitating the revision of content at the paragraph level is a good idea. In that case, I'm interested in how the model is presented to the users, and I'm fine with any underlying implementation to support it (storing paragraph ids or getting them after the fact). The key question is whether limiting a change to the scope of a paragraph helps focus the collaboration, or hinders it by fragmenting potential broader changes to the whole page.
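As a rough illustration of the "getting them after the fact" option, the sketch below derives paragraph-level changes from two plain revisions using Python's standard `difflib`, without storing any paragraph ids. The helper names `split_paragraphs` and `paragraph_changes` are made up for this example, and splitting on blank lines is a simplifying assumption that real wikitext would not always satisfy.

```python
import difflib


def split_paragraphs(text: str) -> list:
    """Split text into paragraphs on blank lines (a simplifying assumption)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def paragraph_changes(old: str, new: str) -> list:
    """Return (tag, old_paragraphs, new_paragraphs) for each changed region.

    tag is one of "replace", "delete" or "insert", as reported by difflib.
    """
    old_paras = split_paragraphs(old)
    new_paras = split_paragraphs(new)
    matcher = difflib.SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)
    return [
        (tag, old_paras[i1:i2], new_paras[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_text = "Intro.\n\nBody one.\n\nOutro."
new_text = "Intro.\n\nBody one, revised.\n\nOutro."
for change in paragraph_changes(old_text, new_text):
    print(change)
```

Whichever way the mapping is obtained, the UX question stays the same: the output above is per-paragraph, so a change that spans the whole page would show up as several separate pieces.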

I think Git is a good example of a very powerful tool that is not very intuitive (and even alternative terminology has been proposed to try to add some clarity). That could be a result of its general-purpose nature, and that is why I think that understanding what versioning means in our context is key to providing users with the right tools, so that they don't feel that they have been given a powerful but unintelligible alien time machine.

@Pginer-WMF I agree with your points. I still think it's worthwhile to discuss the implementation landscape so we have a rough idea what sorts of things are "easy" and which are "hard" (or even "impossible"). Hopefully the result will be a dialog between design saying "here's what we want" and implementation saying "here's what we can do" and then we find some middle ground. But we need to start the discussion on both sides first.

A different proposal on the 2015 Community Wishlist Survey is "Allow copy of pages", which is really talking about forking books AFAICT. Better fork/join mechanisms might be useful here, especially if the "forked" content could live in its own namespace.

GitHub allows this, for instance: I can fork project "foo/bar" into "cscott/some-other-project" and still preserve the history and linkage back to the original "foo/bar".

As I understand the user, they want to start a book on "Java 8" (say) starting from the book on "Java 7". Ideally we wouldn't lose the links back to the original book, in case there were errors fixed which would be common to both versions of the book.

I think one way of thinking about this is to build a list of viable approaches that we would like proof-of-concept implementations of. The prep work for this session could be to build a clear list of these, and also try to find people who offer to build (or lead building) a proof of concept for that approach. The meeting itself could be to take the amount of time, divide a block of it evenly(?) between the different options, and then let the people planning to build proof of concept implementations explain how they plan to do it. If no one offers to build it by/at the summit, and no one is excited to build it, then it gets removed from the list of considered options.

Does that seem like a reasonable approach?

The top ask from the dewiki/WMDE community wishlist is better edit conflict resolution, which has a lot of overlap with this (an edit conflict being basically a mini fork/merge).

I also think that a more concrete proposal is needed for the summit, considering the difficulty of the topic.

In the past, we have discussed HTML diffs like those on localwiki. Those are desirable for VisualEditor users in general, and would provide a fairly intuitive UX basis for an interactive merge tool.

Add T105173 to blockers if you think that's closely related?

We discussed this at an unconference session at the WMF All Hands meeting. One of us will need to publish the notes (assuming they are useful, which I hope they are).

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

IMPORTANT: If you are a community developer interested in working on this task: The Wikimedia Hackathon 2016 (Jerusalem, March 31 - April 3) focuses on #Community-Wishlist-Survey projects. There is some budget for sponsoring volunteer developers. THE DEADLINE TO REQUEST TRAVEL SPONSORSHIP IS TODAY, JANUARY 21. Exceptions can be made for developers focusing on Community Wishlist projects until the end of Sunday 24, but not beyond. If you or someone you know is interested, please REGISTER NOW.

Restating the consensus from the meeting cited above: the suggestion was made that I prototype these tools as part of a "better support for the Draft namespace" task, before making more fundamental changes to article editing or MediaWiki. The goal would be to create a set of tools (gadgets, maybe an extension) that would allow one-button "fork to Draft namespace" and "merge from Draft namespace" with a similar UX to how user-friendly fork/merge works in GitHub. Then additional tools can be built to manage edit conflicts between the copy stored in the draft namespace and the current "master".
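A minimal sketch of what the one-button "fork to Draft namespace" and "merge from Draft namespace" pair could look like as logic, independent of any gadget or extension API. Everything here is an assumption for illustration: the `Page`, `DraftFork` and `MergeConflict` names, the draft title format, and the rule that a merge only goes through automatically when the article has not changed since the fork.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    text: str
    rev_id: int = 1


@dataclass
class DraftFork:
    page: Page         # the copy stored in the draft namespace
    source_title: str  # link back to the article it was forked from
    base_rev_id: int   # the article's revision at fork time


class MergeConflict(Exception):
    """The article changed after the fork was taken."""


def fork_to_draft(article: Page, user: str) -> DraftFork:
    # Hypothetical title scheme; a real wiki might choose a different one.
    draft = Page(f"Draft:{article.title}/{user}", article.text)
    return DraftFork(draft, article.title, article.rev_id)


def merge_from_draft(article: Page, fork: DraftFork) -> Page:
    if article.rev_id != fork.base_rev_id:
        # "master" moved on; this is where a conflict-resolution tool would step in.
        raise MergeConflict(
            f"{article.title} is at revision {article.rev_id}, "
            f"fork was taken at revision {fork.base_rev_id}"
        )
    # Nothing changed since the fork, so the draft text can simply replace the article.
    article.text = fork.page.text
    article.rev_id += 1
    return article


article = Page("Java", "Java is a language.")
fork = fork_to_draft(article, "cscott")
fork.page.text = "Java is a programming language."
merge_from_draft(article, fork)
print(article.rev_id, article.text)
```

The `MergeConflict` branch is the hook for the "additional tools" mentioned above: instead of failing, a fuller version would open an interactive merge between the draft copy and the current article.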

Apologies if my terminology is a little bit vague; I haven't spent much time playing around with the Draft namespace yet, so this is just my recap of the approach more experienced editors suggested. Help eagerly accepted! Otherwise this is on my list of "big projects to do before the year is up".

That makes sense. Currently, the Draft namespace is primarily oriented around new articles that aren't ready for mainspace. However, I think the idea to also use it for a fork/merge workflow is interesting and worth considering.

Some more notes from a Twitter discussion started by @Sj, whose developer summit position statement is strongly related to this task.
@cscott says:

I think that's my cue to pitch T113004 which seeks to incorporate a more GitHub-like fork/merge model in Wikipedia. It would allow contributions to be visible, even when unmerged. (Which has its own issues, of course...)

@Phoebe says:

What is an unmerged contrib? Someone makes an edit, someone takes it out again? Or are you imagining that ppl don't contribute directly to article, they propose a change a la GitHub and others can see that change? Wouldn't that take a huge amount of personhours to approve pulls?
I'm sure ideas about this are in the proposal somewhere but as you know the vast amount of edits are added w/o controversy, so keeping workloads the same or lower would be A
Ps I'm reading all the stuff about enhancing diffs & edit review screens - that's fantastic. Though I'll keep my "diff to penultimate revision" t-shirt forever, diffing could really use some love <3

@cscott says:

Once the basic infrastructure for forked revisions was in place, you could experiment with lots of merge strategies. Automatically merge after N hrs, say, or merge edits scored as not-vandalism...
I think the least disruptive thing would be to automatically merge, but keep the fork when there is a roll back or merge conflict. You could choose to start a revision as a fork as well, like w/ the draft namespace.
The key psychological breakthrough would be to change first-time editors' experience from "I tried to help and it was reverted and thrown away" to "I tried to help and it wasn't merged yet but there were some helpful suggestions for improvements".
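The merge strategies floated above (merge automatically after N hours, merge edits scored as not-vandalism, keep the fork on rollback or conflict) could be combined into a small policy function like the sketch below. This is only one possible combination; the `PendingEdit` fields, the `decide` function, the score scale and both default thresholds are hypothetical and would need experimentation.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MERGE = "merge"
    KEEP_FORK = "keep fork"
    WAIT = "wait"


@dataclass
class PendingEdit:
    age_hours: float
    vandalism_score: float  # 0.0 (looks fine) to 1.0 (looks like vandalism); hypothetical scorer
    has_conflict: bool = False
    was_rolled_back: bool = False


def decide(edit: PendingEdit, max_wait_hours: float = 24.0,
           score_threshold: float = 0.1) -> Action:
    """Pick what to do with an unmerged edit under the assumed policy."""
    # A rollback or a merge conflict never discards the edit: it stays as a fork.
    if edit.has_conflict or edit.was_rolled_back:
        return Action.KEEP_FORK
    # Edits scored as not-vandalism go live right away.
    if edit.vandalism_score <= score_threshold:
        return Action.MERGE
    # Everything else merges automatically once it has waited long enough.
    if edit.age_hours >= max_wait_hours:
        return Action.MERGE
    return Action.WAIT


print(decide(PendingEdit(age_hours=0.0, vandalism_score=0.02)))
print(decide(PendingEdit(age_hours=2.0, vandalism_score=0.5)))
print(decide(PendingEdit(age_hours=2.0, vandalism_score=0.02, was_rolled_back=True)))
```

Under this policy most uncontroversial edits still go live immediately, and the fork only becomes visible to the editor in the rollback and conflict cases.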

I'll add: the happiness kick you see when your constructive edit is made and immediately goes live on the site is even better! And should definitely be preserved whenever possible. But fork/merge can provide gentler off-ramps other than revert in case of issues with an edit.

A former WP editor complained about persistent right-wing bias they perceived on the site.
@cscott replied:

the point of the fork/join discussion is to let you write (in your case) a more left-wing slant without it being immediately reverted in an edit war. Hopefully first-class revisable forks of an article can enable a more deliberate debate, rather than immediate-effect edit wars.
And then you could be pointing to specific unmerged revisions which you think demonstrate right-wing bias, and we could be collaborating to validate/edit/merge them. (You can do this now to a degree with reverted edits, but it gets cumbersome.)

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Perhaps we should coin this "MergeRefork" to mirror MapReduce, so this is clearly distinguished from other approaches to edit conflicts.