
List bugs by number of duplicates
Closed, Declined, Public

Description

Listing bugs by number of duplicates is very important for identifying issues and features which were a big focus of development and discussions. How do I do this in phabricator?

Compare:


Might depend on T883 if that bug means Phabricator doesn't *know* the list of merged/duplicated tasks.

Event Timeline

Nemo_bis raised the priority of this task to Needs Triage.
Nemo_bis updated the task description.
Nemo_bis changed Security from none to None.
Nemo_bis subscribed.
Qgil triaged this task as Lowest priority. Oct 26 2014, 5:36 AM
Qgil added a project: Phabricator (Upstream).

> issues which were a big focus of development and discussions.

Wouldn't the number of comments, or of users in the CC field, be a better criterion for identifying the focus of discussions?
If you're after identifying popular tickets, I'm not aware of any convincing way to do that either, whether by combining {duplicates × number of comments × CC field entries × votes} in Bugzilla or otherwise.
For example, the sheer number of duplicates does not take into account how well a duplicate's reporter knows how to search existing tickets in Bugzilla and which words to use when searching. So I generally doubt the usefulness of duplicate statistics, and I'm happy to link to a bunch of scientific papers on this topic if wanted. :)

> How do I do this in phabricator?

AFAIK you cannot. Also, "merging" (as it's mostly called in Phab) might imply slightly different semantics.

> Wouldn't a criterion to identify focus of discussions rather be number of comments or users in the CC field?

Sure; is that possible? Please file separate tasks.

> So I generally doubt the usefulness of duplicate statistics and I'm happy to link to a bunch of scientific papers on this topic if wanted. :)

Papers appreciated. Experience shows much-duplicated bugs in mediazilla are actually important.

@Nemo_bis: We might want to take further discussion off-Phab, but here are some quotes that made me question whether "this ticket has X dups" is actually helpful information. I do realize that these quotes focus on the tech-savviness of the community, which is open to interpretation, so it was actually good that you made me look this up. I'm slightly less sure now. ;)
I linked to URLs where I know the papers are freely available.

  • Sporadic reporters are more prone to creating duplicate reports because they lack experience (pages 56-57 of Yguaratã Cerqueira Cavalcanti, Paulo Anselmo Mota Silveira Neto, Daniel Lucrédio, Tassio Vale, Eduardo Santana Almeida, and Silvio Romero Lemos Meira. The bug report duplication problem: an exploratory study. Software Quality Control, 21(1):39–66, March 2013).
  • Many users only ever report a single issue (page 65 of http://dirkriehle.com/uploads/byhand/theses/2013/capraro_2013_arbeit.pdf )
  • Reporters themselves might not have sufficient knowledge of the database and might not find an already existing report (page 85 of Kevin Crowston. The bug fixing process in proprietary and free/libre open source software: A coordination theory analysis. In Varun Grover and M. Lynne Markus, editors, Business Process Transformation, pages 69-99. M. E. Sharpe, Armonk, NY, 2008.).
  • Research findings are contradictory on whether consumer-oriented projects face a larger number of dups (page 16 of http://hdl.handle.net/1957/28309 vs. pages 266-267 of Yguaratã Cerqueira Cavalcanti, Eduardo Santana de Almeida, Carlos Eduardo Albuquerque da Cunha, Daniel Lucrédio, and Silvio Romero de Lemos Meira. An initial study on the bug report duplication problem. In Proceedings of the 14th European Conference on Software Maintenance and Reengineering, pages 264–267, March 2010).
  • "few cases was the presence of duplicates used to advocate for a particular decision." (page 7 of http://dub.washington.edu/djangosite/media/papers/tmpj4tp8P.pdf ).
  • And on a personal note, when I talked with Gentoo folks four months ago about bug management in their Bugzilla and how to identify popularity, they said that a growing number of duplicates and votes on a bug report actually helps them identify inactivity.

Regarding https://old-bugzilla.wikimedia.org/duplicates.cgi: it had its flaws (e.g. it did not include dups of dups in the statistics), but it at least provided a vague idea (though I still don't believe that "number of dups" is relevant enough on its own to base decisions on).

Very interesting collection of information (we should copy it to the wiki), but yes, it doesn't prove your thesis. :) It might even lead us to the opposite conclusion: if newbies report many duplicates, and we want to help new editors, then the most duplicated bugs are the most important. ;)

This is such a corner case. Also, what https://old-bugzilla.wikimedia.org/duplicates.cgi seems to show is very, very old bugs. The number of dupes might be more related to how many years a bug has been around (and forgotten) than to the relevance of the bug.

I propose to decline this task.

Qgil claimed this task.