User:Daniel Mietchen/Blog/2012/10/23/Reusing, revising, remixing and redistributing research

License

This page is licensed under the Creative Commons Attribution 3.0 Unported License.

About

This page hosts the draft for a blog post on Open Access Week.

Title

Reusing, revising, remixing and redistributing research

Introduction

The initial purpose of Open Access is to enable researchers to make use of information already known to science as part of the published literature. One way to do that systematically is to publish scientific works under open licenses, in particular the Creative Commons Attribution License, which is compatible with the stipulations of the Budapest Open Access Initiative and used by many Open Access journals. It allows anyone to share the materials in any form and for any purpose, provided that the original source and the licensing terms are shared alongside. This opens the door to the incorporation of materials from Open Access sources into a multitude of contexts both within and outside traditional academic publishing, including blogs and wikis.

Amongst the most active reusers of Open Access content are Wikimedia projects such as the more than 280 Wikipedias, Wikispecies and their shared media repository, Wikimedia Commons. The following sections highlight a few examples of reusing, revising, remixing and redistributing Open Access materials in the context of Wikimedia projects.

Reuse

An example of intensive reuse is an article from BMC Evolutionary Biology that features a number of phylogenetic trees of gastropods, along with pictorial depictions of individual species contained therein. Over a dozen of these depictions have been cropped from the figures and uploaded to Wikimedia Commons, from where they are currently being served to over 7000 pages across Wikimedia projects.

While these numbers far exceed the reuse of the average figure in scientific manuscripts, the potential for reuse has not yet been fully exploited. For instance, these phylogenetic trees have been published in a scalable format, but some of the shell drawings have been included in bitmapped formats, which limits the size range at which the images can be reused in Wikipedia articles. Furthermore, the trees have not been provided in an editable format, nor with code that could be used to reconstruct and adapt them.

A phylogenetic tree of some gastropods.
One of the tree's species, Fusiturris similis.
Composite images like this illustrated phylogenetic tree take a lot of effort to assemble. Typical reuse scenarios — e.g. in articles on the individual species — then require decomposition and are limited by the resolution of the original figures. Source:
Cunha, R. L.; Grande, C.; Zardoya, R. (2009). "Neogastropod phylogenetic relationships based on entire mitochondrial genomes". BMC Evolutionary Biology. 9: 210. doi:10.1186/1471-2148-9-210. PMC 2741453. PMID 19698157. License: CC BY 2.0.

Of course, some images are genuinely created in bitmapped formats, e.g. photographs. But why then do publishers not preserve the EXIF information that could provide valuable context in interpreting images or sound files?

Revise

In January, a species of frog, Paedophryne amauensis, made headlines as the smallest known vertebrate. It belongs to a genus whose six currently known species have all been described in Open Access articles, of which the two latest ones, published one month apart, both state that there are four species. So did a map provided in one of them, and a contributor to the Polish Wikipedia, Szczureq, took the initiative to update the map so that it covers all six species. The updated map is currently in use in about 20 language versions of Wikipedia in articles related to the genus. The file has since been tagged for conversion into SVG, an editable vector graphics format, so as to facilitate future updates.

A map indicating the localities at which Paedophryne species have been found. On the left is the original map published in PLOS ONE with four species. On the right is a revision that takes into account the two additional species that had been published in ZooKeys a month earlier. Sources:

Remix

Scholarly communication nowadays takes place primarily in English. Open licenses allow for materials to be translated into other languages. This is particularly relevant for topics that are being taught in schools, such as the basic anatomy of the human ear and auditory cortex, as illustrated in the following figure originally published in PLOS Biology.

 
(A) The human ear and frequency mapping in the cochlea. (B) Lateral view of the human brain, with the auditory cortex exposed. Source:
Chittka, L.; Brockmann, A. (2005). "Perception Space—The Final Frontier". PLoS Biology. 3 (4): e137. doi:10.1371/journal.pbio.0030137. PMC 1074815. PMID 15819608. License: CC BY 2.5.

Part A of the above figure has been converted to SVG and from there adapted to Czech (with a variant), German, Spanish, Indonesian, Japanese, Polish, Portuguese, Romanian and Ukrainian, with further versions being devoid of any descriptions, numbered descriptions or detailed frequency mapping.

Part B has also been converted to SVG and from there adapted for use in the Japanese Wikipedia's article on the insula.

Still in the auditory system and PLOS Biology, the next figure depicts some of the key processing steps involved in auditory perception:

 
Sound processing in the auditory system. Source:
Gollisch, T.; Herz, A. M. V. (2005). "Disentangling Sub-Millisecond Processes within an Auditory Transduction Chain". PLoS Biology. 3 (1): e8. doi:10.1371/journal.pbio.0030008. PMC 539322. PMID 15660161. License: CC BY 2.5.

For this one, too, an SVG version has been created, but the file has also been remixed in another way: User:Was a bee noticed that the depicted processing chain includes neither a sound source nor a mental representation of the perceived sound, and created a new version that does, which is used on both the Japanese and Italian Wikipedias.

Redistribute

Distribution of the published literature traditionally takes place at the level of individual articles, journals or publishers, but open licenses allow it to be aggregated at a cross-publisher level. For instance, the Open Access Subset at PubMed Central is now being automatically spidered for articles that (1) are licensed compatibly with reuse on Wikimedia platforms and (2) contain audio or video files. When such files are detected, they are downloaded from PubMed Central, converted to the open format OGG and uploaded to Wikimedia Commons, along with the accompanying metadata and suggested categories based on the article's XML and the corresponding MeSH terms. Of course, the names of these article-derived categories do not map one to one to categories used at Wikimedia Commons, but for known correspondences, another bot fixes that in a way that makes it easy for writers of Wikipedia articles to find relevant materials for illustration. This way, supplementary files, which are otherwise often neglected and rarely accessed, can live a second life in a new context. One of them, for instance, is featured on the Main Page of Wikimedia Commons today. The most recently uploaded media files from Open Access supplements can be viewed in a dedicated gallery.
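
The two selection criteria can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the bot's actual code: the sample record, the list of accepted license URL fragments and the function name `reusable_media` are all made up for this example. Only the element and attribute names (`license`, `supplementary-material`, `media`, `mimetype`, `xlink:href`) follow the JATS tag set used for article XML at PubMed Central.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree puts in front of xlink attributes.
XLINK = "{http://www.w3.org/1999/xlink}"

# Assumption: URL fragments of licenses treated as compatible with reuse
# on Wikimedia platforms. The real bot may use a different list.
COMPATIBLE_LICENSES = (
    "creativecommons.org/licenses/by/",
    "creativecommons.org/licenses/by-sa/",
    "creativecommons.org/publicdomain/zero/",
)

# Hypothetical, heavily abridged article record.
SAMPLE_ARTICLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta><permissions>
    <license xlink:href="http://creativecommons.org/licenses/by/2.5/"/>
  </permissions></article-meta></front>
  <body>
    <supplementary-material id="s1">
      <media xlink:href="example-s001.mov" mimetype="video" mime-subtype="quicktime"/>
    </supplementary-material>
    <supplementary-material id="s2">
      <media xlink:href="example-s002.pdf" mimetype="application" mime-subtype="pdf"/>
    </supplementary-material>
  </body>
</article>"""


def reusable_media(article_xml):
    """Return the audio/video file names of an article, or [] if its
    license is not in the list of compatible ones."""
    root = ET.fromstring(article_xml)

    # Criterion 1: at least one license URL matches a compatible license.
    license_urls = [el.get(XLINK + "href", "") for el in root.iter("license")]
    compatible = any(
        fragment in url for url in license_urls for fragment in COMPATIBLE_LICENSES
    )
    if not compatible:
        return []

    # Criterion 2: keep only media declared as audio or video.
    return [
        media.get(XLINK + "href")
        for media in root.iter("media")
        if media.get("mimetype") in ("audio", "video")
    ]


print(reusable_media(SAMPLE_ARTICLE))  # prints ['example-s001.mov']
```

Downloading, conversion to OGG and upload to Wikimedia Commons are left out of this sketch, since they depend on external services and tools.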

In the process of setting up the bot, it became very clear that the XML supplied to PubMed Central varies widely in terms of compliance with PubMed Central guidelines and general machine readability. For instance, the XML indicates the MIME type of the supplementary files, but for about ten percent of the files, this type is indicated wrongly (e.g. for all videos in this paper). Even the licensing and copyright statements of the articles themselves are sometimes self-contradictory. Work thus remains to be done to address these issues and to further standardize the exchange of metadata.
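
One simple way to spot wrongly declared MIME types is to compare the declared type with the one suggested by the file extension. The sketch below assumes a small hand-picked extension table and made-up file names. It is not how the bot checks files, and an extension is only a heuristic, so a mismatch is a hint to look closer rather than proof of an error.

```python
import os

# Assumption: a small, hand-picked table of extensions and the major
# MIME type ("audio" or "video") one would expect for each of them.
EXPECTED_MAJOR_TYPE = {
    ".mov": "video",
    ".mp4": "video",
    ".avi": "video",
    ".ogv": "video",
    ".mp3": "audio",
    ".wav": "audio",
    ".oga": "audio",
}


def mime_mismatches(files):
    """Return (file name, declared type, expected type) for every file
    whose declared major MIME type disagrees with its extension."""
    mismatches = []
    for filename, declared in files:
        extension = os.path.splitext(filename)[1].lower()
        expected = EXPECTED_MAJOR_TYPE.get(extension)
        # Extensions missing from the table are skipped, not flagged.
        if expected is not None and declared != expected:
            mismatches.append((filename, declared, expected))
    return mismatches


# Hypothetical declarations as they might appear in an article's XML.
declared_types = [
    ("example-s001.mov", "application"),  # video declared as application
    ("example-s002.mp3", "audio"),        # consistent
    ("example-s003.pdf", "application"),  # not in the table, skipped
]

print(mime_mismatches(declared_types))
# prints [('example-s001.mov', 'application', 'video')]
```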

It is interesting to note that the only permission that had to be sought in order to run the import from PubMed Central into Wikimedia Commons was actually on the Commons end, since running a bot there requires approval, which is normally granted after the bot has demonstrated compliance with relevant policies and standards. There is a caveat to such large-scale import, however: it relies on proper assertion of copyright and correct indication of licensing back at the journals and, ultimately, by the authors of the corresponding articles. This is not a given, since many scholarly authors are still far from being familiar with these legal aspects of publishing. Raising awareness of such issues amongst the scholarly, librarian and publishing communities is one of the purposes of Open Access Week, and trying to get Open Access materials used on Wikipedia (or simply checking the provenance of an image or media file used there) is a good way to start familiarizing oneself with the subject.

Wikifying publications

Reusing, revising, remixing and redistributing openly licensed content is easier if the materials are created in an editable fashion right from the start. The journal RNA Biology has for several years required that authors of manuscripts describing new families of RNA submit a draft for a Wikipedia entry along with their manuscript, which will go through the same peer-review process. Earlier this year, PLOS Computational Biology took this approach a step further by introducing Topic Pages: review articles drafted according to the guidelines of the journal and of the English Wikipedia, published as traditional non-editable documents in the journal and additionally posted to the English Wikipedia, where they can be expanded and updated as the need arises. It would be nice to see further experimentation in this area, so as to increasingly integrate scholarly workflows with the Web, for which Open Access provides the first step.