Research:Understanding Engagement with Images in Wikipedia
In this project, we study how readers engage with images in Wikipedia. Our aim is to leverage the server logs to provide a quantitative description of readers' interaction with multimedia content in Wikipedia articles, in particular with images. We will break down our analysis along several dimensions: we will explore how page and image types affect reader engagement, and how images are accessed from different geographic areas. This project will be kicked off as part of a 12-week internship at Wikimedia Research.
Research questions
How do we measure engagement with images? Given data availability and the literature, our first goal is to find a few metrics that are useful for measuring engagement. A good candidate is the number of pageviews that convert to clicks on images.
To what extent are readers engaging with images, and which images tend to be more engaging? Here, we will perform a large-scale analysis of how readers engage with images. We will break down this analysis by page and image types. Computer vision technology will be employed to classify images into topics. We want to answer questions such as: What percentage of readers engage with images? Are certain topics or image types (quality, subjects) more engaging for readers? Do visual factors such as image quality affect readers' engagement with images? Do article factors such as article "completeness" affect readers' engagement with images?
Are readers from certain locations or language communities more prone to engage with images? Here, we want to look deeper into the role of language and location in visual content engagement. We will analyze the location of clicks versus language edition. We want to see whether images are accessed predominantly by, for example, non-native speakers or people coming from certain geographic areas.
Are images useful for increasing engagement among (new) readers? Lastly, we want to run an experiment to see how comparable articles with and without images affect engagement metrics such as dwell time, session length, and visit frequency.
Data
We base our study on the server logs available in the webrequest logs table, from which we collect pageviews and imageviews for each reading session. We identify reading sessions by concatenating the client_ip and user_agent fields.
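A minimal sketch of the session identifier described above, assuming each log row exposes `client_ip` and `user_agent` as strings. The `|` separator, the SHA-256 hashing step, the function name `session_id`, and the sample rows are illustrative assumptions, not the project's actual pipeline.

```python
import hashlib


def session_id(client_ip: str, user_agent: str) -> str:
    """Build a session key by concatenating client_ip and user_agent.

    The "|" separator and the SHA-256 hash are assumptions made for this
    sketch; the source only states that the two fields are concatenated.
    """
    raw = f"{client_ip}|{user_agent}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Hypothetical rows; the IPs come from the reserved documentation ranges.
rows = [
    {"client_ip": "192.0.2.1", "user_agent": "Mozilla/5.0 (X11)"},
    {"client_ip": "192.0.2.1", "user_agent": "Mozilla/5.0 (X11)"},
    {"client_ip": "198.51.100.7", "user_agent": "Mozilla/5.0 (X11)"},
]

# Rows with identical client_ip and user_agent collapse into one session.
sessions = {session_id(r["client_ip"], r["user_agent"]) for r in rows}
```

Two requests land in the same session only when both fields match, so clients that share an IP address and a browser string are indistinguishable under this scheme.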
Results
First round of analysis
The first round of data analysis was performed in May-July 2020. We started a quantitative analysis of how readers engage with images in Wikipedia. To do so, we first defined two key metrics of reader engagement: the page-specific click-through rate and the image-specific click-through rate. We computed these metrics after collecting two weeks of data for four Wikipedia language editions (English, French, Spanish, and Arabic), breaking down our analysis by several dimensions: country, topic, and access method (desktop or mobile web).
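This page does not spell out the two metrics, so the sketch below assumes that the page-specific click-through rate is the share of a page's pageviews that include at least one image click, and that the image-specific click-through rate is the share of a page's pageviews in which a given image was clicked. The record layout, the sample data, and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: one per pageview, listing the images clicked in it.
pageviews = [
    {"page": "Cat", "clicked": ["Cat.jpg"]},
    {"page": "Cat", "clicked": []},
    {"page": "Cat", "clicked": ["Cat.jpg", "Kitten.jpg"]},
    {"page": "Cat", "clicked": []},
    {"page": "Dog", "clicked": []},
    {"page": "Dog", "clicked": ["Dog.jpg"]},
]


def page_ctr(views):
    """Assumed definition: pageviews with >= 1 image click / all pageviews, per page."""
    total = defaultdict(int)
    with_click = defaultdict(int)
    for v in views:
        total[v["page"]] += 1
        if v["clicked"]:
            with_click[v["page"]] += 1
    return {page: with_click[page] / total[page] for page in total}


def image_ctr(views):
    """Assumed definition: pageviews where the image was clicked / pageviews of its page."""
    total = defaultdict(int)
    clicks = defaultdict(int)
    for v in views:
        total[v["page"]] += 1
        # Count each image at most once per pageview.
        for image in set(v["clicked"]):
            clicks[(v["page"], image)] += 1
    return {key: n / total[key[0]] for key, n in clicks.items()}


page_rates = page_ctr(pageviews)
image_rates = image_ctr(pageviews)
```

Note that `image_ctr` only returns images that were clicked at least once; reporting a rate of zero for never-clicked images would require a separate list of the images on each page.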
Main findings:
- The average page-specific click-through rate shows a weekly pattern, with a higher probability of clicking on images on weekends than on weekdays. The average rate is 3.5% for English, 3.7% for French, 2.9% for Spanish, and 2.2% for Arabic Wikipedia. For English Wikipedia, it is ten times higher than for citations[1];
- The Main Page plays an important role in increasing image views: images placed on the Main Page are viewed 60 times more on average than the rest of the images;
- There are significant differences in the way readers engage with images based on the topic of interest.
More details on this first round of analysis here.
Second round of analysis
In our second round of analysis, we analyze the page loads, image clicks, and page previews collected during four weeks in March 2021. We mainly focus on three key metrics: the global click-through rate, the image-specific click-through rate, and the conversion rate. We dive deeper into some of the earlier results by (i) quantifying reader engagement with images, (ii) exploring the main drivers of such engagement, and (iii) investigating how images support readers' need for additional information when navigating Wikipedia.
Main findings:
- The global click-through rate is about 3.5%, meaning that on average 1 in 29 page loads results in a click on at least one image. Notably, images appear to drive significantly more engagement than citations on Wikipedia articles;
- Clicks on images occur more often in shorter and less popular articles, articles about visual art and transportation, and biographies of less well-known people;
- Clicks on illustrated page previews occur less often than on unillustrated ones, indicating that images may support the reader's information needs when browsing Wikipedia.
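To illustrate how a click-through rate relates to a "1 in N page loads" statement, here is a minimal sketch. It assumes the global click-through rate is the fraction of page loads with at least one image click, as the first finding suggests. The counts are toy numbers, not project data, and the function names are hypothetical.

```python
def global_ctr(page_loads: int, loads_with_click: int) -> float:
    """Fraction of page loads that include at least one image click."""
    if page_loads <= 0:
        raise ValueError("page_loads must be positive")
    return loads_with_click / page_loads


def one_in_n(rate: float) -> float:
    """Express a rate as '1 in N' by taking its reciprocal."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1 / rate


# Toy counts for illustration only.
rate = global_ctr(page_loads=1000, loads_with_click=40)
n = one_in_n(rate)
```

The reciprocal is the only step involved, so a reported rate and its "1 in N" phrasing can always be checked against each other.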
More details here.
A paper with all the results was published in January 2022.
On Phabricator
Follow this task.