TimeMagazine/wikipedia-rankings

wikipedia-rankings

Support files for TIME's ranking of the prominent people on Wikipedia.

Data was collected over several days in May using node-wikipedia, a Node.js module maintained by @wilson428.

We considered eight data points for each entry:

  • Number of words
  • Number of links to other Wikipedia pages
  • Number of external links (which are typically references)
  • Number of categories the person is in
  • Total number of revisions to the page
  • Number of unique individuals who have edited the page as signed-in editors
  • Number of anonymous edits
  • Number of vandalisms, as identified in editing notes
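The three editing-related counts above can be sketched as a single pass over a page's revision list. This is only an illustration, not the code used to collect the data: the revision shape (`user`, `anon`, `comment`), the function name `summarizeRevisions`, and the rule that an edit note containing "vandal" counts as a vandalism are all assumptions made for the example.

```javascript
// Hypothetical revision shape: { user: string, anon: boolean, comment: string }.
// The real data came from node-wikipedia; this shape is assumed for illustration.
function summarizeRevisions(revisions) {
  const editors = new Set();
  let anonymousEdits = 0;
  let vandalismMentions = 0;

  for (const rev of revisions) {
    if (rev.anon) {
      anonymousEdits += 1;
    } else if (rev.user) {
      editors.add(rev.user); // a Set keeps each signed-in editor once
    }
    // Assumption: an edit note mentioning "vandal" marks a vandalism-related edit.
    if (/vandal/i.test(rev.comment || "")) {
      vandalismMentions += 1;
    }
  }

  return {
    revisions: revisions.length,
    uniqueEditors: editors.size,
    anonymousEdits,
    vandalismMentions,
  };
}

// Made-up sample data, not taken from the CSV.
const sample = [
  { user: "Alice", anon: false, comment: "copyedit" },
  { user: "192.0.2.1", anon: true, comment: "" },
  { user: "Bob", anon: false, comment: "Reverted vandalism by 192.0.2.1" },
  { user: "Alice", anon: false, comment: "add reference" },
];

console.log(summarizeRevisions(sample));
```

Matching on the edit note is a rough heuristic: it counts reverts that mention vandalism, not the vandalizing edits themselves, and it misses anything reverted without comment.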

Data for the top 100,000-or-so people is available as a 15MB CSV file.

Analysis

Using out-of-the-box R functions, we reduced these eight variables to their principal components (using this handy guide). As you can see, a huge amount of the variance is contained in the first PC:

[Figure: variance explained by each principal component]

You can rerun the principal component analysis like so:

Rscript wikipedia.r

(This may require installing the relevant libraries first.)
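For readers who want to see what the reduction does without running R, here is a minimal sketch of finding the first principal component and its share of the variance. It is not a port of wikipedia.r. It assumes the columns are standardized first (the README does not say whether the R script does this), uses power iteration in place of R's built-in routines, and runs on made-up numbers with three columns instead of the eight real ones. All function names are hypothetical.

```javascript
// Standardize each column to mean 0 and standard deviation 1.
// Assumption: the README does not say whether wikipedia.r scales its inputs.
function standardize(rows) {
  const n = rows.length;
  const d = rows[0].length;
  const out = rows.map((r) => r.slice());
  for (let j = 0; j < d; j++) {
    const mean = rows.reduce((s, r) => s + r[j], 0) / n;
    const variance = rows.reduce((s, r) => s + (r[j] - mean) ** 2, 0) / (n - 1);
    const sd = Math.sqrt(variance) || 1; // guard against constant columns
    for (let i = 0; i < n; i++) out[i][j] = (rows[i][j] - mean) / sd;
  }
  return out;
}

// Sample covariance matrix of data whose columns are already centered.
function covariance(z) {
  const n = z.length;
  const d = z[0].length;
  const c = Array.from({ length: d }, () => new Array(d).fill(0));
  for (const r of z) {
    for (let a = 0; a < d; a++) {
      for (let b = 0; b < d; b++) c[a][b] += (r[a] * r[b]) / (n - 1);
    }
  }
  return c;
}

// Power iteration: multiply a vector by the matrix and renormalize until it
// settles on the direction of largest variance (the first component).
function firstComponent(c, iterations = 500) {
  const d = c.length;
  let v = new Array(d).fill(1 / Math.sqrt(d));
  let eigenvalue = 0;
  for (let k = 0; k < iterations; k++) {
    const w = c.map((row) => row.reduce((s, x, j) => s + x * v[j], 0));
    eigenvalue = Math.hypot(...w);
    v = w.map((x) => x / eigenvalue);
  }
  return { loadings: v, eigenvalue };
}

// Made-up rows: [words, links, revisions] for five hypothetical pages.
const pages = [
  [1200, 40, 150],
  [5400, 210, 900],
  [800, 25, 60],
  [9800, 400, 2100],
  [3000, 110, 420],
];

const z = standardize(pages);
const cov = covariance(z);
const pc1 = firstComponent(cov);

// Share of total variance captured by PC1: eigenvalue over the matrix trace.
const trace = cov.reduce((s, row, i) => s + row[i], 0);
const share = pc1.eigenvalue / trace;

// Each page's PC1 score is its standardized row projected onto the loadings.
const scores = z.map((r) => r.reduce((s, x, j) => s + x * pc1.loadings[j], 0));

console.log({ loadings: pc1.loadings, share, scores });
```

Because the sample columns rise and fall together, PC1 captures most of the variance here, which mirrors the pattern described above. Later components would need the first one removed from the matrix before iterating again, which this sketch omits.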

By trial and error, the ranking that most satisfied our anecdotal sense for "influence" in the real world was PC1 PC2, which becomes the score for each person.
