Open Library provides dumps of all its data, generated every month. Most of the data dumps are formatted as tab separated files with the following columns:
- type: type of record (/type/edition, /type/work, etc.)
- key: unique key of the record (/books/OL1M, etc.)
- revision: revision number of the record
- last_modified: last modified timestamp
- JSON: the complete record in JSON format
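As a rough illustration of the column layout above, the following Python sketch splits one dump line into its five fields and decodes the JSON column. The names `DumpRecord` and `parse_dump_line` are hypothetical, and the sample line is made up for the example rather than taken from a real dump.

```python
import json
from typing import NamedTuple


class DumpRecord(NamedTuple):
    """One row of a dump file (hypothetical helper type)."""
    type: str
    key: str
    revision: int
    last_modified: str
    data: dict


def parse_dump_line(line: str) -> DumpRecord:
    # Split into at most five fields so the JSON column is kept whole
    # even if it were to contain a tab character.
    type_, key, revision, last_modified, raw_json = line.rstrip("\n").split("\t", 4)
    return DumpRecord(type_, key, int(revision), last_modified, json.loads(raw_json))


# Made-up sample line for illustration only.
sample = "\t".join([
    "/type/edition",
    "/books/OL1M",
    "3",
    "2011-12-14T00:00:00",
    json.dumps({"key": "/books/OL1M", "title": "Example"}),
])
record = parse_dump_line(sample)
print(record.key, record.revision, record.data["title"])
```

For a full dump, the same function could be applied line by line while streaming the file, so the whole dump never has to fit in memory.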
Dumps
- editions dump (~ 9.2G)
- works dump (~ 2.9G)
- authors dump (~ 0.5G)
- all types dump (~ 12.4G): includes editions, works, authors, redirects, etc.
- complete dump (~ 29.6G): also includes past revisions of all the records in Open Library
- ratings dump (~ 5M): with columns "Work Key, Edition Key (optional), Rating, Date"
- reading log dump (~ 65M): with columns "Work Key, Edition Key (optional), Shelf, Date"
- redirects dump (~ 50M)
- deletes dump (~ 75M)
- lists dump (~ 30M)
- other dump (~ 10M)
- covers metadata dump (~ 70M): with columns "id, width, height, created"
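The ratings dump columns listed above can be read in a similar way. The sketch below averages ratings per work. It assumes the file is tab-separated like the main dumps and that a missing edition key appears as an empty field, neither of which is stated above. The function name `average_ratings` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def average_ratings(lines):
    """Return {work_key: mean rating} from rows of
    Work Key, Edition Key (optional), Rating, Date."""
    totals = defaultdict(lambda: [0, 0])  # work_key -> [sum, count]
    for work_key, _edition_key, rating, _date in csv.reader(lines, delimiter="\t"):
        totals[work_key][0] += int(rating)
        totals[work_key][1] += 1
    return {work: total / count for work, (total, count) in totals.items()}


# Made-up rows; the second one has no edition key (assumed empty field).
sample = io.StringIO(
    "/works/OL1W\t/books/OL1M\t4\t2024-01-01\n"
    "/works/OL1W\t\t2\t2024-01-02\n"
    "/works/OL2W\t/books/OL2M\t5\t2024-01-03\n"
)
averages = average_ratings(sample)
print(averages)
```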
For past dumps, see: https://archive.org/details/ol_exports?sort=-publicdate
Is downloading the dumps taking too long? Check out the link above and download via torrent for higher speeds!
Format of JSON records
A JSON schema for the various types is located at https://github.com/internetarchive/openlibrary-client/tree/master/olclient/schemata
- Author Records: JSON serialization of a type/author
- Edition Records: JSON serialization of a type/edition
- Work Records: JSON serialization of a type/work
Using Open Library Data Dumps
A contributor on the LibrariesHacked GitHub has written a guide on how to load Open Library's data dumps into PostgreSQL to make them easier to query:
https://github.com/LibrariesHacked/openlibrary-search
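To give a feel for why loading the dumps into a database helps, here is a small sketch that uses Python's built-in `sqlite3` module in place of PostgreSQL. It is not the approach taken in the guide linked above. The table name `records`, its column names, and the sample rows are assumptions for illustration, with the columns mirroring the five dump columns.

```python
import json
import sqlite3

# Made-up rows in the five-column dump layout.
rows = [
    ("/type/work", "/works/OL1W", 2, "2024-01-01T00:00:00",
     json.dumps({"title": "Example Work"})),
    ("/type/edition", "/books/OL1M", 3, "2024-01-02T00:00:00",
     json.dumps({"title": "Example Edition"})),
    ("/type/edition", "/books/OL2M", 1, "2024-01-03T00:00:00",
     json.dumps({"title": "Another Edition"})),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE records (
           record_type TEXT,
           record_key TEXT PRIMARY KEY,
           revision INTEGER,
           last_modified TEXT,
           data TEXT)"""
)
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", rows)

# Once loaded, lookups and aggregates become simple SQL queries.
counts = dict(
    conn.execute("SELECT record_type, COUNT(*) FROM records GROUP BY record_type")
)
(raw,) = conn.execute(
    "SELECT data FROM records WHERE record_key = ?", ("/books/OL1M",)
).fetchone()
title = json.loads(raw)["title"]
print(counts, title)
conn.close()
```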
GraphQL
DiFronzo on GitHub has produced a GraphQL proxy for searching books by work, edition, and ISBN using the Open Library API. It is deployed with Deno and GraphQL:
https://github.com/DiFronzo/OpenLibrary-GraphQL
OL Covers Dump
We do not yet have rolling monthly dumps of our book covers, despite a shared desire for them. Some historical cover dumps may be explored here:
https://archive.org/details/ol_data?tab=collection&query=identifier:covers&sort=-addeddate
Most covers are archived in the following items. Note that covers_0006 and covers_0007 are presently unavailable.
- https://archive.org/details/covers_0000
- https://archive.org/details/covers_0001
- https://archive.org/details/covers_0002
- https://archive.org/details/covers_0003
- https://archive.org/details/covers_0004
- https://archive.org/details/covers_0005
- https://archive.org/details/covers_0008
- https://archive.org/details/covers_0009
- https://archive.org/details/covers_0010
- https://archive.org/details/covers_0011
- https://archive.org/details/covers_0012
- https://archive.org/details/covers_0013
- https://archive.org/details/covers_0014
History
- Created December 14, 2011
- 34 revisions
August 7, 2024 | Edited by Drini | Fix dump sizes / instructions |
August 7, 2024 | Edited by Drini | New dumps are now available! |
June 8, 2024 | Edited by Mek | Edited without comment. |
June 8, 2024 | Edited by Mek | Edited without comment. |
December 14, 2011 | Created by Anand Chitipothu | Documented Open Library Data Dumps |