I wrote this utility to extract the most value from two services I love dearly:
- Goodreads
- My local public library
Whenever I come across a title I'd like to read some day, I store it on my Goodreads shelf. When I'd like to visit my local library branch, I visit biblio.dcain.me to see which titles are available to be checked out.
My local library branch does not have the most extensive collection. Instead of meandering the stacks until I find a book I like, or fruitlessly querying the catalog to see if that interesting new book is on the shelf, I'd much rather have a script do the hard work for me.
The web interface currently supports Alameda County, San Francisco, and Seattle, but if you live near one of the ~190 public libraries using the BiblioCommons system, then running this software locally should work for you. It relies on undocumented APIs, so your mileage may vary.
You can also use just the Python backend locally.
- The
bibliophile
Python package does the legwork of querying Goodreads & BiblioCommons (respectively, these are the services needed to find which books I'm interested in, and which books are available at the library). - This repository defines some Lambda functions that are deployed to public
endpoints, accessible at api.dcain.me
- Function are configured with an API Gateway to enable a REST API.
serverless
provides automated deployment & configuration on AWS.
- A web app provides the user interface on biblio.dcain.me.
- Download Docker.
We choose to Dockerize the pip environment because both
grequests
andlxml
(dependencies ofbibliophile
) are C-based, and need to compile binaries for use on the Lambda VM. By using Docker, we can make use of an image that mirrors exactly what AWS will run in Lambda-land. - Create an IAM user for
serverless
with permissions to create CloudFormation stacks, S3 buckets, Lambda functions, and more. Standard practice withserverless
is to just grant the user an administrator policy, though this is not ideal security. - Create access keys for the
serverless
user - (one-time) create the
customDomain
that the API will be served on:(Note that new domains may take up to 40 minutes to initialize)npm run serverless create_domain
- Deploy the latest version of the endpoint:
npm run deploy
Once the above is done, a simple end-to-end test:
curl -X POST 'https://api.dcain.me/bibliophile/read_shelf' \
--header "Content-Type: application/json" \
--data '{"userId": "41926065", "shelf": "to-read"}'
Configuration for serverless deploy
is contained in serverless.yml
.
If you want to deploy this service to your own domain, you'll need to
tweak settings in there (namely, changing domain names).
This is a pet project I work on whenever I'm so inclined.
Accordingly, there are a lot of TODOs at any given moment...
Right now, this tool only reads the first 200 books on the to-read
shelf
(the Goodreads API prevents reading more). To support larger shelves, we should
call the API endpoint a few times (while not breaking the "1 query per second" rule
Goodreads has on API integrations.
After we've fetched the Goodreads shelf, we could make catalog queries, say, 20 books at a time. The backend already parallelizes execution of large searches, but this would allow us to have more iterative results which appear on-screen as data is available.
Currently, the search algorithm prefers an exact match on ISBN. This results in fewer results than you'd expect, since popular titles are generally released with several ISBNs (for example, paperback and hardcover editions get different ISBNs).
For famous authors/works, the number of incorrect results are just too numerous. Even when limiting to books (to exclude all the matching DVDs and audio recordings), other results appear alongside the "real" result.
An excellent way around this problem is to utilize Goodreads "other editions" feature:
- For each book, see if there's an exact match in the catalog. If so? Great, we're done.
- If there's no match, check for other editions' ISBNs. If no other editions, exit.
- Search by title & author, and accept any results which have a known ISBN.
An endpoint exists to do this (work.editions
), though it requires special permission.
I'm waiting to hear back from Goodreads staff.