In-progress development of an application that suggests books by international and diverse authors whose themes are similar to those of books on classic reading lists.
- Scraping book lists from Goodreads:
- scrape_goodreads_lists.sh is a bash script that calls the GoodreadsScraper package (https://github.com/havanagrawal/GoodreadsScraper.git) to download book data from Goodreads Listopia lists. The following lists are used:
minority and international lists:
young adult minority and international:
likely assigned books lists:
likely assigned young adult:
- Data processing:
- process_data.ipynb contains a first look at the data from one of the lists. Some of the descriptions contain duplicated lines. I'm not sure yet whether it makes more sense to strip the repeated lines, reduce each description to its unique words, or do something else.
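
To make the two options for the duplicated description lines concrete, here is a minimal sketch of both. It is not code from process_data.ipynb: the function names drop_repeated_lines and unique_words are hypothetical, and it assumes each description is a plain string in which the duplicates appear as whole repeated lines.

```python
import re


def drop_repeated_lines(description: str) -> str:
    """Option 1: keep the first occurrence of each line, preserving order.

    Assumes duplicates are whole lines that match after normalizing
    whitespace and case (an assumption about the scraped data).
    """
    seen = set()
    kept = []
    for line in description.splitlines():
        key = " ".join(line.split()).lower()  # normalize whitespace and case
        if not key or key in seen:
            continue  # skip blank lines and lines already seen
        seen.add(key)
        kept.append(line.strip())
    return "\n".join(kept)


def unique_words(description: str) -> set:
    """Option 2: reduce a description to its set of lowercase words."""
    return set(re.findall(r"[a-z']+", description.lower()))


# Made-up sample text, not taken from the scraped data.
sample = "A girl leaves home.\nA girl leaves home.\nShe finds a new city."
print(drop_repeated_lines(sample))
print(sorted(unique_words(sample)))
```

Dropping repeated lines keeps word order and word counts, which matters if the descriptions are later fed to anything that uses term frequency. Reducing to unique words sidesteps the duplication entirely but throws that information away.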