-
You need Python 3.x installed on your computer. Download the latest Python release here.
-
Install the pip tool by following the instructions here.
-
Install the Scrapy module with pip. On Windows, run:
pip install scrapy
On Linux, run:
python3 -m pip install --user scrapy
If you have already set up Scrapy, all you need to do is clone this repository to your local disk and run these commands in your cmd or bash console:
cd lagout
scrapy crawl lagout
where lagout is the name of the spider in this project.
Take a nap or grab a cup of coffee, and enjoy more than 60k free e-books. The downloading itself is left to you: build a full mirror of the original site, or download only the resources you need; it's entirely up to you.
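If you want the crawl results saved to a file instead of only printed to the console, Scrapy's feed-export option can write the scraped items directly to JSON (the output filename below is just an example):
```
scrapy crawl lagout -O books.json
```
The -O flag overwrites the output file on each run; use -o instead to append to an existing file.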
-
To start a new Scrapy project of your own, run:
scrapy startproject lagout
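This generates a project skeleton from Scrapy's built-in templates. With recent Scrapy versions the layout looks roughly like this (the exact set of files may vary slightly between versions):
```
lagout/
    scrapy.cfg        # deploy configuration
    lagout/           # the project's Python package
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/      # your spider modules go here
            __init__.py
```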
-
Now we have a Scrapy project called "lagout", generated from Scrapy's built-in templates. Next, create a new file in the lagout/lagout/spiders directory, say "lagout_spider.py", and write your crawler logic there according to your own needs.
-
For further reading, the latest Scrapy documentation is here, and a handy Scrapy tutorial is here.