An open source and collaborative framework for extracting the data you need from websites, in a fast, simple, yet extensible way.
Maintained by Zyte and many other contributors
Terminal

pip install scrapy
cat > myspider.py <<EOF
EOF
scrapy runspider myspider.py
Build and run your web spiders
Terminal

pip install shub
shub login
Insert your Zyte Scrapy Cloud API Key: <API_KEY>
# Deploy the spider to Zyte Scrapy Cloud
shub deploy
# Schedule the spider for execution
shub schedule blogspider
Spider blogspider scheduled, watch it running here:
https://app.zyte.com/p/26731/job/1/8
# Retrieve the scraped data
shub items 26731/1/8
Deploy them to Zyte Scrapy Cloud, or use Scrapyd to host the spiders on your own server
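For self-hosting with Scrapyd, deployment is driven by the project's scrapy.cfg file; a minimal sketch, assuming a Scrapyd daemon on its default port 6800 and a project named `myproject` (both placeholders), might look like:

```ini
# scrapy.cfg — [deploy] target read by scrapyd-deploy (from scrapyd-client)
[deploy]
url = http://localhost:6800/
project = myproject
```

Running `scrapyd-deploy` then uploads the project to that server, and spiders can be scheduled over Scrapyd's JSON API, e.g. `curl http://localhost:6800/schedule.json -d project=myproject -d spider=blogspider`.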
Fast and powerful
write the rules to extract the data and let Scrapy do the rest
Easily extensible
extensible by design, plug new functionality easily without having to touch the core
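That plug-in style means new behavior is added as ordinary classes hooked in through settings rather than by patching the core. As an illustrative (hypothetical) example, a downloader middleware that rotates the User-Agent header on every request:

```python
import random

class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: rotates User-Agent headers.

    Enabled via settings, e.g.:
    DOWNLOADER_MIDDLEWARES = {
        "myproject.middlewares.RandomUserAgentMiddleware": 400,
    }
    """

    USER_AGENTS = [
        "Mozilla/5.0 (X11; Linux x86_64)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    ]

    def process_request(self, request, spider):
        # Pick a User-Agent at random for each outgoing request.
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        return None  # returning None lets Scrapy continue processing
```

Nothing in the core changes: the middleware is just a class with a `process_request` hook, activated by one settings entry.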
Portable, Python
written in Python and runs on Linux, Windows, Mac and BSD
Healthy community
- 43,100 stars, 9,600 forks and 1,800 watchers on GitHub
- 5,500 followers on Twitter
- 18,000 questions on StackOverflow