This repository has been archived by the owner on Sep 29, 2019. It is now read-only.
A New Multithreading Crawler That Supports Multiple Websites

XIAZY/DriveIt

If you just want to run this app, download the compiled executables for Mac and Windows from the Releases page.

Icon designed by Maxim Basinski, licensed under CC BY 3.0.

DriveIt

DriveIt is a new crawler that supports multiple websites. For now, the sites named in this README are ck101, DMZJ, and eHentai.

Overview

This project is still under development. More features will be added later.

Usage

Simply run it with Python 3. You may need to install some dependencies from PyPI. Make sure a JavaScript runtime is installed before you start (for example Node.js, or Microsoft's JScript, which ships with Windows).

sudo pip3 install PyExecJS beautifulsoup4 requests

After that, it should run without further setup. To get started, type

python3 driveit.py <FlyleafURL>

Advanced usage:

usage: driveit.py [-h] [-l LATEST] [-t THREAD] url

A multithreading comic crawler.

positional arguments:
  url                   URL of the comic's cover page

optional arguments:
  -h, --help            show this help message and exit
  -l LATEST, --latest LATEST
                        Download latest x chapters from origin
  -t THREAD, --thread THREAD
                        Number of threads. Default to be 1

For example: [image: eg]
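The help text above can be reproduced with Python's standard `argparse` module. The following is only a minimal sketch of how such a command line might be declared; the function name `build_parser` and the choice of `int` for both options are assumptions, and the actual `driveit.py` may be written differently.

```python
import argparse


def build_parser():
    """Build a parser matching the help text shown above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="driveit.py",
        description="A multithreading comic crawler.",
    )
    # Positional argument: the flyleaf (cover page) URL.
    parser.add_argument("url", help="URL of the comic's cover page")
    # Assumption: both options are parsed as integers.
    parser.add_argument("-l", "--latest", type=int, default=None,
                        help="Download latest x chapters from origin")
    parser.add_argument("-t", "--thread", type=int, default=1,
                        help="Number of threads. Default to be 1")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.latest, args.thread)
```

With this sketch, `-l 3 -t 4 <FlyleafURL>` would yield `latest == 3` and `thread == 4`, and omitting `-t` falls back to a single thread.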

Or if you prefer GUI to CLI:

python3 driveit-gui.py

Note that you need PyQt5 installed to use the GUI version. Mac users can install it via

brew install pyqt5

For example: [image: eg_gui]

It automatically creates one subfolder per chapter and stores each fetched picture in the proper location. For instance, chapter 1 page 1 will be stored in /name of the comic/Chapter 1/1.jpg.
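The folder layout described above can be sketched with `pathlib`. This is only an illustration: the helper names `page_path` and `ensure_parent`, the `root` parameter, and the fixed `jpg` extension are assumptions, not taken from DriveIt's source.

```python
from pathlib import Path


def page_path(root, comic_name, chapter, page, ext="jpg"):
    """Return the target file for one page: <root>/<comic>/Chapter <n>/<page>.<ext>."""
    return Path(root) / comic_name / f"Chapter {chapter}" / f"{page}.{ext}"


def ensure_parent(path):
    """Create the chapter folder (and the comic folder) if it does not exist yet."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    target = page_path("downloads", "name of the comic", 1, 1)
    print(target.as_posix())  # downloads/name of the comic/Chapter 1/1.jpg
```

Calling `ensure_parent(target)` before writing the image keeps folder creation idempotent, which matters when several threads save pages of the same chapter.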

Compiled versions for Mac and Windows are available under Releases.

New websites can be easily supported. I'm now working on it.

By The Way

  • A flyleaf page is the index page of the comic, i.e. the cover page whose URL you pass to driveit.py.

  • Reading-driven development. Update frequency may vary depending on how far I have read.

  • Note that the ck101 website is blocked in Mainland China. You may need a global VPN or Proxychains to fetch comics from it.

  • If you want to fetch comics from DMZJ, make sure the flyleaf address begins with www.dmzj.com instead of manhua.dmzj.com. The logic for fetching comics from these two domains is different.

  • If you are in Mainland China, fetching comics from eHentai will sometimes fail with a connection reset. Use a global VPN or Proxychains instead.

  • Personally, I recommend fetching comics from DMZJ. For me this website is the fastest one.
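The DMZJ note above can be checked before a download starts. The following is a minimal sketch that only inspects the host name of a flyleaf URL; the function name `dmzj_host_kind` and the example URL are hypothetical, and this is not how DriveIt itself decides which fetching logic to use.

```python
from urllib.parse import urlparse


def dmzj_host_kind(url):
    """Classify a flyleaf URL as 'www', 'manhua', or None (not a known DMZJ host)."""
    # urlparse only fills in hostname when the URL carries a scheme such as http://
    host = (urlparse(url).hostname or "").lower()
    if host == "www.dmzj.com":
        return "www"
    if host == "manhua.dmzj.com":
        return "manhua"
    return None


if __name__ == "__main__":
    # Hypothetical URL, used only to illustrate the check.
    url = "http://manhua.dmzj.com/some-comic/"
    if dmzj_host_kind(url) != "www":
        print("Please use a flyleaf address that begins with www.dmzj.com")
```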

License

Copyright 2016 XIAZY

Licensed under the WTFPL License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.wtfpl.net/
