An HTML scraper microservice based on x-ray and micro
Request
Send a GET request to the /scrape endpoint with one of the following sets of query string parameters:
- Scraping a single text value
Params | Required | Description |
---|---|---|
s-url | yes | URL of the website to scrape |
s-selector | yes | CSS selector of the data to extract |
- Scraping multiple data objects
Params | Required | Description |
---|---|---|
s-url | yes | URL of the website to scrape |
s-scope | yes | CSS selector of the scope that contains each data object |
s-limit | no | maximum number of objects returned |
[selector] | yes | CSS selector of each field to extract; the parameter name becomes the key in the response |
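
The two request forms above can be sketched as a small URL builder. This is only an illustration: buildScrapeUrl is a hypothetical helper that is not part of the service, the target site and selectors are placeholders, and the base address assumes the default local development address.

```javascript
// Hypothetical helper (not part of the service): builds a /scrape URL
// and URI-encodes every query key and value.
function buildScrapeUrl(baseUrl, params) {
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join('&');
  return `${baseUrl}/scrape?${query}`;
}

// Scraping a single text value: s-url and s-selector.
const textUrl = buildScrapeUrl('http://127.0.0.1:9500', {
  's-url': 'https://example.com', // placeholder target site
  's-selector': 'h1',             // placeholder selector
});

// Scraping multiple data objects: s-url, s-scope, optional s-limit,
// plus one custom parameter per field (here a made-up "title" field).
const listUrl = buildScrapeUrl('http://127.0.0.1:9500', {
  's-url': 'https://example.com',
  's-scope': 'ul li',
  's-limit': 3,
  title: 'a',
});

console.log(textUrl);
console.log(listUrl);
```

Encoding each value with encodeURIComponent keeps characters such as #, spaces and & inside a selector from being misread as part of the URL structure.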
Response
A text value, or a JSON array of objects whose keys are the selector names given in the request's query string.
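
A client may want to handle both response shapes in one place. The sketch below assumes the body has already been parsed; describeResult is a hypothetical helper, and the sample bodies are made up for illustration rather than taken from the service.

```javascript
// Hypothetical helper: wraps a parsed /scrape response body so callers
// can treat the text case and the array-of-objects case uniformly.
function describeResult(body) {
  if (Array.isArray(body)) {
    return { kind: 'objects', items: body };
  }
  return { kind: 'text', items: [String(body)] };
}

// Made-up sample bodies (not real service output).
const sampleList = JSON.parse('[{"name":"Alpha","price":"1.00"},{"name":"Beta","price":"2.00"}]');
const sampleText = 'Hello';

const listResult = describeResult(sampleList);
const textResult = describeResult(sampleText);

// The keys of each object match the custom parameter names in the request.
const names = listResult.items.map((item) => item.name);

console.log(listResult.kind, names);
console.log(textResult.kind, textResult.items);
```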
Scraping Bitcoin price in USD from CoinMarketCap
- Request (query values must be URI encoded before sending):
https://scraper.fun/scrape?s-url=https://coinmarketcap.com&s-selector=#id-bitcoin .price
- Response: the Bitcoin price as text
- Request (query values must be URI encoded before sending):
https://scraper.fun/scrape?s-url=https://coinmarketcap.com&s-scope=table#currencies tbody tr&name=.currency-name .currency-name-container&price=.price&s-limit=3
- Response: a JSON array of up to 3 objects, each with name and price keys
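
The example URLs above are written unencoded for readability. The sketch below illustrates why encoding matters, using the standard URL class available in Node.js: in the raw form, the # in the selector starts the URL fragment, so the selector never reaches the query string.

```javascript
// The first example URL exactly as written above, with no encoding applied.
const raw =
  'https://scraper.fun/scrape?s-url=https://coinmarketcap.com&s-selector=#id-bitcoin .price';
const parsedRaw = new URL(raw);

// '#' begins the fragment, so s-selector is parsed as an empty value.
console.log(JSON.stringify(parsedRaw.searchParams.get('s-selector')));

// The same request with each query value URI-encoded.
const encoded =
  'https://scraper.fun/scrape' +
  '?s-url=' + encodeURIComponent('https://coinmarketcap.com') +
  '&s-selector=' + encodeURIComponent('#id-bitcoin .price');
const parsedEncoded = new URL(encoded);

console.log(encoded);
// The selector now survives parsing intact.
console.log(parsedEncoded.searchParams.get('s-selector'));
```

The same applies to the second example, where table#currencies in s-scope contains a # as well.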
Make sure Node.js (9.0.0 or newer) and either Yarn or npm are installed on your local machine. Then install the project dependencies and start the service by running:
yarn
yarn start
The service will be up at 127.0.0.1:9500 by default.
We use ESLint to lint the source code. Simply run:
yarn test
Run the app on port 80 with the command:
PORT=80 yarn serve
The app will be up at 127.0.0.1
You can use the existing Docker image from https://hub.docker.com/r/phatpham9/scraper by running:
docker pull phatpham9/scraper
docker run -d -p 80:80 phatpham9/scraper
The app will be up at 127.0.0.1
CaptainDuckDuck is a nice Heroku-like tool for deploying your apps easily. Install the CaptainDuckDuck client on your local machine by following its installation instructions, then run:
captainduckduck deploy
That's it!
Click the button below to deploy to a Heroku dyno
- Fork this repository to your own GitHub account, then clone it to your local machine
- Follow the Development guide or simply run:
yarn start
- Lint code by running: yarn test
- Create a pull request for us
- Phat Pham (@phatpham9)