scraper

An HTML scraper microservice built on x-ray and micro

Features

  • x-ray: An HTML scraper
  • micro: Asynchronous HTTP microservices
  • joi: Object schema validation

Usage

Request

Send a GET request to the /scrape endpoint with one of the following sets of query-string parameters:

  1. Scraping a single text value

Params      Required  Description
s-url       yes       URL of the website to scrape
s-selector  yes       CSS selector of the data to extract

  2. Scraping multiple data objects

Params      Required  Description
s-url       yes       URL of the website to scrape
s-scope     yes       CSS selector of the scope that contains each object
s-limit     no        maximum number of objects to return
[selector]  yes       CSS selector of each field to extract
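
The selectors in these parameters usually contain characters such as '#' and spaces, so the query values need to be URI-encoded before the request is sent. The following is a minimal client-side sketch of one way to do that, not part of the service itself. The helper name buildScrapeUrl is hypothetical, and the parameter values are taken from the Examples section.

```javascript
// Hypothetical helper (not part of the scraper project): builds a /scrape URL
// from a plain object of query parameters.
// encodeURIComponent escapes characters such as '#', ' ' and '/' that would
// otherwise be interpreted as URL syntax.
function buildScrapeUrl(baseUrl, params) {
  const query = Object.entries(params)
    // Skip optional parameters that were not provided (for example s-limit).
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join('&');
  return `${baseUrl}/scrape?${query}`;
}

// Case 1: scraping a single text value.
const textUrl = buildScrapeUrl('https://scraper.fun', {
  's-url': 'https://coinmarketcap.com',
  's-selector': '#id-bitcoin .price',
});

// Case 2: scraping multiple data objects; "name" and "price" are the
// caller-chosen [selector] parameters.
const listUrl = buildScrapeUrl('https://scraper.fun', {
  's-url': 'https://coinmarketcap.com',
  's-scope': 'table#currencies tbody tr',
  name: '.currency-name .currency-name-container',
  price: '.price',
  's-limit': 3,
});

console.log(textUrl);
console.log(listUrl);
```

Encoding each value separately keeps the '#' in a selector from being treated as a URL fragment, which would otherwise cut the query string short.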

Response

Either a text value, or a JSON array of objects whose keys are the selectors specified in the request's query string.
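
Because the response can take two shapes, a client may want to normalize it before use. The sketch below is one possible approach, assuming the body arrives as a string that is either a JSON array or plain text. The helper name parseScrapeBody and the sample bodies are hypothetical, and the exact wire format of the text case is an assumption, not something the service documents.

```javascript
// Hypothetical helper (not part of the scraper project): interprets a
// response body from /scrape.
// Assumption: multi-object requests return a JSON array, and single-value
// requests return either a JSON string or raw text.
function parseScrapeBody(body) {
  try {
    const parsed = JSON.parse(body);
    if (Array.isArray(parsed)) {
      return { kind: 'list', items: parsed };
    }
    if (typeof parsed === 'string') {
      return { kind: 'text', text: parsed };
    }
  } catch (err) {
    // Body is not valid JSON; treat it as raw text below.
  }
  return { kind: 'text', text: body };
}

// Made-up sample bodies, for illustration only.
const list = parseScrapeBody('[{"name":"ExampleCoin","price":"$1.00"}]');
const text = parseScrapeBody('$1.00');

console.log(list.kind, list.items.length);
console.log(text.kind, text.text);
```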

Examples

Scraping Bitcoin price in USD from CoinMarketCap

  • Request (shown decoded; URI-encode the query values before sending): https://scraper.fun/scrape?s-url=https://coinmarketcap.com&s-selector=#id-bitcoin .price
  • Response: the text matched by the selector

Scraping the prices of the top 3 coins

  • Request (shown decoded; URI-encode the query values before sending): https://scraper.fun/scrape?s-url=https://coinmarketcap.com&s-scope=table#currencies tbody tr&name=.currency-name .currency-name-container&price=.price&s-limit=3
  • Response: a JSON array of up to 3 objects, each with name and price keys

Development & deployment guide

Getting started

Make sure Node.js (9.0.0 or newer) and either Yarn or npm are installed on your local machine. Then install the project dependencies by running:

yarn

Start developing

yarn start

The service will be up at 127.0.0.1:9500 by default.

Testing

We use ESLint to lint the source code. Simply run:

yarn test

Running in production mode

Run the following command, setting PORT to the port you want the service to listen on:

PORT=80 yarn serve

The app will be up at 127.0.0.1 on port 80.

Deploy using Docker

You can use the existing Docker image from https://hub.docker.com/r/phatpham9/scraper by running:

docker pull phatpham9/scraper
docker run -d -p 80:80 phatpham9/scraper

The app will be up at 127.0.0.1 on port 80.

Deploy to CaptainDuckDuck

CaptainDuckDuck is a Heroku-like tool for deploying apps easily. Install the CaptainDuckDuck client on your local machine by following its installation instructions, then run:

captainduckduck deploy

That's it!

Deploy to Heroku

Click the button below to deploy to a Heroku dyno:

Deploy

Contributing

  1. Fork this repository to your own GitHub account, then clone it to your local machine
  2. Follow the Development & deployment guide, or simply run: yarn start
  3. Lint the code by running: yarn test
  4. Open a pull request