This repository contains the web scraping component of the Career Craft project, designed to collect job listings from various company websites.
The Career Craft Scrapper is a Node.js application that automates the process of gathering job postings from different company career pages. It currently supports scraping from:
- PhonePe
- Flipkart
- Airbnb
- Spotify
- Mozilla
- Paytm
Before running the scraper, ensure you have the following installed:
- Node.js (version 14 or higher)
- npm (usually comes with Node.js)
- Clone the repository:
  git clone https://github.com/The-Enthusiast-404/career-craft-scrapper.git
  cd career-craft-scrapper
- Install dependencies:
  npm install
The scraper configuration is stored in config.js. You can modify this file to add or update scraping targets and selectors.
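The README does not show the layout of config.js, so the following is only a sketch of what a target entry with selectors might look like, together with a small check that catches incomplete entries before a run. Every field name here (`companies`, `url`, `selectors`, `jobCard`, `title`, `location`), the `validateConfig` helper, and the example URL are assumptions for illustration, not the project's actual schema.

```javascript
// Hypothetical shape for config.js; all field names are assumptions.
const config = {
  companies: [
    {
      name: "ExampleCo",
      url: "https://example.com/careers",
      // CSS selectors used to locate job data on the careers page
      selectors: {
        jobCard: ".job-card",
        title: ".job-title",
        location: ".job-location",
      },
    },
  ],
};

// Returns a list of human-readable problems; an empty list means
// every entry has a name, a url, and the two required selectors.
function validateConfig(cfg) {
  if (!cfg || !Array.isArray(cfg.companies)) {
    return ["config.companies must be an array"];
  }
  const problems = [];
  cfg.companies.forEach((company, i) => {
    const label = company.name || `companies[${i}]`;
    if (!company.name) problems.push(`${label}: missing name`);
    if (!company.url) problems.push(`${label}: missing url`);
    for (const key of ["jobCard", "title"]) {
      if (!company.selectors || !company.selectors[key]) {
        problems.push(`${label}: missing selectors.${key}`);
      }
    }
  });
  return problems;
}

// In a real config.js the object would be exported for the scrapers to load.
console.log(validateConfig(config));
```

Keeping selectors in data rather than in scraper code means a layout change on a careers page can often be handled by editing one entry.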
To run the scraper:
npm start
This will execute the main script (src/index.js), which orchestrates the scraping process for all configured companies.
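As a rough illustration of what orchestrating several company scrapers can involve, the sketch below runs a set of scraper functions and collects results and failures separately, so that one broken careers page does not abort the whole run. It is not the contents of src/index.js: the `runAllScrapers` name, the stub scrapers, the job fields, and the choice to run scrapers concurrently are all assumptions.

```javascript
// Hypothetical orchestration sketch; names and return shapes are assumptions.
// `scrapers` maps a company name to an async function returning an array of jobs.
async function runAllScrapers(scrapers) {
  const names = Object.keys(scrapers);

  // Wrap each call so a synchronous throw also becomes a rejection,
  // then wait for every scraper to settle instead of failing fast.
  const settled = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => scrapers[name]()))
  );

  const summary = { jobs: [], failures: [] };
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      // Tag each job with the company it came from.
      for (const job of result.value) {
        summary.jobs.push({ company: names[i], ...job });
      }
    } else {
      const reason = result.reason;
      summary.failures.push({
        company: names[i],
        error: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return summary;
}

// Stub scrapers standing in for the real per-company modules.
const stubScrapers = {
  phonepe: async () => [{ title: "Backend Engineer" }],
  flipkart: async () => {
    throw new Error("selector not found");
  },
};

runAllScrapers(stubScrapers).then((summary) => console.log(summary));
```

`Promise.allSettled` is available in the Node.js versions the prerequisites call for. Running scrapers one at a time in a loop would be an equally reasonable design if the target sites are sensitive to request bursts.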
- src/: Contains the source code
  - index.js: Main entry point
  - scrapers/: Individual scraper modules for each company
  - utils/: Utility functions and helpers
- config.js: Configuration file for scraping targets
- package.json: Project metadata and dependencies
- axios: For making HTTP requests
- cheerio: For parsing and manipulating HTML
- puppeteer: For browser automation and scraping dynamic content
- winston: For logging
Contributions to improve the scraper or add support for new companies are welcome. Please follow these steps:
- Fork the repository
- Create a new branch (git checkout -b feature/your-feature-name)
- Make your changes
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/your-feature-name)
- Create a new Pull Request
This project is licensed under the MIT License.