
Web scraper to get information about posted jobs in the US from Indeed.com


PSavvateev/JobScrapingApp_Indeed.com


Indeed.com Jobs Scraping App

Overview

A program that scrapes job postings in the United States from www.indeed.com and stores them.

It collects the following information for each posting:

  • original ID generated by Indeed
  • job title (job_title)
  • posting date (job_date)
  • location (job_loc)
  • short description (job_summary)
  • salary (or salary range) in a list format (job_salary)
  • url of the job (job_url)
  • company name (company_name)
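
As an illustration of the fields listed above, here is a minimal sketch of how one scraped posting could be represented. The `JobRecord` class, the `job_id` field name, and the `parse_salary` helper are hypothetical and are not taken from the repository; only the other field names come from the list above. The salary parser assumes dollar amounts written like `$50,000`.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobRecord:
    """One scraped posting (hypothetical container, not the repo's own type)."""
    job_id: str          # original ID generated by Indeed (field name assumed)
    job_title: str
    job_date: str
    job_loc: str
    job_summary: str
    job_url: str
    company_name: str
    job_salary: List[float] = field(default_factory=list)  # one value, or a range


def parse_salary(text: str) -> List[float]:
    """Extract every dollar amount from a salary string as a list of floats."""
    amounts = re.findall(r"\$(\d[\d,]*(?:\.\d+)?)", text)
    return [float(a.replace(",", "")) for a in amounts]


record = JobRecord(
    job_id="abc123",
    job_title="Auditor",
    job_date="2022-06-01",
    job_loc="Austin, TX",
    job_summary="Performs internal audits.",
    job_url="https://www.indeed.com/viewjob?jk=abc123",
    company_name="Example Corp",
    job_salary=parse_salary("$50,000 - $70,000 a year"),
)
```

A posting with no salary text simply ends up with an empty list, which keeps the column type consistent across rows.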

Getting Started

  1. Install all required packages from requirements.txt.
    $ pip install -r requirements.txt

How to use

  1. Set the search parameters in parameters.py:
  • positions must be a list of strings containing the position names or keywords to search for. Even for a single keyword, keep it in a list: positions = ["auditor"]
  2. Run app.py:
    $ python3 app.py
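
To show why positions should stay a list even for a single keyword, here is a small sketch of how the keywords might be turned into search URLs. `BASE_URL`, the `q` query parameter, and `build_search_urls` are assumptions made for illustration; the repository's scraper may build its requests differently.

```python
from urllib.parse import urlencode

# Example contents of parameters.py
positions = ["auditor", "data analyst"]

# Assumed search endpoint and query parameter, for illustration only.
BASE_URL = "https://www.indeed.com/jobs"


def build_search_urls(positions):
    """Return one search URL per keyword in the positions list."""
    if isinstance(positions, str):
        # Iterating over a bare string would yield single characters.
        raise TypeError("positions must be a list of strings, e.g. ['auditor']")
    return [f"{BASE_URL}?{urlencode({'q': p})}" for p in positions]


urls = build_search_urls(positions)
```

Passing a bare string such as `"auditor"` would otherwise be iterated character by character, which is the usual reason for keeping one-word searches in a list.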

Functionality:

  1. Scrapes jobs matching the search keywords.
  2. Cleans and formats the data.
  3. Saves the results of each scraping session as a CSV data dump in the data_dumps/ folder.
  4. Logs each step of the scraping to log.txt and prints the outcome to the console.
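
The dump and logging steps above can be sketched roughly as follows. This is not the code from dumping.py or logger.py: the function names, the timestamped file name, and the use of the standard-library csv module (instead of pandas, which the project lists as a dependency) are assumptions chosen to keep the example self-contained.

```python
import csv
import logging
from datetime import datetime
from pathlib import Path


def get_logger(log_file="log.txt"):
    """Return a logger that writes to both a log file and the console."""
    logger = logging.getLogger("scraper_sketch")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.FileHandler(log_file), logging.StreamHandler()):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


def dump_to_csv(rows, folder="data_dumps"):
    """Write a list of dicts to a timestamped CSV file and return its path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out_dir / f"jobs_{stamp}.csv"  # file name pattern is an assumption
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

A session could then call `dump_to_csv(rows)` once scraping finishes and log the returned path with `get_logger().info(...)`, so that each run leaves its own file in data_dumps/ and a matching entry in log.txt.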

Architecture:

  1. app.py - entry point
  2. main.py - the main workflow of the program
  3. indeed_com_scraper.py - scraping functionality module
  4. dumping.py - data cleaning / formatting module that also saves the data dumps
  5. logger.py - logging functionality
  6. parameters.py - keeps the scraping parameters in a separate module for easy access

Additional:

  1. db_scheme.py or db_scheme.sql - initial database setup
  2. requirements.txt - required Python packages

Requirements:

Python 3

Packages:

  • pandas 1.4.2
  • requests 2.28.0
  • beautifulsoup4 4.11.1
