📦 Parsera

Lightweight Python library for scraping websites with LLMs. You can try it out on the Parsera website.

Why Parsera?

Because it's simple and lightweight, and it keeps token usage to a minimum, which improves speed and reduces costs.

Installation

pip install parsera
playwright install

Documentation

Check out the documentation to learn about other features, such as running custom models and Playwright scripts.

Basic usage

If you want to use OpenAI, remember to set the OPENAI_API_KEY environment variable. You can do this from Python with:

import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY_HERE"
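If the key is missing, a script may only fail later with a less obvious error. The sketch below uses only the standard library to check for OPENAI_API_KEY up front; require_openai_key is a hypothetical helper for illustration, not part of Parsera.

```python
import os


def require_openai_key() -> str:
    """Return OPENAI_API_KEY, or raise a clear error if it is unset or empty.

    Hypothetical helper for illustration; Parsera does not provide it.
    """
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell "
            "or set os.environ before creating Parsera()"
        )
    return key
```

Calling require_openai_key() once at the top of a script surfaces a missing key before any browser or model work starts.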

Next, you can run a basic version that uses gpt-4o-mini:

from parsera import Parsera

url = "https://news.ycombinator.com/"
# Keys become field names in the output; values describe what to extract.
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scraper = Parsera()
result = scraper.run(url=url, elements=elements)

The result variable will contain a JSON list of records:

[
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24"
    },
    ...
]
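In the example output, numeric fields such as "Points" come back as strings (for instance "104"). If you need numbers, a small post-processing step helps. This is a sketch that assumes result is a list of dicts shaped like the example above; coerce_numeric is a hypothetical helper, and the choice to leave non-numeric values untouched is an assumption.

```python
from typing import Any


def coerce_numeric(records: list[dict[str, Any]], fields: tuple[str, ...]) -> list[dict[str, Any]]:
    """Return copies of records with the given fields converted to int where possible.

    Values that are not plain decimal strings (e.g. missing or "N/A") are left unchanged.
    """
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the original result is not mutated
        for field in fields:
            value = row.get(field)
            if isinstance(value, str) and value.strip().isdecimal():
                row[field] = int(value.strip())
        cleaned.append(row)
    return cleaned


# Record taken from the example output above.
result = [
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24",
    }
]
rows = coerce_numeric(result, ("Points", "Comments"))
print(rows[0]["Points"], rows[0]["Comments"])  # prints: 104 24
```

Working on copies keeps the raw scraper output available if a value turns out to need different handling.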

There is also an async arun method:

# Must be awaited inside an async function (or a Jupyter cell)
result = await scraper.arun(url=url, elements=elements)
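Because arun is a coroutine, several pages can be scraped concurrently with asyncio.gather. The sketch below relies only on what is shown above, namely that arun accepts url and elements keyword arguments. scrape_many and its concurrency limit are hypothetical choices, and FakeScraper is a stand-in so the example runs without network access or an API key; in real use you would pass a Parsera() instance instead.

```python
import asyncio
from typing import Any


async def scrape_many(scraper: Any, urls: list[str], elements: dict[str, str], limit: int = 3) -> list[Any]:
    """Run scraper.arun on each URL concurrently, at most `limit` at a time.

    Results are returned in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def one(url: str) -> Any:
        async with semaphore:  # cap concurrent browser/LLM calls
            return await scraper.arun(url=url, elements=elements)

    return list(await asyncio.gather(*(one(u) for u in urls)))


class FakeScraper:
    """Stand-in with the same arun call shape; returns canned data instead of scraping."""

    async def arun(self, url: str, elements: dict[str, str]) -> list[dict[str, str]]:
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        return [{key: f"{key} from {url}" for key in elements}]


results = asyncio.run(
    scrape_many(FakeScraper(), ["https://a.example", "https://b.example"], {"Title": "News title"})
)
```

The semaphore keeps the number of simultaneous page loads and model calls bounded, which matters more for cost and rate limits than raw speed.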

Running with Jupyter Notebook:

Either place this code at the beginning of your notebook:

import nest_asyncio
nest_asyncio.apply()

Or call the async arun method instead of run.
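Presumably this workaround is needed because run drives an asyncio event loop itself, while a Jupyter kernel already has one running. You can check which situation you are in with the standard library; event_loop_running below is a hypothetical helper that only illustrates the distinction.

```python
import asyncio


def event_loop_running() -> bool:
    """Return True if called from code already running inside an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:  # raised when no loop is running in this thread
        return False
    return True


async def inside_loop() -> bool:
    # Inside a coroutine, a loop is always running.
    return event_loop_running()


print(event_loop_running())        # plain script: False
print(asyncio.run(inside_loop()))  # inside asyncio.run: True
```

In a notebook cell the check should report True, which is the case where you either apply nest_asyncio or await scraper.arun(...) directly.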

Running with CLI

Before running Parsera as a command-line tool, don't forget to add your OPENAI_API_KEY to your environment variables or to a .env file.
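A .env file is a plain text file of KEY=VALUE lines, for example OPENAI_API_KEY=your-key. A common way to load one in Python is the python-dotenv package; the standard-library sketch below only shows what that loading amounts to. load_env_file is a hypothetical name, and this parser deliberately skips details that full .env loaders handle, such as export prefixes and escape sequences.

```python
import os
from pathlib import Path


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Read KEY=VALUE lines from a .env-style file into os.environ.

    Blank lines, '#' comments and lines without '=' are skipped. Matching single
    or double quotes around a value are stripped. Existing variables are kept
    unless override=True. Returns every pair that was parsed.
    """
    parsed: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Leaving existing variables in place by default means a key exported in the shell takes precedence over the file.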

Usage

You can configure the elements to parse with either a JSON string (--scheme) or a FILE (--file). Optionally, you can provide a FILE to write the output to (--output) and the number of SCROLLS to perform on the page (--scrolls).

python -m parsera.main URL {--scheme '{"title":"h1"}' | --file FILENAME} [--scrolls SCROLLS] [--output FILENAME]
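When calling the CLI from a script, one option is to write the elements to a JSON file for --file and build the argument list in Python. This is a sketch: it assumes the file passed to --file holds the same JSON object you would pass to --scheme, which the usage line suggests but does not spell out. build_parsera_command is a hypothetical helper, and the command is only printed here, not run.

```python
import json
import shlex
import sys
from typing import Optional


def build_parsera_command(
    url: str,
    scheme_file: str,
    scrolls: Optional[int] = None,
    output: Optional[str] = None,
) -> list[str]:
    """Assemble argv for `python -m parsera.main` using only the flags from the usage line."""
    cmd = [sys.executable, "-m", "parsera.main", url, "--file", scheme_file]
    if scrolls is not None:
        cmd += ["--scrolls", str(scrolls)]
    if output is not None:
        cmd += ["--output", output]
    return cmd


# Same elements as in the basic usage example, written to a file for --file.
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}
with open("scheme.json", "w", encoding="utf-8") as f:
    json.dump(elements, f, indent=2)

cmd = build_parsera_command("https://news.ycombinator.com/", "scheme.json", scrolls=2, output="result.json")
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to actually run it
```

Building a list rather than a single string avoids shell quoting problems with URLs that contain characters like & or ?.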

Running in Docker

If you run into issues with your local environment, you can run Parsera with Docker; see the documentation.
