Blazingly Fast DataFrame Library
Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust and is available for Python, R and NodeJS.
Key features
- Fast: Written from scratch in Rust, designed close to the machine and without external dependencies.
- I/O: First-class support for all common data storage layers: local, cloud storage and databases.
- Intuitive API: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
- Out of Core: The streaming API allows you to process your results without requiring all your data to be in memory at the same time.
- Parallel: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- Vectorized Query Engine: Uses Apache Arrow, a columnar data format, to process your queries in a vectorized manner, and SIMD to optimize CPU usage.
- GPU Support: Optionally run queries on NVIDIA GPUs for maximum performance for in-memory workloads.
Users new to DataFrames
A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
Philosophy
The goal of Polars is to provide a lightning fast DataFrame library that:
- Utilizes all available cores on your machine.
- Optimizes queries to reduce unneeded work/memory allocations.
- Handles datasets much larger than your available RAM.
- Offers a consistent and predictable API.
- Adheres to a strict schema (data-types should be known before running the query).
Polars is written in Rust, which gives it C/C++ performance and allows it to fully control the performance-critical parts of a query engine.
Example
Python: scan_csv · filter · group_by · collect
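import polars as pl

# Build a lazy query; nothing is read or computed yet.
q = (
    pl.scan_csv("docs/assets/data/iris.csv")
    .filter(pl.col("sepal_length") > 5)
    .group_by("species")
    .agg(pl.all().sum())
)

# Execute the optimized query and materialize the result.
df = q.collect()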
Rust: LazyCsvReader · filter · group_by · collect (available on feature csv and on feature streaming)
use polars::prelude::*;

let q = LazyCsvReader::new("docs/assets/data/iris.csv")
    .with_has_header(true)
    .finish()?
    .filter(col("sepal_length").gt(lit(5)))
    .group_by(vec![col("species")])
    .agg([col("*").sum()]);

let df = q.collect()?;
A more extensive introduction can be found in the next chapter.
Community
Polars has a very active community with frequent releases (approximately weekly).
Contributing
We appreciate all contributions, from reporting bugs to implementing new features. Read our contributing guide to learn more.
License
This project is licensed under the terms of the MIT license.