Blazingly Fast DataFrame Library

Polars is a blazingly fast DataFrame library for manipulating structured data. Its core is written in Rust, and it is available for Python, R, and Node.js.

Key features

  • Fast: Written from scratch in Rust, designed close to the machine and without external dependencies.
  • I/O: First class support for all common data storage layers: local, cloud storage & databases.
  • Intuitive API: Write your queries the way they were intended. Internally, Polars uses its query optimizer to determine the most efficient way to execute them.
  • Out of Core: The streaming API allows you to process your results without requiring all your data to be in memory at the same time.
  • Parallel: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
  • Vectorized Query Engine: Uses Apache Arrow, a columnar data format, to process your queries in a vectorized manner, and SIMD to optimize CPU usage.
  • GPU Support: Optionally run queries on NVIDIA GPUs for maximum performance for in-memory workloads.

Users new to DataFrames

A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.

Philosophy

The goal of Polars is to provide a lightning fast DataFrame library that:

  • Utilizes all available cores on your machine.
  • Optimizes queries to reduce unneeded work/memory allocations.
  • Handles datasets much larger than your available RAM.
  • Offers a consistent and predictable API.
  • Adheres to a strict schema (data-types should be known before running the query).

Polars is written in Rust, which gives it C/C++ performance and allows it to fully control the performance-critical parts of a query engine.

Example

Python: scan_csv · filter · group_by · collect

import polars as pl

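# Build a lazy query; nothing is read or computed yet.
q = (
    pl.scan_csv("docs/assets/data/iris.csv")
    .filter(pl.col("sepal_length") > 5)
    .group_by("species")
    .agg(pl.all().sum())
)

# Execute the optimized query and materialize the result as a DataFrame.
df = q.collect()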

Rust: LazyCsvReader · filter · group_by · collect (available on features csv and streaming)

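use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Build a lazy query; nothing is read or computed yet.
    let q = LazyCsvReader::new("docs/assets/data/iris.csv")
        .with_has_header(true)
        .finish()?
        .filter(col("sepal_length").gt(lit(5)))
        .group_by(vec![col("species")])
        .agg([col("*").sum()]);

    // Execute the optimized query and materialize the result as a DataFrame.
    let df = q.collect()?;
    println!("{df}");
    Ok(())
}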

A more extensive introduction can be found in the next chapter.

Community

Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:

ritchie46 stinodego alexander-beedie MarcoGorelli orlp reswqa nameexhaustion zundertj coastalwhite ghuls universalmind303 mcrumiller c-peters itamarst wence- matteosantama Dandandan magarick henryharbeck deanm0000 ibENPC cmdlineluser moritzwilksch eitsupi jorgecarleitao mickvangelderen ion-elgreco petrosbar jonashaag r-brink borchero braaannigan marcvanheerden ryanrussell cnpryer Julian-J-S josh cjermain flisky messense illumination-k thatlittleboy marioloko jakob-keller ruihe774 Wainberg mhconradt rben01 sorhawell

Contributing

We appreciate all contributions, from reporting bugs to implementing new features. Read our contributing guide to learn more.

License

This project is licensed under the terms of the MIT license.