1 Billion Row Challenge with R:
- This repo is inspired by Gunnar Morling's 1 Billion Row Challenge and benchmarks which R functions / libraries are quickest at summarizing the mean, min, and max of 1 billion rows of records (see the aggregation sketch below).
- This work is based on alejandrohagan/1br and #5, but I've only used 1e8 rows.
- I added some DuckDB options and a Polars scan option (see the DuckDB sketch below). To compare the pipelines without caching effects, I added file-copy and file-reading steps to each benchmark method, and capped each method at a maximum of 8 threads.
- If you see any issues or have suggestions for improvements, please let me know.
- If needed, run install_required_packages(install = TRUE) from the install.R file.
- Generate the 1e5, 1e6, 1e7, and 1e8 data by running: ./generate_data.sh (the 1e9 lines are commented out by default; see below)
- Run the benchmark with: Rscript run.R or Rscript run.1e9.R
- Check the generated plots and the results.
To explore the saved results, split by dataset size:

```r
library(dplyr)

# Load the combined benchmark results and split them by row count n
readr::read_rds("2024-02-29_all.rds") %>%
  group_split(n)
```
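For context, every benchmarked pipeline computes the same aggregation. Here is a minimal sketch with readr + dplyr, assuming a semicolon-delimited station;measurement file in the standard 1brc format (the file name and column names are placeholders; the repo's actual code may differ):

```r
library(readr)
library(dplyr)

# Per-station min / mean / max over the whole file
# "measurements.csv" is a placeholder file name
read_delim("measurements.csv", delim = ";",
           col_names = c("station", "measurement")) %>%
  group_by(station) %>%
  summarise(
    min  = min(measurement),
    mean = mean(measurement),
    max  = max(measurement)
  )
```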
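And a hedged sketch of the DuckDB flavor with the 8-thread cap mentioned above; the repo's actual queries and options may differ, and the Polars variant is analogous, built on a lazy scan of the same file:

```r
library(DBI)
library(duckdb)

con <- dbConnect(duckdb())
dbExecute(con, "SET threads TO 8")  # cap at 8 threads, as in the benchmarks

# Aggregate directly from the file; "measurements.csv" is a placeholder name
dbGetQuery(con, "
  SELECT station,
         MIN(measurement) AS min,
         AVG(measurement) AS mean,
         MAX(measurement) AS max
  FROM read_csv('measurements.csv', delim = ';',
                columns = {'station': 'VARCHAR', 'measurement': 'DOUBLE'})
  GROUP BY station
")
dbDisconnect(con, shutdown = TRUE)
```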
If you want to, have the time, and have enough memory available on your computer, you can try to get the results for 1e9 rows:
- Uncomment the 1e9 lines in ./generate_data.sh
- Comment out run.R:25 and uncomment run.R:26
- Generate the 1e6, 1e7, 1e8, and 1e9 data by running: ./generate_data.sh (a sketch of the generated format appears after this list)
- Run the benchmark with: Rscript run.R
- Check the generated plots.
- Compare with other languages and solutions (see compare.php, or onebrc for Rust)
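For reference, here is a hypothetical R sketch of the kind of file ./generate_data.sh produces, assuming the standard 1brc one-measurement-per-line format. The real script's station names, value distribution, and file naming may differ, and materializing 1e9 rows in memory like this would not be practical:

```r
library(tibble)
library(readr)

# Write n "station;measurement" lines to path (illustrative only)
generate_measurements <- function(n, path) {
  stations <- paste0("station_", 1:400)  # hypothetical station pool
  data <- tibble(
    station     = sample(stations, n, replace = TRUE),
    measurement = round(rnorm(n, mean = 10, sd = 15), 1)
  )
  write_delim(data, path, delim = ";", col_names = FALSE)
}

generate_measurements(1e5, "measurements.1e5.csv")
```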
Feedback is welcome. You can open an issue in this repo.