`zip-to-parquet`

A simple command-line utility. It takes one or more .zip files as input and writes a single .parquet file with one row per compressed file found inside the .zip file(s). The parquet file has the following columns:

| Column Name | Column Type | Description |
| --- | --- | --- |
| `name` | `varchar` | The full name of the file |
| `source` | `varchar` | The path to the original zip file |
| `body` | `blob` | A binary blob of the contents of the file |

Uses 1024MB blocks and Snappy compression.

This is a utility for some domain-specific data parsing involving very large numbers of files that are initially stored in zips. Converting them to parquet files makes them faster to incorporate into data pipelines than unzipping them to disk.

Examples

Get help on all options:

  zip-to-parquet --help

Convert two zips to a single parquet:

  zip-to-parquet -i ~/downloads/my_cool_zip.zip -i ~/downloads/my_other_cool_zip.zip -o ~/my_new_parquet.parquet

Convert all zips in /data/lots_of_zips/ and /data/other_zips/ to a parquet, only including .png files:

  zip-to-parquet -i "/data/lots_of_zips/**/*.zip" -i "/data/other_zips/**/*.zip" -o ~/my_new_parquet.parquet -g "**/*.png"

Be careful with globs as arguments: some shells will automatically expand paths containing asterisks if they are not wrapped in quotes.
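
The difference between a quoted and an unquoted glob can be seen without running `zip-to-parquet` at all. The sketch below is only an illustration of shell behavior: the `count_args` helper and the scratch directory are hypothetical stand-ins, and the `.zip` files are empty placeholders rather than real archives.

```shell
# Create a scratch directory with two empty files whose names end in .zip.
# (Placeholders only; they are not valid archives. Only the names matter.)
workdir=$(mktemp -d)
touch "$workdir/a.zip" "$workdir/b.zip"

# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell expands the pattern before the program starts,
# so the program receives one argument per matching file.
unquoted_count=$(count_args "$workdir"/*.zip)

# Quoted: the pattern is passed through as a single literal argument,
# leaving the program to interpret the glob itself.
quoted_count=$(count_args "$workdir/*.zip")

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s)"

# Remove the scratch directory.
rm -rf "$workdir"
```

With two matching files, the unquoted call receives two arguments and the quoted call receives one. Quoting the pattern, as in the example above, leaves it intact for the program.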

Put only the names of files in a zip file into a parquet:

  zip-to-parquet -i my_cool_zip.zip -o my_new_parquet.parquet --no-body --no-source
