prefix_uvarint

Prefix based variable length integer coding

9 releases (5 breaking)

0.6.1 Mar 30, 2023
0.6.0 Mar 13, 2023
0.5.1 Feb 25, 2023
0.4.1 Feb 15, 2023
0.1.0 Nov 1, 2022

MIT/Apache

This module implements a prefix-based variable length integer coding scheme.

Unlike an LEB128-style encoding, this scheme uses a unary prefix code in the first byte of the value to indicate how many subsequent bytes must be read, followed by the big-endian encoding of the remaining bytes. This improves coding speed compared to LEB128 by reducing the number of branches evaluated when coding longer values, and it allows those branches to be distinct from one another, which reduces branch mispredictions.
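As an illustration of the idea described above, here is a minimal sketch of one possible prefix layout for u64 values: the number of leading one bits in the first byte gives the number of bytes that follow, and the payload is stored big-endian. The function names (encoded_len, encode_prefix_varint, decode_prefix_varint) are hypothetical and are not this crate's API, and the exact bit layout is an assumption that may differ from the crate's wire format.

```rust
/// Number of bytes this sketch uses for `v`: 7 payload bits per byte for
/// lengths 1..=8, and a full 8-byte payload after a 0xff tag for length 9.
fn encoded_len(v: u64) -> usize {
    let bits = 64 - (v | 1).leading_zeros() as usize; // at least 1 bit
    if bits > 56 { 9 } else { (bits + 6) / 7 }
}

/// Append the encoding of `v` to `out` (assumed layout, not the crate's).
fn encode_prefix_varint(v: u64, out: &mut Vec<u8>) {
    let len = encoded_len(v);
    let be = v.to_be_bytes();
    if len == 9 {
        // All eight prefix bits set: the value follows in 8 bytes.
        out.push(0xff);
        out.extend_from_slice(&be);
        return;
    }
    let extra = len - 1; // number of bytes after the first
    let tag: u8 = !(0xffu8 >> extra); // `extra` one bits, then zeros
    let start = 8 - len; // index of the first significant byte
    out.push(tag | be[start]);
    out.extend_from_slice(&be[start + 1..]);
}

/// Decode one value from the front of `buf`, returning the value and the
/// number of bytes consumed, or None if `buf` is too short.
fn decode_prefix_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let extra = first.leading_ones() as usize; // 0..=8
    let len = extra + 1;
    if buf.len() < len {
        return None;
    }
    let mut be = [0u8; 8];
    if extra == 8 {
        be.copy_from_slice(&buf[1..9]);
    } else {
        // Strip the unary prefix from the first byte, then copy the rest.
        be[8 - len] = first & (0x7f >> extra);
        be[9 - len..].copy_from_slice(&buf[1..len]);
    }
    Some((u64::from_be_bytes(be), len))
}
```

In this sketch the decoder determines the total length with a single leading_ones call on the first byte, rather than testing a continuation bit on every byte as an LEB128 decoder does.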

The PrefixVarInt trait is implemented for u64, u32, u16, i64, i32, and i16, with values closer to zero producing smaller output. Signed values are written using zigzag coding so that negative numbers of small magnitude also produce small output.
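For readers unfamiliar with zigzag coding, the following sketch shows the standard mapping for i64, which interleaves negative and non-negative values so that magnitudes near zero map to small unsigned integers. The function names are hypothetical and are not taken from this crate.

```rust
/// Map a signed value to an unsigned one: 0, -1, 1, -2, 2, ... become
/// 0, 1, 2, 3, 4, ...
fn zigzag_encode(v: i64) -> u64 {
    // The arithmetic shift yields all ones for negative values, all zeros
    // otherwise; XOR with it flips the shifted bits for negatives.
    ((v << 1) ^ (v >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(u: u64) -> i64 {
    ((u >> 1) as i64) ^ -((u & 1) as i64)
}
```

With this mapping, -1 becomes 1 and fits in a single byte once varint-coded, whereas reinterpreting its two's complement bits as u64 would require the longest encoding.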

PrefixVarInt includes methods to code values directly to and from byte slices. Extension traits are also provided for bytes::{Buf,BufMut} and for reading and writing these values through std::io::{Read,Write}.
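To show why a length-prefixed format suits stream readers, here is a rough sketch of reading one value from any std::io::Read source. It assumes a layout in which the count of leading one bits in the first byte equals the number of following big-endian bytes; that layout and the function name read_prefix_varint are assumptions for illustration, not the crate's extension trait.

```rust
use std::io::{self, Read};

/// Read one prefix-coded u64 from `r` (assumed layout, hypothetical name).
fn read_prefix_varint<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut first = [0u8; 1];
    r.read_exact(&mut first)?;
    // The unary prefix says how many more bytes belong to this value.
    let extra = first[0].leading_ones() as usize; // 0..=8
    let mut be = [0u8; 8];
    if extra == 8 {
        // Tag byte 0xff: the full 8-byte value follows.
        r.read_exact(&mut be)?;
    } else {
        // Keep the payload bits of the first byte, then read the rest
        // with a single read_exact call.
        be[7 - extra] = first[0] & (0x7f >> extra);
        r.read_exact(&mut be[8 - extra..])?;
    }
    Ok(u64::from_be_bytes(be))
}
```

Because the first byte carries the total length, the reader needs at most two read_exact calls per value, instead of one read and one branch per byte.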

Performance

To get a sense of how fast this is on your host, run cargo bench.

Benchmarks run with two value distributions: uniform by encoded length, and Zipf by encoded length. On an M1 MacBook Air, coding speeds average 1.2G elem/s for values that encode to a single byte, dropping off to around 400M elem/s for the longest (9-byte) values. Computing the encoded length of a value (PrefixVarInt::prefix_varint_len()) averages 3G elem/s.
