A base32 decoder/encoder using Crockford's alphabet
krock32 is a Python implementation of Crockford's base32 alphabet, including checksumming. It is created to allow encoding of arbitrary data to a human readable format, allowing for human-to-human transmission, while also being machine-decodable. All this without being case-sensitive or using special characters like base64 does.
Base32 is the encoding of arbitrary data into a set of 32 (human-readable) symbols, which make up the encoding's alphabet. In most implementations, this allows any data to be represented entirely as ASCII characters, which is useful for certain forms of transmission. In difference from the more common base64, which uses 64 symbols, base32 can be case-insensitive and exclude special, non-alphanumeric characters.
Encoding data to base32 will, necessarily, expand the data. A byte of data can encode 256 values (8 bits), whereas a symbol of base32 can only encode 32 values (5 bits). This means the base32-encoded data will be 60% larger (8/5) than the base data. The shortest example is that 5 bytes (40 bits) will encode to 8 base32 symbols.
Crockford's base32 alphabet differs from the RFC 4648 standard in 5 ways:
- It starts with the numerals, so that e.g.
0b00001
encodes to1
. - It does not use padding, so encodings of data not divisible by 5 bytes will not be padded with
=
. Instead, the missing bits are treated as zeros and a last symbol is selected from that. - It explicitly defines that both upper and lowercase characters are valid to the decoder, so that e.g. both
G
andg
decode to0b10000
. - It eliminates commonly mis-read characters
I
,L
,O
, andU
from the encoding. Instead,O
,o
, and0
all decode to0b00000
and1
,I
,i
,L
, andl
all decode to0b00001
.U
andu
are only used as checksums. - It optionally allows a checksum to be appended to the very end of the encoded data.
You should be able to install krock32 from PyPi:
pip install krock32
I have only ever tested it for Python 3.8, but I see no reason it shouldn't work for earlier 3.x versions. 2.7 is not supported, but may work, what do I know.
There's a test suite in tests/test_krock32.py
which requires pytest
and hypothesis
to run. Given those prerequisites, testing should be as easy as
cd tests
pytest
krock32 is relatively easy to use. There are two objects - Encoder
and Decoder
- that have two methods - update()
and finalize()
.
The encoder takes bytes-like data (bytes
or bytearray
) and encodes into Crockford base32.
import krock32
encoder = krock32.Encoder(checksum=True)
# Instantiates a krock32 encoder, here with the optional checksum enabled.
# If checksum is not given, it is False by default.
encoder.update(b'this is some bytes data')
# Calling update feeds data into the encoder, updating its internal state.
encoder.update(b'this is more, fun data')
# Consecutive updates are allowed; this just appends the new data
encoding = encoder.finalize()
# Calling finalize returns the encoded string from the encoder. It also
# disables the encoder, so to encode more data you must instantiate a
# new encoder.
# In the code above, encoding is:
# EHM6JWS0D5SJ0WVFDNJJ0RKSEHJQ6834C5T62X38D5SJ0TBK41PPYWK55GG6CXBE41J62X31W
# where the final 'W' is the checksum.
The inverse of the encoder, the decoder takes a Crockford base32 string and decodes it into bytes
.
import krock32
decoder = krock32.Decoder(strict=False, checksum=True)
# Instantiates a krock32 decoder, here with the optional checksum enabled
# and strictness - allowing only the 32 symbols generated by the encoder -
# disabled. Note that if a string is encoded with a checksum, the decoder
# must also have checksumming enabled, and vice versa.
# If checksum is not given, it is False by default.
# Note: There is a third option: ignore_non_alphabet. It is currently
# not implemented, but will allow filtering out all symbols not in the
# alphabet, such as whitespace and unicode garbage.
decoder.update('EHM6JWS0D5SJ0WVFDNJJ0RKSEHJQ6834')
# Calling update feeds data into the decoder, updating its internal state.
decoder.update('c5t62x38d5sj0tbk41ppywk55gg6cxbe41j62x31w')
# Consecutive updates are allowed; this just appends the new data
# Even though the input string's length must be a multiple of 8 symbols plus
# 0, 2, 4, 5, or 7 (1, 3, 5, or 6 with checksumming), each call to update can
# be of any length.
decoding = decoder.finalize()
# Calling finalize returns the decoded bytes from the decoder. It also
# disables the decoder, so to decode more data you must instantiate a
# new decoder.
# In the code above, decoding is:
# b'this is some bytes datathis is more, fun data'
krock32 is licensed under the MIT license.
krock32 implements Douglas Crockford's (@douglascrockford) base32 alphabet and checksumming scheme with his permission.
-
0.1.1 Optimization patch. Ran the encoder and decoder through a profiler with 1MiB of random input (1.7MiB to the decoder; the encoded version of the encoder's input). I was well aware of both of them being inefficient and found most of it had to do with throwing around strings.
The results for my (admittedly short) tests:
- Encoder: 31,560s (~32,4 kiB/s) pre-optimization; 2,737s (~374 kiB/s) post-optimization. That's an order of magnitude faster.
- Decoder: 9,342s (~175,4 kiB/s) pre-optimization; 2,817s (~581,6 kiB/s) post-optimization. A little more than 3 times faster.
-
0.1.0 Initial release.
-
0.0.1 Development version. This project started out named
zim
and implemented z-base-32. This was later discarded when I learned that Crockford's alphabet does a better job with interchangeable symbols, which I explicitly wanted. Package changed its name tokrock32
.