#memory #char #memory-size #character

packed-char

Stores a char or a 22-bit integer in 32 bits

3 releases

0.1.2 Sep 5, 2024
0.1.1 Mar 23, 2024
0.1.0 Mar 19, 2024

#204 in Compression

33 downloads per month
Used in 2 crates

MIT license

12KB
168 lines



Allows either a char or a 22-bit integer to be stored in 32 bits, the same size as a char.

How it works

packed-char takes advantage of the valid ranges for a char to determine what type of data is stored. These ranges are 0..=0xD7FF and 0xE000..=0x10FFFF (see the documentation for char). The gap between them, 0xD800..=0xDFFF, contains the surrogate code points, which are not Unicode scalar values and are therefore never valid chars. chars are stored unmodified. To store a u22 without overlapping the valid char ranges, it is first split into two 11-bit chunks. The left chunk is stored in the 11 leading bits, which chars never use because char::MAX (0x10FFFF) fits in 21 bits. The right chunk is stored in the 11 trailing bits, which do overlap the bits used by chars. To make this work, take note of the bit pattern in the surrogate range:

1101100000000000 // Start
1101111111111111 // End
^^^^^

The leading 5 bits are constant in this range. Referred to here as the surrogate mask, they serve as a signature for u22 values. They are set along with the left and right 11-bit chunks:

11111111111  00000    11011            11111111111
left chunk | unused | surrogate mask | right chunk
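
The layout above can be sketched in a few lines of Rust. This is an illustration only, not the crate's actual code: the names `pack_u22`, `SURROGATE_MASK`, and `U22_MAX` are hypothetical, and the u22 is assumed to arrive as a plain `u32`.

```rust
/// The 5-bit signature 0b11011 placed in bits 11..=15 (hypothetical name).
const SURROGATE_MASK: u32 = 0b11011 << 11; // == 0xD800
/// Largest value that fits in 22 bits (hypothetical name).
const U22_MAX: u32 = (1 << 22) - 1;

/// Packs a 22-bit integer into the 32-bit layout described above.
/// Returns `None` if `value` does not fit in 22 bits.
fn pack_u22(value: u32) -> Option<u32> {
    if value > U22_MAX {
        return None;
    }
    let left = value >> 11; // upper 11 bits of the u22
    let right = value & 0x7FF; // lower 11 bits of the u22
    // left chunk in bits 21..=31, signature in bits 11..=15, right chunk in bits 0..=10
    Some((left << 21) | SURROGATE_MASK | right)
}

fn main() {
    let packed = pack_u22(U22_MAX).unwrap();
    // All 22 payload bits set: matches the diagram above.
    assert_eq!(packed, 0b11111111111_00000_11011_11111111111);
    // The packed bits never form a valid char.
    assert!(char::from_u32(packed).is_none());
}
```

Bits 16..=20 are left as zero, which is what keeps a zero left chunk inside the surrogate range rather than somewhere else in the char space.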

Now we have two cases:

  • The left chunk is zero and the value is in the surrogate range.
  • The left chunk is nonzero and the value exceeds char::MAX.

Thus, char and u22 values are disambiguated.
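
Reading a value back follows directly from these two cases. The sketch below is an assumption about how decoding could look, not the crate's API: the `Contents` enum and `unpack` function are hypothetical, and the input is assumed to be either an unmodified char or a u22 packed with the layout above. It relies on the standard `char::from_u32`, which returns `None` for surrogates and for values above `char::MAX`.

```rust
/// What a packed 32-bit value holds (hypothetical type).
#[derive(Debug, PartialEq)]
enum Contents {
    Char(char),
    U22(u32),
}

/// Decodes 32 bits that hold either a char or a packed u22.
fn unpack(bits: u32) -> Contents {
    match char::from_u32(bits) {
        // Valid Unicode scalar value: it was stored unmodified.
        Some(c) => Contents::Char(c),
        // Surrogate range or above char::MAX: it is a packed u22.
        None => {
            let left = bits >> 21; // upper 11 bits
            let right = bits & 0x7FF; // lower 11 bits; drops the signature
            Contents::U22((left << 11) | right)
        }
    }
}

fn main() {
    // A char is stored as-is.
    assert_eq!(unpack('a' as u32), Contents::Char('a'));
    // Left chunk zero: the bits fall in the surrogate range.
    assert_eq!(unpack(0xD800 | 5), Contents::U22(5));
    // Left chunk nonzero: the bits exceed char::MAX.
    assert_eq!(unpack((1 << 21) | 0xD800 | 5), Contents::U22((1 << 11) | 5));
}
```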

No runtime deps