This module implements a garbage-collecting cursor that compacts according to a garbage-collection policy.
This module is intended for use within lsmtk, where a KeyValueStore rewrites data according to a garbage-collection policy and a second, separate process unlinks the old files that served as inputs to the garbage collection, but only after verifying that the policy is upheld and that no extra data was thrown away.
The intent is to specify the garbage-collection policy in two places: once in the writer (the key-value store) and once in the verifier. Specifying it twice lets us check that the garbage-collection mechanism doesn't break from one release to the next. Because the verifier and the key-value store can be updated independently, their versions may skew across releases; when the code on one side changes, its behavior is checked against the other side's older copy of the policy.
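The verifier's job can be pictured as two set checks over one compaction. Below is a minimal sketch, assuming rows are (key, timestamp) pairs and a hypothetical "keep the newest N versions of each key" policy. The names `Version`, `required_by_policy`, and `safe_to_unlink` are illustrative only and are not this module's API. A real verifier would likely stream sorted files rather than materialize sets in memory.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical row representation: (user key, timestamp).
type Version = (Vec<u8>, u64);

/// The verifier's own copy of a hypothetical policy: keep the newest
/// `versions` versions of every key found in the inputs.
fn required_by_policy(inputs: &BTreeSet<Version>, versions: usize) -> BTreeSet<Version> {
    let mut by_key: BTreeMap<&[u8], Vec<u64>> = BTreeMap::new();
    for (key, ts) in inputs {
        by_key.entry(key.as_slice()).or_default().push(*ts);
    }
    let mut required = BTreeSet::new();
    for (key, mut timestamps) in by_key {
        timestamps.sort_unstable_by(|a, b| b.cmp(a)); // newest first
        for ts in timestamps.into_iter().take(versions) {
            required.insert((key.to_vec(), ts));
        }
    }
    required
}

/// Hypothetical check run before unlinking the compaction's input files.
fn safe_to_unlink(
    inputs: &BTreeSet<Version>,
    outputs: &BTreeSet<Version>,
    required: &BTreeSet<Version>,
) -> Result<(), String> {
    // The writer must not invent data: every output row comes from an input.
    if let Some(extra) = outputs.difference(inputs).next() {
        return Err(format!("output row not present in inputs: {:?}", extra));
    }
    // The writer must not drop anything the verifier's policy says to keep.
    // Retaining more than required is allowed.
    if let Some(missing) = required.difference(outputs).next() {
        return Err(format!("required row was dropped: {:?}", missing));
    }
    Ok(())
}
```

Note that the second check is one-sided on purpose: the writer may keep more than the verifier requires, which is what makes the update rules that follow possible.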
Consequently, we need some rules that allow us to garbage collect safely.
We start with the observation that just because garbage collection can throw something away doesn't mean that it will. This gives rise to three rules:
- When updating to a policy that retains more data, update the writer first. The verifier, still running the old policy, will accept the extra retained rows.
- When updating to a policy that deletes more data, update the verifier first. The key-value store, still running the old policy, will retain more data than the verifier requires, and the verifier accepts that.
- When changing policies, follow an incremental path in which each step is one of the two kinds of update above; otherwise, the writer and the verifier must be updated together.
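The first two rules can be checked mechanically by modeling a rollout as the sequence of (writer policy, verifier policy) pairs that are live at the same time. The sketch below assumes a hypothetical policy family, `RetainVersions(n)` ("keep the newest n versions of each key"), for which "retains at least as much" reduces to comparing counts. The type and function names are illustrative, not part of this module.

```rust
/// Hypothetical policy family: keep the newest `n` versions of each key.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RetainVersions(usize);

/// The verifier accepts the writer's output when the writer kept at least
/// everything the verifier's policy requires. For this family, that means
/// the writer's version count is at least the verifier's.
fn verifier_accepts(writer: RetainVersions, verifier: RetainVersions) -> bool {
    writer.0 >= verifier.0
}

/// A rollout is safe when every intermediate (writer, verifier) state passes.
fn rollout_is_safe(states: &[(RetainVersions, RetainVersions)]) -> bool {
    states
        .iter()
        .all(|&(writer, verifier)| verifier_accepts(writer, verifier))
}
```

For example, moving from 2 to 5 retained versions is safe if the writer goes first (states (2,2), (5,2), (5,5)) but fails at (2,5) if the verifier goes first; moving from 5 down to 2 is the mirror image.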
Structs
- Determine which keys in the constructed cursor should be retained. Can only be built by the collector method on a policy.
- An error when parsing the textual representation of a garbage collection policy.
Enums
- A GarbageCollectionPolicy specifies which data to retain and which data to compact away.
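Because policies have a textual representation and a dedicated parse error, the overall shape resembles an enum implementing `FromStr`. The sketch below is entirely hypothetical: `SketchPolicy`, its variants, `PolicyParseError`, and the `keep_all` / `versions = N` syntax are invented for illustration and do not reflect the real GarbageCollectionPolicy variants or grammar.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for a garbage-collection policy (not the real enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchPolicy {
    /// Never collect anything.
    KeepAll,
    /// Keep only the newest `n` versions of each key.
    Versions(usize),
}

/// Hypothetical parse error carrying the rejected input.
#[derive(Debug, PartialEq, Eq)]
struct PolicyParseError(String);

impl fmt::Display for PolicyParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid garbage collection policy: {:?}", self.0)
    }
}

impl std::error::Error for PolicyParseError {}

impl FromStr for SketchPolicy {
    type Err = PolicyParseError;

    /// Accepts the made-up syntax `keep_all` or `versions = N`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        if s == "keep_all" {
            return Ok(SketchPolicy::KeepAll);
        }
        if let Some(rest) = s.strip_prefix("versions") {
            if let Some(n) = rest.trim_start().strip_prefix('=') {
                if let Ok(n) = n.trim().parse::<usize>() {
                    return Ok(SketchPolicy::Versions(n));
                }
            }
        }
        Err(PolicyParseError(s.to_string()))
    }
}
```

Parsing through `FromStr` lets both the writer and the verifier load the same policy text, for example with `"versions = 3".parse::<SketchPolicy>()`.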
Traits
- Given a stream of sorted keys, indicate whether a key should be retained.
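A trait of this kind can stay stateless about the whole dataset because sorted input keeps all versions of a key adjacent. Below is a minimal sketch, assuming keys arrive sorted by key ascending and then by timestamp descending (newest first). The trait `RetainKey`, the struct `NewestVersions`, and the helper `compact` are hypothetical names; the module's actual trait and method signatures may differ.

```rust
/// Hypothetical trait: called once per key, in sorted order.
trait RetainKey {
    /// Returns true if this (key, timestamp) should be kept.
    fn retain(&mut self, key: &[u8], timestamp: u64) -> bool;
}

/// Keeps the newest `limit` versions of each key. Relies on the assumed
/// ordering (key ascending, timestamp descending) so that the first
/// `limit` versions seen for a key are its newest ones.
struct NewestVersions {
    limit: usize,
    current: Option<Vec<u8>>,
    seen: usize,
}

impl NewestVersions {
    fn new(limit: usize) -> Self {
        Self { limit, current: None, seen: 0 }
    }
}

impl RetainKey for NewestVersions {
    fn retain(&mut self, key: &[u8], _timestamp: u64) -> bool {
        if self.current.as_deref() != Some(key) {
            // First version of a new key: reset the per-key counter.
            self.current = Some(key.to_vec());
            self.seen = 0;
        }
        self.seen += 1;
        self.seen <= self.limit
    }
}

/// Hypothetical driver: filters a sorted stream through a retention policy.
fn compact<'a, R: RetainKey>(
    policy: &mut R,
    stream: impl IntoIterator<Item = (&'a [u8], u64)>,
) -> Vec<(&'a [u8], u64)> {
    stream
        .into_iter()
        .filter(|&(key, ts)| policy.retain(key, ts))
        .collect()
}
```

With a limit of 2, the stream a@3, a@2, a@1, b@5, b@4 keeps a@3, a@2, b@5, b@4. Only the current key and a counter are held in memory, so the policy works over arbitrarily large sorted inputs.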