Skip to content

A simple, durable Clojure queueing library. While this is believed to be reasonably solid, it is still relatively new.

License

Notifications You must be signed in to change notification settings

puppetlabs/stockpile

stockpile Clojars Project main

A simple, durable Clojure queueing library. While this is believed to be reasonably solid, it is still relatively new, and the API or behavior may change without warning.

Stockpile supports the durable storage and retrieval of data. After storage, stockpile returns an entry that can be used to access the data later, and when no longer needed, the data can be atomically discarded.

Stockpile is explicitly designed to keep minimal state outside the filesystem. After opening a queue, you can call reduce to traverse the existing entries, but stockpile itself does not retain information about the entries. You must preserve any that you might want to access later.

The ordering of any two entries can be compared (by id), but that ordering is not guaranteed to be exact, only some approximation of their relative insertion order.

A metadata string can be provided for each item stored, but the length of that string may be limited by the underlying filesystem. (The string is currently encoded into the queue item's pathname so that it can be retrieved without having to open or read the file itself). The filesystem might also alter the metadata in other ways, for example if it does not preserve case. The path that's ultimately specified to the filesystem by the JVM may be affected by the locale, and on Linux with common filesystems, for example, often produces UTF-8 paths. See the queue store docstring for further information.

Stockpile is intended to work correctly on any filesystem where rename (ATOMIC_MOVE) works correctly, and where calling fsync/fdatasync on a file and its parent directory makes the file durable. In the past (before JDK 9), that did not include OS X.

And there's apparently a controversy about whether to continue to support the undocumented method stockpile uses to sync the parent directory, or to provide some other documented mechanism.

Unless the items being inserted into the queue are large enough for the sequential transfer rate to dominate, the insertion rate is likely to be limited by the maximum "fsync/fdatasync rate" of the underlying filesystem.

The current implementation tracks the queue ids using an AtomicLong, and while that counter could overflow, even at 100,000 stores per second, it should require about 290 million years.

Stockpile's behavior given unexpected files inside its directory is undefined.

Usage

See queue.clj and queue_test.clj for the API documentation and sample usage.

Testing

As expected, "lein test" will run the test suite, but there are some additional tests can only be run if STOCKPILE_TINY_TEST_FS is set to a directory that resides on an otherwise quiet filesystem with less than 10MB of free space, that is not the current filesystem:

STOCKPILE_TINY_TEST_FS=~/tmp/tiny lein test

During the tests stockpile may repeatedly fill that filesystem.

You can also run the tests via dev/run-test, and if you're on Linux and root, it will automatically set up, use, and tear down a suitable tiny loopback filesystem. Alternately, if you have sudo access to root, you can invoke dev/run-test --use-sudo to do the same. Though invoking dev/run-test outside of a "throwaway" virtual machine is not recommended right now.

Implementation notes

The current implementation follows the the traditional POSIX-oriented approach of storing each entry in its own file, with a name that's guaranteed to be unique, and where durability is provided by this sequence of operations:

  • Write the entry data to a temp file,
  • fdatasync() the temp file,
  • rename() the temp file to the final name,
  • and fsync() the parent directory.

Or rather, stockpile calls JVM functions that are believed to provide that behavior or an equivalent result. The parent directory fsync() is required in order to ensure that the destination file name is also durable.

Given this approach, the "fsync rate" of the underlying filesystem will constrain the entry storage rate for a given queue, and so batching multiple items into a single entry (when feasible) may be an effective way to increase performance.

An overview of relevant concepts:

Related JVM methods and documentation:

Supporting POSIX functions:

License

Copyright © 2016 Puppet Labs Inc

Distributed under the Apache License Version 2.0. See ./LICENSE for details.

About

A simple, durable Clojure queueing library. While this is believed to be reasonably solid, it is still relatively new.

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Packages

No packages published