ziggy

Embedding, quantization, and vector indexing. Designed for speed.

The documentation is pretty sparse right now, but most of the action is in llm.py, index.py, and database.py. Note that there are custom torch kernels for quantization, dequantization, and quantized matmul, so you'll need to compile and install those first to use the vector index capabilities.

Install

For simple usage, no additional steps are needed beyond cloning the repository. If you want quantization support (optional), you need to build the custom torch extensions in the extension directory:

cd extension
python setup.py install --user

For CUDA builds, the environment variable CUDA_HOME must be set to your CUDA directory. To build in CPU-only mode, set the environment variable ZIGGY_CPU_ONLY=1.

If you need to use an older C compiler with nvcc, set CUDAHOSTCXX to the directory containing that compiler's binary.

To build the standalone testing binaries (very optional), download libtorch for CUDA, set LIBTORCH_PATH to its absolute path, and run:

mkdir build
cmake -B build -DCMAKE_PREFIX_PATH=$LIBTORCH_PATH
cmake --build build --config Release

Usage

To load up a Hugging Face embedding model, just run

import ziggy

model_id = 'TaylorAI/bge-micro-v2'
mod = ziggy.HuggingfaceEmbedding(model_id)

By default it applies no optimization and uses the model's maximum token window length. If you want to, say, use ONNX optimization and set a 256-token window

mod = ziggy.HuggingfaceEmbedding(model_id, onnx=True, max_len=256)

You can also use llama.cpp via llama-cpp-python with

mod = ziggy.LlamaCppPython(path_to_gguf)

Then we can embed text with the mod.embed function, passing either a single string or a list of strings.
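For example (a minimal sketch; the exact return type of embed is not documented here, but it is presumably a tensor or array with one embedding vector per input):

vec = mod.embed('a single passage of text')             # one string
vecs = mod.embed(['first passage', 'second passage'])   # or a list of strings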

To load in big corpora, I first convert them to a single JSONL file. To load that file into an index

db = ziggy.DocumentDatabase.from_jsonl(path, embed=mod, delim='\n', min_len=100)

Here we set the within-record delimiter to a single newline and discard any chunks shorter than 100 characters. This is a pretty good approach for Wikipedia.
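For reference, here is a hypothetical sketch of how such a JSONL file might be produced. The exact record schema that from_jsonl expects is not documented here, so the 'title' and 'text' field names below are assumptions, not the library's contract.

import json

# hypothetical corpus: article titles mapped to article text, with newlines
# separating the chunks within each article (matching delim='\n' above)
articles = {
    'Anarchism': 'Anarchism is a political philosophy...\nIt questions the need for...',
}

# write one JSON record per line; the field names here are assumptions
with open('corpus.jsonl', 'w') as f:
    for title, text in articles.items():
        f.write(json.dumps({'title': title, 'text': text}) + '\n')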

If you want to quantize the stored vectors, you can do something like the following

qspec = ziggy.QuantSpec.from_width(ziggy.QuantType.qint4, 0.0, 0.1)
db = ziggy.DocumentDatabase.from_jsonl(path, embed=mod, qspec=qspec, delim='\n', min_len=100)

Now you can do top-k semantic search on the database with something like

db.search('Anarchism', 5)

By default, the indexing operation will have generated average embeddings at the document level, and these are used as a first pass when searching.
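Putting the pieces above together, an end-to-end run might look something like the following (the corpus path and model choice are just placeholders):

import ziggy

# load an embedding model (the ONNX and window-length settings are optional)
mod = ziggy.HuggingfaceEmbedding('TaylorAI/bge-micro-v2', onnx=True, max_len=256)

# index a JSONL corpus with 4-bit quantized storage
# (requires the compiled torch extensions from the Install section)
qspec = ziggy.QuantSpec.from_width(ziggy.QuantType.qint4, 0.0, 0.1)
db = ziggy.DocumentDatabase.from_jsonl('corpus.jsonl', embed=mod, qspec=qspec, delim='\n', min_len=100)

# top-5 semantic search over the indexed documents
db.search('Anarchism', 5)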
