Similarity index for Clojure, built on top of hnsw.
Creates an index with the given space type (:L2 or :cosine), dimension and maximum number of items. The index can be configured with the following optional options (see this for more info):
- M - the number of bi-directional links created for every new element during index construction (default 32)
- ef-construct - the size of the dynamic list for the nearest neighbors during index construction (default 200)
- seed - random seed number (default 777)
Loads the index from the given index-path (must be an instance of java.io.File).
The maximum number of index items can be increased during the load.
Saves the index to the given index-path (must be an instance of java.io.File).
Returns the dimension of the given index.
Adds a new vector to the given index. The vector must be a sequential collection of floats whose size matches the dimension of the index, and it must have an integer id associated with it.
Performs a kNN search on the given index with the given vector x.
Changes the query ef value for the index (see this for an explanation).
Here is a simple REPL-ready demo containing all of the basic API functions:
(ns demo
  (:require [clojure.java.io :as io]
            [simix.core :as sim]))
; create new L2 index of dim(5) and max 100 items
(def idx (sim/create :L2 5 100))
; add some data
(sim/add! idx {:id 1 :val [0.1 0.3 0.8 0.3 0.6]})
(sim/add! idx {:id 2 :val [0.4 0.6 0.4 0.4 0.4]})
(sim/add! idx {:id 3 :val [0.9 0.4 0.7 0.8 0.2]})
; kNN query, where k = 2
(sim/q idx [0.8 0.3 0.6 0.4 0.1] 2)
; => [{:id 3, :distance 0.19999999}
; {:id 2, :distance 0.38000005}]
; save index to HD
(sim/save! idx (io/file "my_idx"))
; load the index and increase the maximum number
; of items to 200
(def loaded (sim/load (io/file "my_idx") 200))
; kNN query returns same results as the original index
(sim/q loaded [0.8 0.3 0.6 0.4 0.1] 2)
; => [{:id 3, :distance 0.19999999}
; {:id 2, :distance 0.38000005}]
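The distances in the demo appear to be squared Euclidean distances rather than plain Euclidean ones. This is inferred from the numbers above, not stated by the library. The following is a minimal brute-force sketch in plain Clojure that reproduces the same ranking without using the index at all; `squared-l2`, `brute-force-knn` and `items` are hypothetical names introduced only for this illustration.

```clojure
(ns demo.brute-force)

(defn squared-l2
  "Squared Euclidean distance between two equal-length numeric vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn brute-force-knn
  "Returns the k items closest to x, nearest first, as maps with :id and :distance."
  [items x k]
  (->> items
       (map (fn [item] {:id (:id item)
                        :distance (squared-l2 (:val item) x)}))
       (sort-by :distance)
       (take k)
       vec))

; the same three vectors that the demo adds to the index
(def items
  [{:id 1 :val [0.1 0.3 0.8 0.3 0.6]}
   {:id 2 :val [0.4 0.6 0.4 0.4 0.4]}
   {:id 3 :val [0.9 0.4 0.7 0.8 0.2]}])

; exhaustive search over all items: ids 3 and 2, distances of roughly 0.20 and 0.38
(brute-force-knn items [0.8 0.3 0.6 0.4 0.1] 2)
```

An exhaustive scan like this is only practical for small data sets, but it can serve as a reference when checking the results of the approximate index.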
Every index is closed (and its memory released) when the JVM garbage collector reclaims the stale reference
to the index. However, if you want to explicitly close the index and release its
resources before garbage collection, you can do so with (.close index).
The index implements Java's Closeable, so it can also be used with the with-open macro:
(with-open [idx (sim/load (io/file "my_idx") 200)]
  ; use the loaded index ...
  )
MIT