NucliaDB, The vector database optimized for documents and video search

dhmspector/nucliadb

 
 

Nuclia

The AI Search Database.

NucliaDB is a robust database for storing and searching unstructured data.

It is an out-of-the-box hybrid search database that uses vector, full-text and graph indexes.

NucliaDB is written in Rust and Python. We designed it to index large datasets and provide multi-tenant support.

When you use NucliaDB with Nuclia cloud, you get the power of an NLP database without the hassle of data extraction, enrichment and inference. We do all the hard work for you.

Features

  • Store text, files, vectors, labels and annotations
  • Perform text searches: given a word or set of words, return the resources in our database that contain them.
  • Perform semantic searches with vectors. For example, given a set of vectors, return the closest matches in our database. With NLP, this allows us to look for similar sentences without being constrained by exact keywords.
  • Export your data in a format compatible with most NLP pipelines (HuggingFace datasets, PyTorch, etc.)
  • Store original data, extracted data and data pulled from the Understanding API
  • Index fields, paragraphs, and semantic sentences on index storage
  • Cloud data and insight extraction with the Nuclia Understanding API™
  • Cloud connection to train ML models with Nuclia Learning API™
  • Role-based security system with upstream proxy authentication validation
  • Resources with multiple fields and metadata
  • Text/HTML/Markdown plain fields support
  • Field types: text, file, link, conversation, layout
  • Storage layer support: TiKV, Redis and PostgreSQL
  • Blob support with S3-compatible API, GCS and PG drivers
  • Replication of index storage
  • Distributed search
  • Cloud-native
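The text and semantic searches listed above can be sketched in a few lines of plain Python. This is only an illustration of the two ideas, not NucliaDB's API: the function names, the toy three-dimensional vectors and the `SENTENCES` data are all hypothetical, and a real deployment would use model-generated embeddings and an index rather than a linear scan.

```python
import math


def text_search(words, sentences):
    """Return the sentences that contain every one of the given words."""
    wanted = {w.lower() for w in words}
    return [s for s in sentences if wanted <= set(s.lower().split())]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, stored, top_k=2):
    """Return the top_k sentences whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vector, vec), key)
              for key, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:top_k]]


# Hypothetical toy embeddings; real ones come from a language model.
SENTENCES = {
    "the cat sat on the mat": [0.9, 0.1, 0.0],
    "a kitten rests on a rug": [0.8, 0.2, 0.1],
    "quarterly revenue grew": [0.0, 0.1, 0.9],
}

keyword_hits = text_search(["cat"], SENTENCES)
semantic_hits = semantic_search([1.0, 0.0, 0.0], SENTENCES, top_k=2)
```

With this toy data, the keyword search for "cat" matches only the first sentence, while the vector search also surfaces the "kitten" sentence even though it shares no keyword with the query.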

Architecture

Quickstart

Trying NucliaDB is super easy! You can extend your knowledge with the following readings:

💬 Community

🙋 FAQ

How is NucliaDB different from traditional search engines like Elasticsearch or Solr?

The core difference and advantage of NucliaDB is its architecture, built from the ground up for unstructured data. Its vector index and its keyword, graph and fuzzy search provide an API to use all the information extracted by the Nuclia Understanding API, and bring powerful NLP abilities to any application with little code and peace of mind.
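To make the idea of combining several kinds of search more concrete, here is a minimal sketch of one common way to merge a keyword ranking with a vector ranking: reciprocal rank fusion. This technique is chosen purely for illustration; the source does not say how NucliaDB merges results, and the function name, the constant `k=60` and the document ids are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list.

    Each id earns 1 / (k + rank) from every list it appears in,
    so ids ranked well by several searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically by id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword search and a vector search.
keyword_ranking = ["doc-a", "doc-b", "doc-c"]
vector_ranking = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Documents found by both searches ("doc-a" and "doc-b") end up ahead of those found by only one of them.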

What license does NucliaDB use?

NucliaDB is open source under the GNU Affero General Public License Version 3 (AGPLv3). Fundamentally, this means that you are free to use NucliaDB for your project. If you modify NucliaDB and distribute it, or let users interact with it over a network, you have to make your modifications available under the same license.

What is Nuclia's business model?

Our business model relies on our normalization API, which is based on the Nuclia Learning API and the Nuclia Understanding API. These two APIs use AI to transform unstructured data into NucliaDB-compatible data. We also offer NucliaDB as a service on our multi-cloud infrastructure: https://nuclia.cloud.

🤝 Contribute and spread the word

We are always happy to have contributions: code, documentation, issues, feedback, or even saying hello on Discord! Here is how you can get started:

✨ And to thank you for your contributions, claim your swag by emailing us at info at nuclia.com.

Reference

Meta
