NucliaDB is a robust database that allows storing and searching on unstructured data.
It is an out-of-the-box hybrid search database that uses vector, full-text and graph indexes.
NucliaDB is written in Rust and Python. We designed it to index large datasets and provide multi-tenant support.
When you use NucliaDB with Nuclia cloud, you get the power of an NLP database without the hassle of data extraction, enrichment and inference. We do all the hard work for you.
- Store text, files, vectors, labels and annotations
- Perform text searches. For example, given a word or set of words, return the resources in our database that contain them.
- Perform semantic searches with vectors. For example, given a set of vectors, return the closest matches in our database. With NLP, this allows us to look for similar sentences without being constrained by exact keywords.
- Export your data in a format compatible with most NLP pipelines (HuggingFace datasets, PyTorch, etc.)
- Store original data, extracted data and data pulled from the Understanding API
- Index fields, paragraphs, and semantic sentences on index storage
- Cloud data and insight extraction with the Nuclia Understanding API™
- Cloud connection to train ML models with Nuclia Learning API™
- Role-based security system with upstream proxy authentication validation
- Resources with multiple fields and metadata
- Text/HTML/Markdown plain fields support
- Field types: text, file, link, conversation
- Storage layer (PostgreSQL)
- Blob support with S3-compatible API, GCS and Azure Blob Storage
- Replication of index storage
- Distributed search
- Cloud-native
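
To make the difference between text search and semantic search concrete, here is a minimal, self-contained sketch in plain Python. It does not use the NucliaDB API: the `RESOURCES` dictionary, the three-dimensional vectors and the function names `keyword_search` and `semantic_search` are all hypothetical and chosen only for illustration. A real deployment would rely on NucliaDB's own indexes instead of a linear scan.

```python
import math

# Toy in-memory corpus. The ids, texts and 3-d vectors are made up for
# illustration; real embeddings have hundreds of dimensions.
RESOURCES = {
    "r1": {"text": "The cat sat on the mat", "vector": [0.9, 0.1, 0.0]},
    "r2": {"text": "A kitten rests on a rug", "vector": [0.8, 0.2, 0.1]},
    "r3": {"text": "Quarterly revenue grew", "vector": [0.0, 0.1, 0.9]},
}


def keyword_search(words):
    """Return ids of resources whose text contains every given word."""
    wanted = {w.lower() for w in words}
    return [
        rid
        for rid, res in RESOURCES.items()
        if wanted <= set(res["text"].lower().split())
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vector, top_k=2):
    """Return the top_k resource ids ranked by similarity to the query vector."""
    ranked = sorted(
        RESOURCES,
        key=lambda rid: cosine(query_vector, RESOURCES[rid]["vector"]),
        reverse=True,
    )
    return ranked[:top_k]
```

With this toy data, `keyword_search(["cat"])` finds only `r1`, while `semantic_search([1.0, 0.1, 0.0])` also ranks `r2` (the kitten sentence) highly even though it shares no keyword with the query. That is the kind of match the vector index is meant to enable.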
Trying NucliaDB is super easy! You can extend your knowledge with the following readings:
- Quick start!
- Read about what Knowledge boxes are in our basic concepts section
- Upload your data
- Chat with us in Slack
- 📝 Blog Posts
- Follow us on X
- Do you want to work with us?
The core difference and advantage of NucliaDB is its architecture, built from the ground up for unstructured data. Its vector index, keyword, graph and fuzzy search provide an API to use all the information extracted by the Nuclia Understanding API, and give powerful NLP abilities to any application with low code and peace of mind.
NucliaDB is open-source under the GNU Affero General Public License Version 3 (AGPLv3). Fundamentally, this means that you are free to use NucliaDB for your project, as long as you don't modify NucliaDB. If you do, you have to make the modifications public.
Our business model relies on our normalization API, which is based on the Nuclia Learning API and the Nuclia Understanding API. These two APIs use AI to transform unstructured data into NucliaDB-compatible data. We also offer NucliaDB as a service on our multi-cloud infrastructure: https://nuclia.cloud.
We are always happy to have contributions: code, documentation, issues, feedback, or even saying hello on Slack! Here is how you can get started:
- Read our Contributor Covenant Code of Conduct
- Create a fork of NucliaDB and submit your pull request!
✨ And to thank you for your contributions, claim your swag by emailing us at info at nuclia.com.