forked from jina-ai/serve

Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data


shankdaring/jina



Cloud-Native Neural Search Framework for Any Kind of Data


Jina🔊 allows you to build deep learning-powered search-as-a-service in just minutes.

🌌 All data types - Large-scale indexing and querying of any kind of unstructured data: video, image, long/short text, music, source code, PDF, etc.

๐ŸŒฉ๏ธ Fast & cloud-native - Distributed architecture from day one, scalable & cloud-native by design: enjoy containerizing, streaming, paralleling, sharding, async scheduling, HTTP/gRPC/WebSocket protocol.

โฑ๏ธ Save time - The design pattern of neural search systems, from zero to a production-ready system in minutes.

๐Ÿฑ Own your stack - Keep an end-to-end stack ownership of your solution, avoid integration pitfalls with fragmented, multi-vendor, generic legacy tools.

Run Quick Demo

Install

  • via PyPI
    $ pip install "jina[devel]"          
    $ jina -v
    2.0.0
  • via Docker
    $ docker run jinaai/jina:latest -v
    2.0.0
📦 More installation options

| x86/64, arm64, v6, v7, Apple M1 | On Linux/macOS & Python 3.7/3.8/3.9 | Docker Users |
| --- | --- | --- |
| Standard | pip install jina | docker run jinaai/jina:latest |
| Daemon | pip install "jina[daemon]" | docker run --network=host jinaai/jina:latest-daemon |
| With Extras | pip install "jina[devel]" | docker run jinaai/jina:latest-devel |

Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.

Get Started

Document, Executor, and Flow are the three fundamental concepts in Jina.
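To make the division of labor concrete, here is a toy model of the three concepts in plain Python. It is only an illustration under our own simplifying assumptions: the names ToyDocument, ToyExecutor and ToyFlow are hypothetical and are not Jina's API. A real Flow runs each Executor as a separate, possibly distributed, service; the toy just calls them in order.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: the raw data plus fields Executors fill in."""
    text: str
    embedding: Optional[List[float]] = None
    tags: dict = field(default_factory=dict)


# Hypothetical stand-in for an Executor: a function that processes a batch of documents in place.
ToyExecutor = Callable[[List[ToyDocument]], None]


class ToyFlow:
    """Hypothetical stand-in for a Flow: chains executors and pushes documents through them in order."""

    def __init__(self) -> None:
        self._executors: List[ToyExecutor] = []

    def add(self, executor: ToyExecutor) -> "ToyFlow":
        self._executors.append(executor)
        return self  # returning self allows chaining, as in Flow().add(...).add(...)

    def post(self, docs: List[ToyDocument]) -> List[ToyDocument]:
        for executor in self._executors:  # each step sees the output of the previous one
            executor(docs)
        return docs


def length_embed(docs: List[ToyDocument]) -> None:
    # toy "embedding": a one-dimensional vector holding the text length
    for d in docs:
        d.embedding = [float(len(d.text))]


def tag_long(docs: List[ToyDocument]) -> None:
    # a second step that reads what the first step wrote
    for d in docs:
        d.tags["long"] = d.embedding[0] > 5


flow = ToyFlow().add(length_embed).add(tag_long)
result = flow.post([ToyDocument("hi"), ToyDocument("hello world")])
print([(d.text, d.embedding, d.tags) for d in result])
```

The key idea the toy keeps is that Documents carry data through the pipeline while Executors only transform them, so steps can be reordered or swapped without touching the data model.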

1๏ธโƒฃ Copy-paste the minimum example below and run it:

💡 Preliminaries: character embedding, pooling, Euclidean distance

The architecture of a simple neural search system powered by Jina

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII 32 is the space character; lower codes are treated as `UNK`
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, scores={'euclid': d}) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.scores['euclid'].value)  # sort matches by their values

f = Flow(port_expose=12345, protocol='http').add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed replicas (not needed here, just for demonstration)
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all non-empty lines of this file
    f.block()  # block and keep serving requests
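The math inside CharEmbed and Indexer does not depend on Jina. The sketch below reproduces it with NumPy alone, so you can check the behavior without starting a Flow: each character maps to a one-hot vector (characters outside ASCII 32 to 126 share the last, `UNK` slot), a line's embedding is the mean of its character vectors, and candidates are ranked by Euclidean distance, smallest first. The helper names embed and rank are our own, not part of Jina.

```python
import numpy as np

OFFSET = 32               # ASCII 32 (space) is the first character with its own slot
DIM = 127 - OFFSET + 1    # 96 slots; the last one doubles as `UNK`
CHAR_EMBD = np.eye(DIM)   # one one-hot row per slot


def embed(text: str) -> np.ndarray:
    """Mean-pooled one-hot character embedding, mirroring CharEmbed.foo.

    An empty string has nothing to average (the mean would be NaN), which is
    why the indexing example above skips blank lines.
    """
    idx = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    return CHAR_EMBD[idx, :].mean(axis=0)


def rank(queries, corpus):
    """Return, per query, (corpus index, distance) pairs sorted by Euclidean distance."""
    q = np.stack([embed(t) for t in queries])  # shape (num_queries, DIM)
    d = np.stack([embed(t) for t in corpus])   # shape (num_docs, DIM)
    # broadcasting (Q, 1, DIM) - (1, N, DIM) -> (Q, N, DIM), then the norm over the last axis
    dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)
    return [sorted(enumerate(row.tolist()), key=lambda p: p[1]) for row in dist]


corpus = ["@requests(on='/index')", "import numpy as np", "f.block()"]
for i, score in rank(["request(on=something)"], corpus)[0]:
    print(f"{score:.6f}: {corpus[i]!r}")
```

Because mean pooling ignores character order, two lines with the same character counts get identical embeddings; that is acceptable for a demo but is the main reason real deployments swap CharEmbed for a learned encoder.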

2๏ธโƒฃ Open http://localhost:12345/docs (an extended Swagger UI) in your browser, click /search tab and input

{"data": [{"text": "@requests(on=something)"}]}

Here @requests(on=something) is our textual query: we want to find the lines in the server code snippet above that are most similar to it. Now click the Execute button!
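The Swagger UI is a front end for a plain HTTP POST carrying the JSON body shown above. The following minimal sketch builds the equivalent request using only Python's standard library. It assumes the Flow from step 1 is still serving on localhost:12345 and that the endpoint path is /search, as the tab name suggests. The helper names build_search_request and send are hypothetical, and because the layout of the JSON reply depends on your Jina version, the sketch simply returns the decoded reply as-is.

```python
import json
import urllib.request


def build_search_request(text: str, host: str = "localhost", port: int = 12345) -> urllib.request.Request:
    """Build (but do not send) a POST request with the same JSON body as the Swagger UI example."""
    payload = {"data": [{"text": text}]}
    return urllib.request.Request(
        url=f"http://{host}:{port}/search",  # assumed endpoint path, matching the /search tab
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; needs the Flow above to be running."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


req = build_search_request("@requests(on=something)")
print(req.get_method(), req.full_url, req.data.decode())
# print(send(req))  # uncomment once the Flow is running
```

Keeping request construction separate from sending makes it easy to inspect the exact payload before pointing it at a live server.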

Jina Swagger UI extension on visualizing neural search results

🆙 Not a GUI guy? Let's query it from Python then! Keep the server above running and start a simple client:

from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclid"].value:2f}: "{d.text}"')


c = Client(protocol='http', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

This prints the following results:

Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"

😔 Doesn't work? Our bad! Please report it here.

Read Tutorials

Support

Join Us

Jina is backed by Jina AI. We are actively hiring full-stack developers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.

All Contributors

