Skip to content

An npm module which exposes a blazingly fast API for manipulating and querying arbitrary data segments in Redis

Notifications You must be signed in to change notification settings

66654TANNN/segmenta

 
 

Repository files navigation

Segmenta: a segments api using Redis for storage

What does it do?

Provides a mechanism for storing and retrieving sets of numbers quickly as well as performing operations with those sets. Currently supported are:

  • and: produce the set C of numbers which are in both A and B
    • [ 1, 2, 3 ] and [ 2, 3, 4 ] = [ 2, 3 ]
  • or: produce the set C of numbers which are in either A or B
    • [ 1, 2, 3 ] or [ 2, 3, 4 ] = [ 1, 2, 3, 4 ]
  • not: produce the set C of numbers which are in A excluding those in B
    • [ 1, 2, 3 ] not [ 2, 3, 4 ] = [ 1 ]

How to use?

  1. import or require segmenta
    • javascript: const Segmenta = require("segmenta");
    • typescript: import Segmenta from "segmenta";
    • the only export of the library is the Segmenta class
  2. create an instance with options, if required:
    • const segmenta = new Segmenta()
    • const segmenta = new Segmenta(options)
      • options have the structure:
        {
          redisOptions?: RedisOptions,
          segmentsPrefix?: string,
          bucketSize?: number,
          resultsTTL?: number,
        }
        where:
        • RedisOptions are the options which would be passed to ioredis to initialize (eg host, port, etc)
        • segmentsPrefix is a prefix to apply to all segments keys (defaults to "segments")
        • bucketSize is the max size to use when creating buckets (defaults to 50kb)
        • resultsTTL is the time you'd like result snapshots to live for when not explicitly released (defaults to 1 day)
  3. Populating data
    • add adds ids to the segment
      await segmenta.add("my-segment", [ 1, 2, 3 ]);
      // my-segment now contains [ 1, 2, 3 ]
    • del deletes ids from the segment
      await segmenta.del("my-segment", [ 2, 3 ]);
      // my-segment now contains just [ 1 ]
    • put takes a sequence of add / del commands and performs them in order (useful for batching streaming data)
      await segmenta.put("my-segment", [
        { add: 5 },
        { del: 1 },
        { add: 4 },
        { del: 5 },
        { add: 1 }
      ]);
      // my-segment now contains [ 1, 4 ]
  4. Query
    • results are returned as an object with the shape:

      {
        ids: number[],
        skipped: number,
        count: number,
        resultSetId: string, // used to re-query against snapshot
        total: number
      }
    • simple queries are supported (entire segments), however a DSL is provided for easier querying. The client can perform and, or, and not operations on results if they are retreived as buffers (see example below in DSL area).

      1. Simple query, all results returned:
      await segmenta.add("my-new-segment", [ 10, 20, 30 ]);
      const result = await segmenta.query("my-new-segment");
      /* result looks like:
      {
        ids: [ 10, 20, 30 ],
        skipped: 0,
        count: 3,
        resultSetId: "4deee554-da28-4029-8231-98060fa014dc",
        total: 3
      }
      */
      1. Paged query:
      await segmenta.add("paged-results-segment", [ 1, 2, 3, 4, 5 ]);
      const result1 = await segmenta.query({
        query: "paged-results-segment",
        skip: 0,
        take: 2
      });
      /* result1 looks like:
      {
        ids: [ 1, 2 ],
        skipped: 0,
        count: 2,
        resultSetId: "63c6a1f0-8aec-4249-9d80-63c5de13b942",
        total: 5
      }
      */
      // the rest of the results can be obtained with:
      const result2 = await segmenta.query({
        query: "63c6a1f0-8aec-4249-9d80-63c5de13b942",
        skip: 2
      });
      1. Paged results are snapshot and can be re-queried by using their id (uuid). Snapshots automatically expire after 24 hours (or the number of seconds specified by resultsTTL in your constructor arguments. You may manually dispose of results when you no longer need them:
      const result = await segmenta.query({ query: "my-set", skip: 0, take: 10 });
      await segmenta.dispose(result.resultSetId);
      

      Snapshots are only created when queries are performed with a positive integer skip or take value

    • There is a DSL for querying in a more readable manner:

      await segmenta.add("set1", [ 1, 2 ]);
      await segmenta.add("set2", [ 3, 4, 5 ]);
      await segmenta.add("set3", [ 2, 3, 5, 6, 7 ]);
      await segmenta.add("set4", [ 5, 6 ]);
      // ... some time later ...
      const query = "get where in 'set1' or 'set2' and 'set3' not 'set4'";
      const result1 = await segmenta.query(query);
      // or, with paging options:
      const result2 = await segmenta.query({ query, skip: 10, take: 100 });
      
      // the query syntax above is analogous to the following
      //  more manual query mechanism:
      const set1 = await segmenta.getBuffer("set1");
      const set2 = await segmenta.getBuffer("set2");
      const set3 = await segmenta.getBuffer("set3");
      const set4 = await segmenta.getBuffer("set4");
      // these operations are fast, acting on bitfields in memory.
      const final = set1          // [ 1, 2 ]
                      .or(set2)   // [ 1, 2, 3, 4, 5 ]
                      .and(set3)  // [ 2, 3, 5 ]
                      .not(set4)  // [ 2, 3 ]
                      .getOnBitPositions()
                      .values; // returns the numeric array for bit positions
      

      One may also query for counts only:

      await segmenta.query("count where in 'x');

      One may also query for results to come back in a random order:

      await segmenta.query("random where in 'x');

      When paging is requested and a resultSetId is returned, requering against that result-set will maintain the original randomized order.

      Query syntax is quite simple:

      (GET | COUNT | RANDOM) WHERE IN('segment-id')
                  [(AND|OR|NOT) IN('other-segment')]...
                  [MIN {int}]
                  [MAX {int}]
                  [SKIP {int}]
                  [TAKE {int}]
      
      • segments are identified by strings (single- or double-quoted)
      • only two operations are supported: GET and COUNT
      • the results of COUNT look like GET except no segment data is returned. Use the total field in the result to read your count value.
      • boolean operations are run left-to-right
      • operations may be grouped with brackets, in which case they are evaluated first, eg: GET WHERE IN('x') AND NOT (IN('y') OR IN('z'))
        • retreives values which are in 'x' and also not in 'y' or 'z';
      • brackets around segment ids are optional: GET WHERE IN 'x' is equivalent to GET WHERE IN('x')
      • the IN keyword is optional after the first usage: GET WHERE IN 'x' and IN 'y' is equivalent to GET WHERE IN 'x' AND 'y'
      • syntax is case-insensitive GET WHERE IN 'x' is equivalent to get where in 'x' and Get Where In('x')
      • skip and take can also be set on the query options -- when doing so, the skip/take values on query options take precedence. This allows easy re-use of natural-language query with changing paging values, but also facilitates natural language paging if that is your preference.
      • MIN and MAX set minimum and maximum values to bring back in the result set. These values are inclusive. This may be useful if SKIP doesn't suit your chunking needs, but rather setting a MIN and a TAKE
      • min and max can also be set on query options. As with skip and take, the query options values for min and max override any natural language specification
      • segment ids are case-sensitive
        • get where in 'MY-SEGMENT' is NOT equivalent to get where in 'my-segment'
      • segment ids may not contain quotations
        • they must be valid redis keys
      • some queries will never make sense, so expect either strange results or parse errors: GET WHERE NOT IN 'x'
        • since the segments are open-ended, this is essentially an infinite set of all numbers, excluding those in segment 'x'

About

An npm module which exposes a blazingly fast API for manipulating and querying arbitrary data segments in Redis

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • TypeScript 96.4%
  • JavaScript 3.6%