Prototype for an umbrella search over a number of Publisher 7 apps

eeditiones/cross-search

cross-search

Prototype for an umbrella search over a number of Publisher apps

Apps used for the prototype:

Preliminary assumptions

  • Publisher 7 based apps

  • comparable data sets: data can be encoded with any schema, but the apps should define the same (or at least mappable) set of fields and facets and use the same taxonomies

  • umbrella search endpoints, accepting the same set of parameters but differing in what units are returned:

    • document search: results are whole documents; corresponding to Publisher's metadata search
    • fragment search: results are document fragments: divisions/sections; corresponding to Publisher's full text search

NB: we ran out of time for the fragment search in the current iteration.

Search

Individual

Each of the apps to be aggregated in an umbrella search exposes the /api/search/document API endpoint.

It accepts a number of parameters; the prototype assumes the title, author and lang (language) fields and the genre, language and corpus facets. Each parameter corresponds to a field or facet definition in the context of the app. The mapping between /api/search/document parameters and the fields and facets of the app is configured in config.xqm via $config:cross-search-facets and $config:cross-search-fields.

Each field parameter has a corresponding -operator parameter representing the logical operator (AND / OR) to be used when querying with multiple values of the base parameter (e.g. author and author-operator).

Parameters for use in faceted search follow the facet- naming pattern.

An additional sort parameter determines the sort order of the results.

Configuration

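(: API facet parameter name -> facet defined in the app :)
declare variable $config:cross-search-facets :=
    map {
        "genre": "genre",
        "language": "language-id",
        "corpus": "corpus"
    };

(: API field parameter name -> field defined in the app :)
declare variable $config:cross-search-fields :=
    map {
        "lang": "language-id",
        "author": "author",
        "title": "title"
    };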
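
To illustrate how such a mapping might be applied, the following Python sketch mirrors the two maps as dictionaries and translates incoming API parameters into app-specific field and facet names. The helper name `translate_params` and the assumption that facet parameters look like `facet-<name>` are hypothetical and made only for illustration; the prototype itself does this in XQuery.

```python
# Python mirrors of $config:cross-search-facets and $config:cross-search-fields
CROSS_SEARCH_FACETS = {"genre": "genre", "language": "language-id", "corpus": "corpus"}
CROSS_SEARCH_FIELDS = {"lang": "language-id", "author": "author", "title": "title"}


def translate_params(params):
    """Split API parameters into app-local field and facet constraints.

    Hypothetical helper: assumes facet parameters are named "facet-<name>".
    Other parameters (e.g. "query", "sort", "*-operator") are ignored here.
    """
    fields, facets = {}, {}
    for name, value in params.items():
        if name.startswith("facet-"):
            facet = name[len("facet-"):]
            if facet in CROSS_SEARCH_FACETS:
                facets[CROSS_SEARCH_FACETS[facet]] = value
        elif name in CROSS_SEARCH_FIELDS:
            fields[CROSS_SEARCH_FIELDS[name]] = value
    return fields, facets
```

With this sketch, a `lang` parameter and a `facet-language` parameter both end up addressing `language-id`, once as a field and once as a facet.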

Query example

As a simple example, a request could specify the following parameters to search for documents whose title contains the word 'new', whose author's name starts with B, and whose content contains the word 'street' anywhere, with results sorted by author:

  • query: street
  • author: b*
  • title: new
  • sort: author
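
A client could turn these parameters into a request URL roughly as follows. This is a minimal sketch: the base URL (a server address followed by the app collection name) is an assumption, and `build_search_url` is a hypothetical helper, not part of the prototype.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    """Append the document search path and an encoded query string."""
    # doseq=True repeats a parameter once per value when a list is given
    return f"{base_url}/api/search/document?{urlencode(params, doseq=True)}"


url = build_search_url(
    "http://localhost:8080/exist/apps/eltec",  # assumed: <server>/<app>
    {"query": "street", "author": "b*", "title": "new", "sort": "author"},
)
print(url)
```

Note that `urlencode` percent-encodes the wildcard, so `b*` travels as `b%2A` and is decoded again on the server side.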

It is assumed that individual apps implement the endpoint in a way that returns matching results (representing documents) in the structure required for the aggregated search, namely:

  • JSON format with two parts:
    • data: an array containing the matched results as maps with
      • app collection name
      • document path relative to the app data root collection
      • values for all the fields available for sorting
    • facets: facet counts corresponding to the matched results
{
  "data": [
    {
      "app": "dodis-facets",
      "filename": "52928.xml",
      "author": [
        "Plattner, Johann (1932–)",
        "Austria/Ministry of Foreign Affairs"
      ],
      "title": "Debatte über die deutsche Wiedervereinigung; Information und Sprachregelung",
      "language": "de"
    },
    ...
  ],
  "facets": {
    "genre": {
      "Telegram": 1,
      "Memo": 4
    },
    "corpus": {
      "Dodis": 5
    },
    "language": {
      "de": 5
    }
  }
}
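
Since the aggregated search depends on every app returning this shape, a consumer may want to check responses before using them. The sketch below is one way to do that in Python; `check_response` is a hypothetical helper, and treating `app` and `filename` as required string keys is an assumption based on the sample above.

```python
def check_response(response):
    """Return a list of problems found in a search response (empty if none).

    Assumption: every result carries string "app" and "filename" values,
    as in the sample response.
    """
    problems = []
    data = response.get("data")
    if not isinstance(data, list):
        problems.append('"data" must be an array')
        data = []
    for index, result in enumerate(data):
        if not isinstance(result, dict):
            problems.append(f"data[{index}] must be an object")
            continue
        for key in ("app", "filename"):
            if not isinstance(result.get(key), str):
                problems.append(f'data[{index}] lacks a string "{key}"')
    facets = response.get("facets")
    if not isinstance(facets, dict):
        problems.append('"facets" must be an object')
        facets = {}
    for name, counts in facets.items():
        # each facet maps a facet value to an integer hit count
        if not isinstance(counts, dict) or not all(
            isinstance(count, int) for count in counts.values()
        ):
            problems.append(f'facet "{name}" must map values to integer counts')
    return problems
```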

A future, extended implementation might also include:

  • document fragment id (either xml:id or node-id) [optional]
  • full-text match ids [optional]
  • ODD-transformed document header [optional]
  • ODD-transformed full-text content matched [optional]

Cross-search app

The umbrella app exposes the same API endpoint, but it only triggers the search in the individual apps via HTTP requests and passes on the results.
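
How the per-app results are combined is not specified here. One plausible approach, sketched in Python under that caveat, is to concatenate the `data` arrays, add up the facet counts, and re-sort by the requested field. The function names, and the choice to sort multi-valued fields by their first value, are assumptions made for illustration.

```python
from collections import Counter


def first_value(result, field):
    """Sort key: first value of a possibly multi-valued field, case-insensitive."""
    value = result.get(field, "")
    if isinstance(value, list):
        value = value[0] if value else ""
    return str(value).casefold()


def merge_responses(responses, sort_field="title"):
    """Combine per-app responses into one response of the same shape."""
    data = []
    facets = {}
    for response in responses:
        data.extend(response.get("data", []))
        for name, counts in response.get("facets", {}).items():
            # Counter.update adds counts for values seen in several apps
            facets.setdefault(name, Counter()).update(counts)
    data.sort(key=lambda result: first_value(result, sort_field))
    return {
        "data": data,
        "facets": {name: dict(counts) for name, counts in facets.items()},
    }
```

Because each result already carries its `app` value, the merged list still shows which app every document came from.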

An alternative approach would be for the umbrella app to run the search over all relevant data collections at once. This would require all apps to reside in a single eXist instance and ideally to share the field and facet definitions (if they do not, a mapping could be established to run tailored queries).

Configuration

Individual apps to be aggregated need to be explicitly specified in modules/config.xqm as maps with the following keys:

  • app - collection in which an app is installed
  • title - title to be displayed on cross-search website
  • icon - image to be used for tiles in the cross-search app listing
  • symbol - icon to be used to visually distinguish documents from different corpora
  • server - address of the eXist instance on which the app is available
(: Configuration for cross-search :)
declare variable $config:sub := (
    map {
        "app": "eltec", 
        "title": "ELTeC: European Literary Corpus",
        "icon": "eltec-logo.jpeg",
        "symbol": "icons:bookmark",
        "server": "http://localhost:8080/exist/apps"
    },
    map { 
        "app": "dodis-facets",
        "title": "Dodis: When the Wall Came Down",
        "icon": "trabi.jpg",
        "symbol": "icons:drafts",
        "server": "http://localhost:8080/exist/apps"
    },
    map { 
        "app": "serafin",
        "title": "Correspondence of Mikołaj Serafin",
        "icon": "serafin.png",
        "symbol": "icons:mail",
        "server": "http://localhost:8080/exist/apps"
    });
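
To show how these entries might be used, the Python sketch below derives a search endpoint for each app and looks up the symbol for a result via its `app` value. The URL layout `<server>/<app>/api/search/document` is an assumption inferred from the `server` and `app` values, and both helper names are hypothetical.

```python
# Python mirror of two $config:sub entries (title and icon omitted for brevity)
SUB_APPS = [
    {"app": "eltec", "symbol": "icons:bookmark",
     "server": "http://localhost:8080/exist/apps"},
    {"app": "dodis-facets", "symbol": "icons:drafts",
     "server": "http://localhost:8080/exist/apps"},
]


def endpoint(app_config, kind="document"):
    """Assumed URL layout: <server>/<app>/api/search/<kind>."""
    server = app_config["server"].rstrip("/")
    return f"{server}/{app_config['app']}/api/search/{kind}"


def symbol_for(result, apps=SUB_APPS):
    """Look up the symbol for a result via its "app" value; None if unknown."""
    by_app = {config["app"]: config["symbol"] for config in apps}
    return by_app.get(result["app"])
```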
