Note: GEM's main functionality has been added to the dejavu project (https://github.com/appbaseio/dejavu), which is actively maintained. We recommend using that instead.
GEM
- GEM: Intro
- Features
- Mapping and GEM FAQs
- GEM Usage Examples
- Build Locally
- Get GEM
a. Hosted
b. Chrome Extension
c. Elasticsearch Plugin
- Other Apps
GEM is a GUI for creating and managing an Elasticsearch index's data structure mappings. ES mappings provide an immutable interface to control how data is stored internally within Elasticsearch and how queries can be applied to it.
Mappings allow deciding things like:
- Should a field with value '2016-12-01' be treated as a date or as a text field?
- Should 'San Francisco' be stored as an analyzed text field to then run full-text search queries against it, or should it be kept non-analyzed for an aggregations use-case?
- Should 'loc': ['40.73', '-73.9'] be stored as Object or should it have a geo_point datatype?
GEM takes this a step further by providing an on-the-fly mapping inference based on user provided input data.
GEM supports three key mapping-related options today:
- Creating data mappings with an on-the-fly auto-inferencing capability.
- Managing all the current data mappings with an option to see the raw JSON data.
- Importing new data analyzers to be later associated with field mappings.
GEM keeps the entire app state in the URL, which makes for easy sharing of views. Most importantly, GEM is built entirely on the client side and is available as a GitHub-hosted app.
A mapping in Elasticsearch is like a schema in SQL. It's an API for defining how data should be internally stored within Elasticsearch indexes.
A mapping can be created at the time of an Elasticsearch index creation or afterwards in an explicit definition. If no mapping is specified, it is dynamically created when data is inserted into the index. See an example here.
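As a rough illustration of the two explicit options, the sketch below builds (but does not send) the HTTP requests for creating an index with a mapping and for adding a mapping to an existing index. It assumes the typeless request format of Elasticsearch 7.0 and later; older versions key mappings by a type name. The function names, the cluster URL, and the index name are hypothetical.

```python
import json
import urllib.request


def _json_request(url: str, body: dict) -> urllib.request.Request:
    """Build a PUT request with a JSON body; nothing is sent over the network."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index_request(cluster_url: str, index: str, properties: dict) -> urllib.request.Request:
    """Option 1: define the mapping when the index is created (PUT /<index>)."""
    body = {"mappings": {"properties": properties}}
    return _json_request(f"{cluster_url.rstrip('/')}/{index}", body)


def put_mapping_request(cluster_url: str, index: str, properties: dict) -> urllib.request.Request:
    """Option 2: add field mappings to an existing index (PUT /<index>/_mapping)."""
    body = {"properties": properties}
    return _json_request(f"{cluster_url.rstrip('/')}/{index}/_mapping", body)


# Hypothetical cluster URL and index name, for illustration only.
req = create_index_request("http://localhost:9200/", "places", {"name": {"type": "text"}})
print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would submit it to a running cluster.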
string (called text starting with v5.0), date, long, integer, short, byte, double, float, boolean are the common data types. nested, object, binary, geo_point, geo_shape, ip, completion are some of the specialized data types. You can read more about the available types in the Elasticsearch docs here.
While a mapping's main role is defining data structures, it also allows defining certain commonly used indexing and querying parameters. For example, analyzer defines which analyzer to use for indexing text data. The doc_values parameter makes indexed data available for aggregations by storing it in a column-oriented fashion. Another one, the null_value parameter, replaces a null field value with a specified value. You can read more about it here.
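A small sketch of how these parameters might appear in a field mapping, written as a Python dictionary. The field names are made up for illustration, and the type names assume Elasticsearch 5.0 or later (text and keyword); analyzer is set on an analyzed text field, while doc_values and null_value are set on a keyword field.

```python
import json

# Hypothetical fields, shown only to illustrate where each parameter goes.
properties = {
    "title": {
        "type": "text",
        "analyzer": "english",    # built-in language analyzer used at index time
    },
    "status": {
        "type": "keyword",
        "doc_values": True,       # column-oriented storage used by aggregations
        "null_value": "unknown",  # indexed in place of an explicit null
    },
}

print(json.dumps({"properties": properties}, indent=2))
```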
Starting v2.0, mappings are immutable. Once applied, they cannot be modified. In the event a mapping needs modification, the suggested alternative is to reindex data in a new index.
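One way to follow this suggestion, in Elasticsearch versions that provide the _reindex API, is to create a new index with the corrected mapping and then copy the documents across. The sketch below only builds the request body; the index names are hypothetical.

```python
import json


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


# Assumed workflow:
#   1. Create places_v2 with the corrected mapping.
#   2. POST this body to /_reindex.
#   3. Point clients at places_v2.
print(json.dumps(reindex_body("places_v1", "places_v2")))
```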
Sub fields allow indexing the same field in two different ways. The idea is slightly counterintuitive if you come from a structured database background. Since Elasticsearch is primarily a search engine, data is indexed by default in a search-oriented data structure. However, exact search queries and aggregations need the value indexed in its exact form as well. Not surprisingly, sub fields only apply to a string field.
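A minimal sketch of what a sub field looks like in a mapping, assuming the Elasticsearch 5.0+ type names (text for the analyzed form, keyword for the exact form). The helper name and the sub field name raw are illustrative choices, not something GEM prescribes.

```python
def with_exact_subfield(field: dict, name: str = "raw") -> dict:
    """Return a copy of a text field mapping with an exact-match sub field added."""
    updated = dict(field)
    fields = dict(updated.get("fields", {}))
    fields[name] = {"type": "keyword"}  # non-analyzed copy for exact queries and aggregations
    updated["fields"] = fields
    return updated


city = with_exact_subfield({"type": "text"})
print(city)
```

With a mapping like this, full-text queries would target city, while exact matches and aggregations would target city.raw.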
An analyzer is a pre-processor that is applied to data before indexing it. It does three things:
- Sanitizing the string input,
- Tokenizing the input into words,
- and Filtering the tokens.
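The three stages can be sketched in plain Python. This is a toy model of the idea, not how Elasticsearch implements analysis; the stop-word list and the function names are assumptions made for the example.

```python
import re
import unicodedata
from typing import List

STOP_WORDS = {"a", "an", "the", "and", "of"}  # tiny illustrative list


def sanitize(text: str) -> str:
    """Stage 1: clean the raw string (here, strip HTML tags)."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> List[str]:
    """Stage 2: split the cleaned string into word tokens."""
    return re.findall(r"\w+", text)


def fold_ascii(token: str) -> str:
    """Drop accents by decomposing characters and discarding non-ASCII marks."""
    return unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode("ascii")


def analyze(text: str) -> List[str]:
    """Run all three stages; stage 3 lowercases, folds accents, and removes stop words."""
    tokens = [fold_ascii(t.lower()) for t in tokenize(sanitize(text))]
    return [t for t in tokens if t and t not in STOP_WORDS]


print(analyze("<b>The</b> Caf\u00e9s of San Francisco"))
```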
Because of the focus on searching, Elasticsearch comes with a good number of standard analyzers that can be applied at mapping time to a data field. However, since there is so much room for customization, it supports an interface to add custom analyzers.
GEM also provides a GUI to import a user-defined analyzer and lists the available analyzers to pick from at mapping time.
The specs for creating a custom analyzer can be found here.
A GEM view can be shared externally (both as an embed and as a hyperlink) via the share icon at the top left of the screen.
Let's say your JSON data looks like this:
{
  "name": "geolocation data",
  "place": {
    "city": "New York",
    "country": "United States"
  },
  "location": [40.3, -74]
}
Use this magic link to view this data directly in GEM. You will need to set the app name and cluster URL fields before being able to apply these.
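To give a feel for what inferring a mapping from sample data involves, here is a small sketch applied to the JSON above. The heuristics (an ISO-style date pattern, treating a two-number array as a coordinate pair) are assumptions chosen for illustration and are not taken from GEM's source.

```python
import re
from typing import Any

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
STRING_TYPE = "string"  # called "text" starting with Elasticsearch v5.0


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_field(value: Any) -> dict:
    """Guess a mapping for one JSON value (hypothetical heuristics)."""
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "date"} if ISO_DATE.match(value) else {"type": STRING_TYPE}
    if isinstance(value, dict):
        return {"properties": {k: infer_field(v) for k, v in value.items()}}
    if isinstance(value, list) and len(value) == 2 and all(is_number(x) for x in value):
        return {"type": "geo_point"}  # assumed to be a coordinate pair
    if isinstance(value, list) and value:
        return infer_field(value[0])  # arrays take the type of their elements
    return {"type": STRING_TYPE}


def infer_mapping(doc: dict) -> dict:
    return {"properties": {k: infer_field(v) for k, v in doc.items()}}


sample = {
    "name": "geolocation data",
    "place": {"city": "New York", "country": "United States"},
    "location": [40.3, -74],
}
print(infer_mapping(sample))
```

A real inference step would still need user review: a two-number array is not always a location, which is why GEM lets you adjust the suggested types before applying them.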
GEM also supports defining a type mapping directly for times when you have the exact definition.
The definition for the above data would look like this:
{
  "properties": {
    "name": {
      "type": "string"
    },
    "place": {
      "properties": {
        "city": {
          "type": "string"
        },
        "country": {
          "type": "string"
        }
      }
    },
    "location": {
      "type": "geo_point"
    }
  }
}
Use this magic link to view this in the GEM editor. You will need to set the app name and cluster URL fields before being able to apply the mappings.
To import analyzer settings, select the Import Analyzer button from the button group at the bottom left of the screen.
You can now add one or more analyzers in the editor view to make them available at mapping creation time. The following JSON can be used for some good defaults.
{
  "filter": {
    "nGram_filter": {
      "type": "edge_ngram",
      "min_gram": 1,
      "max_gram": 20,
      "token_chars": [
        "letter",
        "digit",
        "punctuation",
        "symbol"
      ]
    }
  },
  "analyzer": {
    "nGram_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": [
        "lowercase",
        "asciifolding",
        "nGram_filter"
      ]
    },
    "body_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "asciifolding",
        "stop",
        "snowball",
        "word_delimiter"
      ]
    },
    "standard_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "asciifolding"
      ]
    },
    "whitespace_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": [
        "lowercase",
        "asciifolding"
      ]
    }
  }
}
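For context, analyzer definitions like these live under the analysis section of an index's settings, and a field mapping then refers to an analyzer by name. The sketch below shows that relationship with a trimmed-down version of the JSON above and lists which filter names the definitions expect Elasticsearch to provide as built-ins. The helper names are hypothetical, and the mappings layout assumes the typeless format of Elasticsearch 7.0 and later.

```python
from typing import Set

# Trimmed version of the analyzer settings shown above.
analysis = {
    "filter": {
        "nGram_filter": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
    },
    "analyzer": {
        "nGram_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase", "asciifolding", "nGram_filter"],
        }
    },
}


def index_body(analysis: dict, properties: dict) -> dict:
    """Combine analyzer settings and field mappings into one index definition."""
    return {"settings": {"analysis": analysis}, "mappings": {"properties": properties}}


def builtin_filters_needed(analysis: dict) -> Set[str]:
    """Filter names used by analyzers but not defined under "filter"."""
    defined = set(analysis.get("filter", {}))
    used = {
        name
        for analyzer in analysis.get("analyzer", {}).values()
        for name in analyzer.get("filter", [])
    }
    return used - defined


body = index_body(analysis, {"name": {"type": "text", "analyzer": "nGram_analyzer"}})
print(sorted(builtin_filters_needed(analysis)))
```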
The dev branch is the bleeding-edge version of gem; all new changes go here.
The gh-pages branch is for the GitHub Pages hosted version of the app; it is just the stable version of the dev branch.
The master branch is more suitable for installing gem locally. The Elasticsearch site plugin for gem uses the master branch.
The chrome-extension branch is for publishing the Chrome extension.
- git clone https://github.com/appbaseio/gem
- cd gem
- git checkout master
- npm install
- bower install
- npm start (runs gem on http://localhost:8000)
And build with
$ npm run build
The source code is in the app directory. Pull requests should be created against the dev branch.
GEM is available as a hosted app, as a Chrome extension, or as an Elasticsearch site plugin. To install the plugin, run:
bin/plugin install appbaseio/gem
Note: To enable CORS for your Elasticsearch instance, add the following lines to the ES configuration file.
http.port: 9200
http.cors.allow-origin: "/.*/"
http.cors.enabled: true
http.cors.allow-headers: Authorization, X-Requested-With, Content-Type, Content-Length
http.cors.allow-credentials: true
After installing the plugin, start the Elasticsearch service
elasticsearch
and visit the following URL to access it.
http://127.0.0.1:9200/_plugin/gem
Note: If you run Elasticsearch on a different port, the access URL and the http.cors.allow-origin value in the configuration file would change accordingly.
GEM is purpose built for the mapping needs of an Elasticsearch index.
dejavu is similarly purpose built for viewing your Elasticsearch index's data and performing CRUD operations, and
mirage is a GUI for composing Elasticsearch queries.
Together, these three apps form the building blocks for powering a great search experience.