Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rest api: add simple query support #973

Merged
merged 1 commit into from
May 15, 2020
Merged

rest api: add simple query support #973

merged 1 commit into from
May 15, 2020

Conversation

jma
Copy link
Contributor

@jma jma commented May 4, 2020

The current elasticsearch query string used by the REST API is powerful
but should not be used for search boxes. It raises exceptions when
the query syntax contains errors. As described in the elaticsearch
documentation, a simple query string should be used. A new http query
optional parameters has been added and can be specified as follows:
&simple=1.
For example: https://ils.rero.ch/global/search/?q=potter&simple=1.

When the simple query syntax is chosen, the default boolean operator is
AND. Except this parameter, nothing has been modified.

  • Adds new type of aggregation filter to perform a AND boolean operator
    between the terms in the same aggregation. All aggregations filters
    use now AND boolean operator.
  • Updates elasticsearch mappings to enhance the search engine quality.
  • Adds a new REST API list records operator to use a simple_query_string
    instead of a query_string. The simple query string preform an AND
    boolean operator by default.
  • Adds search query tests.
  • Adds missing utils test.
  • Renames global_lowercase_asciifolding elasticsearch analyzer to
    default. This makes the rero-ils custom analyzer to be the default for
    all elasticsearch text fields. All elasticsearch mappings has been
    simplified.
  • Creates a new custom_keyword analyzer.
  • Creates a custom elasticsearch image with the icu analysis plugin
    (https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html).
  • Closes search: problem with brackets [ ] in the query #755.
  • Closes Language facet behaviour (number of results) #91.

Required by

Why are you opening this PR?

How to test?

  • Execute a REST API search query with multiple words by adding a &simple=1 option parameter. The number of results should be less than without the param. Try also query that causes syntax errors.

Note: see (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#simple-query-string-syntax) for the supported syntax documentation.

Code review check list

  • Commit message template compliance.
  • Commit message without typos.
  • File names.
  • Functions names.
  • Functions docstrings.
  • Unnecessary commited files?
  • Extracted translations?

@jma jma force-pushed the maj-simple-query branch 2 times, most recently from 90482b5 to d1ce1bf Compare May 4, 2020 13:44
@benerken
Copy link
Contributor

benerken commented May 5, 2020

Tests of simple search on ilsdev:

1. Search on a series (ex: Que sais-je ?)

Que sais-je => returns 174 records
but
Que sais-je ? => returns 0

There is an encoding of the question mark (?) to ?. You can see it
in the URL after the search. See below:

image

2. search on title with brackets works well. Great !
It solves issue #755

Ex: 500 [cinq cents] modèles au point de croix

3. Facets seems to (still) use OR

Do a search on "economie"
https://ilsdev.test.rero.ch/global/search/documents?q=economie&page=1&size=10 (435 results)
Do a filter on language English (10)
Do another filter on Spanish (1)
I expect to get only books published in english and spanish (2 language codes in the same record) but I get all english books and all spanish books. (something like a OR in place of a AND)

4. Search of a term with diacritics should be consistent with the same term without diacritics

économie => 431 records
economie => 435 records

I guess it should return the same number of records.

@jma
Copy link
Contributor Author

jma commented May 5, 2020

Other problems to solve or tests to write:

  • maison blanche == Maison Blanche == Maison + Blanche == MAISON BLANCHE
  • éléphant == elephant 21 == éléphant_ (with space) == elephant_ (with space) == ELEPHANT
  • ABBE == abbe == abbé 17 == ABBé == abbé_
  • müller == muller == mueller (if possible)
  • saute == sautent == saute* == sautez
  • saute* -(sautent | sautez)
  • pelage + (blanc | tigre)
  • "écrivain espagnol"
  • auteur enfant == auteur enfants
  • Notes sur l"affaire Dominici / Essai sur le caractère des personnages
  • Theorien der Sozialpsychologie. Bd. 3, Motivations- und Informationsverarbeitungstheorien / mit Beitr. von Mark W. Baldwin ... [et al.]
  • Que sais-je ?
  • 500 [cinq cents] modèles au point de croix
  • économie == economie

@jma jma force-pushed the maj-simple-query branch 11 times, most recently from 1c06803 to beb3625 Compare May 6, 2020 14:53
@jma
Copy link
Contributor Author

jma commented May 6, 2020

All problems has been fixed and tests has been written. Some of them cannot be addressed such as économie == economie due to the multi lingual support.

@jma jma marked this pull request as ready for review May 6, 2020 14:55
@jma jma requested review from BadrAly, benerken and iGormilhit May 6, 2020 14:56
Copy link

@iGormilhit iGormilhit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

  • line 3, change "as describe" to "as described*
  • line 5, "as follows" to "as follows:"
  • line 10, "Without this parameter, nothing changed" to "Except this parameter, nothing has been modified"
  • line 16, "RES" to "REST"
  • line 17, "simple query string" to "simple query"?

This commit may close this very old issue: #91

@jma jma force-pushed the maj-simple-query branch 3 times, most recently from e7cc19e to f7f88bd Compare May 7, 2020 08:13
@jma jma requested a review from iGormilhit May 7, 2020 09:02
@jma jma force-pushed the maj-simple-query branch from f7f88bd to 8b1c6f7 Compare May 7, 2020 12:43
@jma jma force-pushed the maj-simple-query branch from 8b1c6f7 to 8acb258 Compare May 7, 2020 14:36
@jma jma requested a review from sebdeleze May 11, 2020 07:11
@jma jma force-pushed the maj-simple-query branch 6 times, most recently from 24a13c3 to c855df1 Compare May 12, 2020 16:28
Comment on lines +33 to +37
"fields": {
"analyzed": {
"type": "text",
"analyzer": "custom_keyword"
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool feature!

@@ -135,14 +153,25 @@ def search_factory(self, search, query_parser=None):
"""
def _default_parser(qstr=None, query_boosting=[]):
"""Default parser that uses the Q() from elasticsearch_dsl."""
query_type = 'query_string'
default_operator = 'OR'
if request.args.get('simple'):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why operator is set from simple argument. I think it"s not fully related and should be passed as another parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to have two parameters even if you are again right, it is cleaner.

Copy link
Contributor

@pronguen pronguen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Autosuggest display is not optimal when using boolean operators:
    image
  • Œ,œ
    • Œil or œil, 114 documents
    • oeil, 71 documents
    • same for ae
    • ss, ü, ö are working fine
  • The minus to negate a token from the search does not work: example
  • Detail: it could be convenient to be able to search by patron birth year (tokenizer for birth date). Currently, full input "1961-11-24" is necessary. Ideally, input "1961" is working as well

@jma jma force-pushed the maj-simple-query branch 3 times, most recently from 6f36fa7 to d6b77ba Compare May 14, 2020 12:22
@jma
Copy link
Contributor Author

jma commented May 14, 2020

  • Autosuggest display is not optimal when using boolean operators:
    image

  • Œ,œ

    • Œil or œil, 114 documents
    • oeil, 71 documents
    • same for ae
    • ss, ü, ö are working fine
  • The minus to negate a token from the search does not work: example

  • Detail: it could be convenient to be able to search by patron birth year (tokenizer for birth date). Currently, full input "1961-11-24" is necessary. Ideally, input "1961" is working as well

All has been addressed.

The current elasticsearch query string used by the REST API is powerful
but should not be used for search boxes. It raises exceptions when
the query syntax contains errors. As described in the elaticsearch
documentation, a simple query string should be used. A new http query
optional parameters has been added and can be specified as follows:
`&simple=1`.
For example: `https://ils.rero.ch/global/search/?q=potter&simple=1`.

When the simple query syntax is chosen, the default boolean operator is
`AND`. Except this parameter, nothing has been modified.

* Adds new type of aggregation filter to perform a AND boolean operator
between the terms in the same aggregation. All aggregations filters
use now AND boolean operator.
* Updates elasticsearch mappings to enhance the search engine quality.
* Adds a new REST API list records operator to use a `simple_query_string`
instead of a `query_string`. The simple query string preform an AND
boolean operator by default.
* Adds search query tests.
* Adds missing utils test.
* Renames `global_lowercase_asciifolding` elasticsearch analyzer to
`default`. This makes the rero-ils custom analyzer to be the default for
all elasticsearch `text` fields. All elasticsearch mappings has been
simplified.
* Creates a new `custom_keyword` analyzer.
* Creates a custom elasticsearch image with the icu analysis plugin
(https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html).
* Closes rero#755.
* Closes rero#91.
* Closes rero#934.
* Revisits elasticsearch mappings.
* Fixes several JSONSchemas.

Co-Authored-by: Johnny Mariéthoz <[email protected]>
@jma jma force-pushed the maj-simple-query branch from d6b77ba to 9712b60 Compare May 14, 2020 13:21
@pronguen pronguen self-requested a review May 15, 2020 06:55
@jma jma merged commit 875b8cc into rero:dev May 15, 2020
@jma jma deleted the maj-simple-query branch January 13, 2022 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

search: problem with brackets [ ] in the query Language facet behaviour (number of results)
5 participants