Using shingles and fuzziness in Elasticsearch Python DSL?
Question:
How do you call shingles in Python DSL?
This is a simple example that searches for a phrase in the “name” field and another one in the “surname” field.
import json
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

def make_dsl_query(fields):
    """
    Construct a query
    """
    es_client = Elasticsearch()
    my_query = Search(using=es_client, index="my_index", doc_type="my_type")
    if fields['name'] and fields['surname']:
        my_query = my_query.query(Q('bool', should=
                   [Q("match", name=fields['name']),
                    Q("match", surname=fields['surname'])]))
    return my_query

if __name__ == '__main__':
    my_query = make_dsl_query(fields={"name": "Ivan The Terrible", "surname": "Conqueror of the World"})
    response = my_query.execute()
    # print response
    for hit in response:
        print(hit.meta.score, hit.name, hit.surname)
1) Is it possible to use shingles? And how? I’ve tried many things and can’t find anything in the documentation on it.
This would work in a normal Elasticsearch query, but it is apparently written differently in the Python DSL…
my_query = my_query.query(Q('bool', should=
           [Q("match", name.shingles=fields['name']),
            Q("match", surname.shingles=fields['surname'])]))
2) How do I pass fuzziness parameters to my match? Can’t seem to find anything on it either. Ideally I would be able to do something like this:
my_query = my_query.query(Q('bool', should=
           [Q("match", name=fields['name'], fuzziness="AUTO", max_expansions=10),
            Q("match", surname=fields['surname'])]))
Answers:
To use shingles you need to define them in your mappings; it's too late to introduce them at query time. What you can do at query time is use a match_phrase query.
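As a rough illustration of "define them in your mappings", here is a minimal sketch of an index body that adds a `shingles` sub-field to both `name` and `surname`. It assumes Elasticsearch 7 or later (typeless mappings); the names `shingle_filter`, `shingle_analyzer` and `build_index_body` and the shingle sizes are made up for this example and do not come from the question.

```python
import json

def build_index_body(fields=("name", "surname")):
    """Return index settings and mappings with a `shingles` sub-field per field.

    The analyzer and filter names are illustrative assumptions.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Emits word n-grams (shingles) of 2 to 3 tokens.
                    "shingle_filter": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 3,
                    }
                },
                "analyzer": {
                    "shingle_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "shingle_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Each field keeps its default analysis and gains a
                # multi-field, addressed in queries as "<field>.shingles".
                field: {
                    "type": "text",
                    "fields": {
                        "shingles": {"type": "text", "analyzer": "shingle_analyzer"}
                    },
                }
                for field in fields
            }
        },
    }

index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

A body like this would be supplied when the index is created, before any documents are indexed; existing documents would need to be reindexed to populate the new sub-fields.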
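The snippet from the question:
my_query = my_query.query(Q('bool', should=
           [Q("match", name.shingles=fields['name']),
            Q("match", surname.shingles=fields['surname'])]))
is not valid Python, because a keyword argument name cannot contain a dot. It should work if written with a double underscore instead: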
my_query = my_query.query(Q('bool', should=
           [Q("match", name__shingles=fields['name']),
            Q("match", surname__shingles=fields['surname'])]))
This assumes you have the shingles field defined on both the name and surname fields.
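To make the double-underscore convention concrete, here is a small sketch that runs without an Elasticsearch cluster. `fake_q` is a hypothetical stand-in written for this example, not the library's `Q`; it only mimics the documented behaviour of turning `__` in keyword names into `.`. The dict-unpacking form at the end is a general Python technique that is also commonly used with the real `Q`, though that is an assumption and not something the answer above states.

```python
def fake_q(query_type, **fields):
    """Hypothetical stand-in for Q: expands '__' in field names to '.'."""
    return {
        query_type: {name.replace("__", "."): value for name, value in fields.items()}
    }

def is_valid_python(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<snippet>", "eval")
        return True
    except SyntaxError:
        return False

# A dotted keyword name is rejected by the parser before any library code runs.
dotted_ok = is_valid_python('Q("match", name.shingles="Ivan")')
underscored_ok = is_valid_python('Q("match", name__shingles="Ivan")')

# Two ways to address the sub-field: double underscore, or dict unpacking.
via_double_underscore = fake_q("match", name__shingles="Ivan The Terrible")
via_unpacking = fake_q("match", **{"name.shingles": "Ivan The Terrible"})

print(dotted_ok, underscored_ok)
print(via_double_underscore)
print(via_unpacking)
```

Both calls produce the same query body, with `name.shingles` as the field name.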
Note that you can also use the | operator instead of constructing the bool query yourself:
my_query = my_query.query(Q("match", name__shingles=fields['name']) | Q("match", surname__shingles=fields['surname']))
Hope this helps.
As of January 2023, elasticsearch-dsl does support fuzzy matches; it's just not very well documented.
For simple fuzzy matches:
Q('fuzzy', fieldName=matchString)
When you want to set a custom fuzziness:
Q({"fuzzy": {"yourFieldName": {"value": matchString, "fuzziness": fuzziness}}})
My understanding is that the fuzzy keyword is just a wrapper for a standard query; see https://github.com/elastic/elasticsearch-dsl-py/blob/master/elasticsearch_dsl/query.py#L362.
Source:
- https://github.com/elastic/elasticsearch-dsl-py/issues/1510 (solution courtesy of @leberknecht on github)
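For the second part of the question (fuzziness on a match query rather than a separate fuzzy query), the raw Elasticsearch `match` query takes `fuzziness` and `max_expansions` as options nested under the field name, next to `query`. The sketch below builds that request body as a plain dict; `build_fuzzy_match_body` is a hypothetical helper for this example. With elasticsearch-dsl, the same nested dict can usually be passed as the field value, along the lines of `Q("match", name={"query": fields['name'], "fuzziness": "AUTO", "max_expansions": 10})`, but treat that form as an assumption to verify against your installed version.

```python
import json

def build_fuzzy_match_body(name, surname, fuzziness="AUTO", max_expansions=10):
    """Build a bool/should body where the `name` match is fuzzy.

    Options such as fuzziness belong inside the per-field object,
    not beside the field name.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            "name": {
                                "query": name,
                                "fuzziness": fuzziness,
                                "max_expansions": max_expansions,
                            }
                        }
                    },
                    {"match": {"surname": {"query": surname}}},
                ]
            }
        }
    }

body = build_fuzzy_match_body("Ivan The Terrible", "Conqueror of the World")
print(json.dumps(body, indent=2))
```

This also shows why `Q("match", name=..., fuzziness="AUTO")` from the question does not do what was intended: the extra keywords would sit at the same level as the field name instead of inside it.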