elasticsearch-dsl in python: How can I return the "inner hits" for my query?

Question:

I am currently exploring Elasticsearch in Python using the elasticsearch_dsl library, and I am aware that my Elasticsearch knowledge is still limited.

I have created a model like so:

from elasticsearch_dsl import Date, Document, InnerDoc, Integer, Object, Text


class Post(InnerDoc):
    text = Text()
    id = Integer()


class User(Document):
    name = Text()
    posts = Object(doc_class=Post)
    signed_up_at = Date()

The data for posts is an array like this:

[
  {
    "text": "Test",
    "id": 2
  }
]

Storing posts this way works, but it seems wrong to me: I declared the "posts" attribute as a single Post, not as a list of Posts.

Querying works, I can:

  s = Search(using=client).query("match", posts__text="test")

and will retrieve the users that have a post containing the search words.
What I want is to get each user together with all of the posts that qualified the user for the result (that is, all posts containing the search phrase). I have been calling these the "inner hits", but I am not sure whether that is the correct term.

Help would be highly appreciated!

I tried using "nested" instead of "match" for the query, but that does not work:

[nested] query does not support [posts]

I suspect that this has to do with the fact that my index is specified incorrectly.

Asked By: user20418895

||

Answers:

I updated my model to this:

from elasticsearch_dsl import Date, Document, InnerDoc, Integer, Nested, Text


class Post(InnerDoc):
    text = Text(analyzer="snowball")
    id = Integer()


class User(Document):
    name = Text()
    posts = Nested(doc_class=Post)
    signed_up_at = Date()
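For comparison, here is a hand-written sketch of the index mapping this model should roughly correspond to, as a plain Python dict. It is an approximation derived from the field types above, not output captured from elasticsearch-dsl; the helper name users_mapping is hypothetical. The key difference from the Object version is "type": "nested" on posts, which makes Elasticsearch index each post as its own hidden document.

```python
import json


def users_mapping() -> dict:
    """Hypothetical helper: hand-written approximation of the User model's mapping."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                # Nested instead of Object: each post is indexed as a separate hidden document
                "posts": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text", "analyzer": "snowball"},
                        "id": {"type": "integer"},
                    },
                },
                "signed_up_at": {"type": "date"},
            }
        }
    }


print(json.dumps(users_mapping(), indent=2))
```

An existing field cannot be switched from object to nested in place, so the index has to be recreated (for example by calling User.init() against a fresh index) and the data indexed again.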

This allows me to do the following query:

GET users/_search
{
  "query": {
    "nested": {
      "path": "posts",
      "query": {
        "match": {
          "posts.text": "idea"
        }
      },
      "inner_hits": {} 
    }
  }
}

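This translates to the following elasticsearch-dsl query in Python:

from elasticsearch_dsl import Q, Search

# client is an elasticsearch.Elasticsearch instance
s = Search(using=client, index="users").query(
    "nested",
    path="posts",
    query=Q("match", **{"posts.text": "idea"}),
    inner_hits={},
)
response = s.execute()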
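One detail matters when the goal is all posts that matched: by default Elasticsearch returns only the top three inner hits per parent hit. The sketch below builds the same request body as the JSON query above as a plain dict, but with an explicit size in inner_hits. The helper name nested_posts_query and the default of 100 are illustrative assumptions; with elasticsearch-dsl the same option can be passed as inner_hits={"size": 100}.

```python
import json


def nested_posts_query(phrase: str, max_posts: int = 100) -> dict:
    """Hypothetical helper: nested match on posts.text that also returns matching posts.

    max_posts becomes inner_hits.size; Elasticsearch limits from + size for inner
    hits with the index setting index.max_inner_result_window (100 by default).
    """
    return {
        "query": {
            "nested": {
                "path": "posts",
                "query": {"match": {"posts.text": phrase}},
                # Without an explicit size, only the top 3 matching posts per user come back
                "inner_hits": {"size": max_posts},
            }
        }
    }


print(json.dumps(nested_posts_query("idea"), indent=2))
```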

Each user hit then carries the posts that matched as inner hits, stored under the name of the nested path ("posts").
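In elasticsearch-dsl, the inner hits of each hit are exposed as hit.meta.inner_hits.posts, which can be iterated like a normal response (for post in hit.meta.inner_hits.posts: print(post.text)). To make the response shape explicit without a running cluster, the sketch below walks a raw response body parsed as a dict instead. The helper matching_posts is hypothetical, and the sample response is hand-written to follow the documented inner hits format; it is not real output.

```python
def matching_posts(response: dict, path: str = "posts") -> dict:
    """Hypothetical helper: map each top-level hit's _id to the nested docs that matched.

    Inner hits are keyed by the nested path unless a custom "name" was set.
    """
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        result[hit["_id"]] = [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
    return result


# Hand-written sample in the shape of a nested query response (not real output)
sample = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "name": "Alice",
                    "posts": [
                        {"text": "An idea", "id": 2},
                        {"text": "Unrelated", "id": 3},
                    ],
                },
                "inner_hits": {
                    "posts": {
                        "hits": {
                            "hits": [
                                {
                                    "_id": "1",
                                    "_nested": {"field": "posts", "offset": 0},
                                    "_source": {"text": "An idea", "id": 2},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(matching_posts(sample))  # {'1': [{'text': 'An idea', 'id': 2}]}
```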


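Using Nested is required here: the nested query, and with it inner hits, only works on fields mapped with the nested type. The underlying reason is how Elasticsearch represents objects internally (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html): an array of plain objects is flattened into one list of values per field, so the association between a post's text and its id is lost, and inner hits could not return complete posts with their text and id kept together.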
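The flattening can be illustrated in plain Python. This is a simplified model, not Elasticsearch's actual implementation, and the helper name flatten_objects is hypothetical: it turns an array of objects into one value list per sub-field, as the linked documentation describes for object fields, then shows that a check for a single post with text "idea" and id 3 succeeds on the flattened form even though no such post exists.

```python
from collections import defaultdict


def flatten_objects(field: str, objects: list[dict]) -> dict[str, list]:
    """Simplified model of how an object array is indexed: one value list per sub-field."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


posts = [
    {"text": "idea", "id": 2},
    {"text": "unrelated", "id": 3},
]

flat = flatten_objects("posts", posts)
print(flat)  # {'posts.text': ['idea', 'unrelated'], 'posts.id': [2, 3]}

# Object mapping: each condition is checked against the whole flattened list
flattened_match = "idea" in flat["posts.text"] and 3 in flat["posts.id"]

# Nested mapping: both conditions must hold for the same post
nested_match = any(p["text"] == "idea" and p["id"] == 3 for p in posts)

print(flattened_match, nested_match)  # True False
```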

Answered By: user20418895