# Elasticsearch – How to create buckets by using information from two fields at the same time?

## Question:

My documents are like this:

```
{'start': 0, 'stop': 3, 'val': 3}
{'start': 2, 'stop': 4, 'val': 1}
{'start': 5, 'stop': 6, 'val': 4}
```

We can imagine that each document occupies the x-coordinates from `'start'`

to `'stop'`

,

and has a certain value `'val'`

(`'start' < 'stop'`

is guaranteed).

**The goal** is to plot a line showing the sum of these values `'val'`

from all the

documents which occupy an x-coordinate:

^{this graph online}

In reality there are many documents with many different `'start'`

and `'stop'`

coordinates. Speed is important, so:

### Is this possible to do with at most a couple of elastic search requests? how?

**What I’ve tried:**

With one elastic search request we can get the `min_start`

, and `max_stop`

coordinates. These will be the boundaries of x.

Then we divide the x-coordinates into `N`

intervals, and in a loop for each interval we make an elastic search request: to filter out all the documents which lie completely outside of this interval, and do a sum aggregation of `'val'`

.

This approach takes too much time because there are `N+1`

requests, and if we want to have a line with higher precision, the time will increase linearly.

Code:

```
N = 300 # number of intervals along x
x = []
y = []
data = es.search(index='index_name',
body={
'aggs': {
'min_start': {'min': {'field': 'start'}},
'max_stop': {'max': {'field': 'stop'}}
}
})
min_x = data['aggregations']['min_start']['value']
max_x = data['aggregations']['max_stop']['value']
x_from = min_x
x_step = (max_x - min_x) / N
for _ in range(N):
x_to = x_from + x_step
data = es.search(
index='index_name',
body= {
'size': 0, # to not return any actual documents
'query': {
'bool': {
'should': [
# start is in the current x-interval:
{'bool': {'must': [
{'range': {'start': {'gte': x_from}}},
{'range': {'start': {'lte': x_to}}}
]}},
# stop is in the current x-interval:
{'bool': {'must': [
{'range': {'stop': {'gte': x_from}}},
{'range': {'stop': {'lte': x_to}}}
]}},
# current x-interval is inside start--stop
{'bool': {'must': [
{'range': {'start': {'lte': x_from}}},
{'range': {'stop': {'gte': x_to}}}
]}}
],
'minimum_should_match': 1 # at least 1 of these 3 conditions should match
}
},
'aggs': {
'vals_sum': {'sum': {'field': 'val'}}
}
}
)
# Append info to the lists:
x.append(x_from)
y.append(data['aggregations']['vals_sum']['value'])
# Next x-interval:
x_from = x_to
from matplotlib import pyplot as plt
plt.plot(x, y)
```

## Answers:

The right way to do this in one single query is to use the `range`

field type (available since 5.2) instead of using two fields `start`

and `stop`

and reimplementing the same logic. Like this:

```
PUT test
{
"mappings": {
"properties": {
"range": {
"type": "integer_range"
},
"val": {
"type":"integer"
}
}
}
}
```

Your documents would look like this:

```
{
"range" : {
"gte" : 0,
"lt" : 3
},
"val" : 3
}
```

And then the query would simply leverage an `histogram`

aggregation like this:

```
POST test/_search
{
"size": 0,
"aggs": {
"histo": {
"histogram": {
"field": "range",
"interval": 1
},
"aggs": {
"total": {
"sum": {
"field": "val"
}
}
}
}
}
}
```

And the results are as expected: 3, 3, 4, 1, 0, 4