POST request with \n-delimited JSON in Python
Question:
I’m trying to use the bulk API from Elasticsearch, and I see that this can be done with the following request. It is special because what is passed as “data” is not one proper JSON document, but a series of JSON documents that uses \n as a delimiter.
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d '
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'
My question is: how can I perform such a request in Python? The authors of Elasticsearch suggest not pretty-printing the JSON, but I’m not sure what that means (see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
I know that this is a valid Python request:
import requests
import json
data = json.dumps({"field":"value"})
r = requests.post("http://localhost:9200/_bulk?pretty", data=data)
But what do I do if the JSON is \n-delimited?
Answers:
This is really a set of individual JSON documents joined together with newlines, so you could do something like this:
data = [
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } },
{ "field1" : "value1" },
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" }, },
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" }, },
{ "field1" : "value3" },
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} },
{ "doc" : {"field2" : "value2"} }
]
data_to_post = '\n'.join(json.dumps(d) for d in data)
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post)
However, as pointed out in the comments, the Elasticsearch Python client is likely to be more useful.
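As a rough illustration of the joining idea, and of what "not pretty-printing" means in practice, here is a minimal sketch using only the standard library. The helper name build_bulk_body is made up for this example and is not part of any Elasticsearch API. It also ends the body with a newline, since the bulk endpoint expects the last line to be newline-terminated.

```python
import json


def build_bulk_body(lines):
    """Return the dicts as newline-delimited JSON, one document per line.

    build_bulk_body is a hypothetical helper name used only in this sketch.
    """
    # json.dumps without indent= emits no line breaks, so every document
    # stays on a single line; each line is then terminated with "\n".
    return "".join(json.dumps(d) + "\n" for d in lines)


data = [
    {"index": {"_index": "test", "_type": "type1", "_id": "1"}},
    {"field1": "value1"},
    {"delete": {"_index": "test", "_type": "type1", "_id": "2"}},
]

body = build_bulk_body(data)
print(body, end="")

# A pretty-printed document contains line breaks of its own, which would
# break the one-document-per-line framing of the bulk body.
pretty = json.dumps(data[0], indent=2)
print("\n" in pretty)  # True
```

The resulting string can be passed as data= to requests.post in the same way as data_to_post above.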
As a follow-up to Daniel’s answer above, I had to add an additional '\n'
to the end of data_to_post, and add a {Content-Type: application/x-ndjson}
header to get it to work in Elasticsearch 6.3.
data_to_post = '\n'.join(json.dumps(d) for d in data) + "\n"
headers = {"Content-Type": "application/x-ndjson"}
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post, headers=headers)
Otherwise, I receive the error:
"The bulk request must be terminated by a newline [\n]"
You can use the Python ndjson library to do it.
https://pypi.org/project/ndjson/
It contains JSONEncoder and JSONDecoder classes for easy use with other libraries, such as requests:
import ndjson
import requests
response = requests.get('https://example.com/api/data')
items = response.json(cls=ndjson.Decoder)
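If installing an extra package is not an option, the decoding side can be approximated with the standard library alone. This is only a sketch of the idea, not how the ndjson package is implemented, and parse_ndjson is a name invented for this example.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON text into a list of Python objects.

    parse_ndjson is a hypothetical helper name used only in this sketch.
    """
    # Split on "\n" only and skip blank lines, such as the empty string
    # left over after the trailing newline.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


sample = '{"field1": "value1"}\n{"field1": "value3"}\n'
items = parse_ndjson(sample)
print(items)
```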
I’m facing a similar issue with newlines inside a string in the body, and I can’t seem to make them work.
What am I doing wrong?
email_endpoint = cfg['email_endpoint']
string_to_post = 'Successful Datasets Load on ' + date + ': \n' + '\n'.join(info.datasets) + '\n'
myobj = {'subject': '[SUCCESS] '+cfg['mail_header'] + " " + date, 'TO': info.mail_dl,
'body': string_to_post}
#print(f"Obj to send email: {myobj}")
# Implement support for Content-Type application/x-ndjson so that newline-delimited JSON objects can be submitted
headers = {'content-type': 'application/x-ndjson'}
#print(cfg['success_mail_header'])
resp = requests.post(url=email_endpoint, headers=headers, data=json.dumps(myobj))
And this shows up like this in my email:
Successful Datasets Load on 20230302_165436: ds1 ds2
And I need something "pretty" like:
Successful Datasets Load on 20230302_165436:
ds1
ds2
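One thing that can be checked locally is whether the newlines survive JSON encoding at all. The sketch below uses made-up dataset names in place of info.datasets and only covers the JSON side. It says nothing about how the mail service renders the body, so it can narrow the problem down but not confirm the cause.

```python
import json

# Stand-ins for info.datasets and date from the snippet above (assumed values).
datasets = ["ds1", "ds2"]
date = "20230302_165436"

body = "Successful Datasets Load on " + date + ":\n" + "\n".join(datasets) + "\n"
payload = json.dumps({"body": body})

# In the encoded payload each newline is the two-character escape \n,
# so the JSON text itself stays on a single line.
print(payload)

# Decoding restores real line breaks.
decoded = json.loads(payload)["body"]
print(decoded, end="")
```

If the decoded text still has its line breaks, as it does here, the collapse into one line is probably happening after the JSON has been decoded on the receiving side.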