POST request with \n-delimited JSON in Python

Question:

I'm trying to use the Elasticsearch bulk API, and I see that it can be called with the following request. It is unusual because what is passed as "data" is not a single valid JSON document but a series of JSON objects separated by \n (newline) delimiters:

curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d '
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'

My question is: how can I send such a request from Python? The Elasticsearch documentation says not to pretty-print the JSON, but I'm not sure what that means (see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
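
"Pretty-printing" here means serializing a document with indentation and line breaks. The bulk API uses literal newlines as delimiters, so each action and each document must stay on a single line, and a pretty-printed document would be split across several. A quick comparison with the standard json module (the sample document is made up):

```python
import json

doc = {"field1": "value1", "nested": {"a": 1}}

# Compact (the default): the whole document is on one line
compact = json.dumps(doc)

# Pretty-printed: indent= inserts line breaks, so one document spans several lines
pretty = json.dumps(doc, indent=2)

print(compact)  # {"field1": "value1", "nested": {"a": 1}}
print(pretty)
```

Only the compact form is safe to use as one line of a bulk body.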

I know that this is a valid Python request:

import json

import requests

data = json.dumps({"field": "value"})

# requests needs the scheme ("http://"); without it, it raises MissingSchema
r = requests.post("http://localhost:9200/_bulk?pretty", data=data)

But what do I do if the JSON is \n-delimited?

Asked By: Brian


Answers:

What this really is, is a set of individual JSON documents joined together with newlines. So you can serialize each one with json.dumps and join the results:

import json

import requests

data = [
    {"index": {"_index": "test", "_type": "type1", "_id": "1"}},
    {"field1": "value1"},
    {"delete": {"_index": "test", "_type": "type1", "_id": "2"}},
    {"create": {"_index": "test", "_type": "type1", "_id": "3"}},
    {"field1": "value3"},
    {"update": {"_id": "1", "_type": "type1", "_index": "test"}},
    {"doc": {"field2": "value2"}},
]

# One compact JSON document per line; json.dumps emits no newlines by default
data_to_post = '\n'.join(json.dumps(d) for d in data)
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post)
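
Keeping a flat list in sync by hand is easy to get wrong, because delete has no source line while the other operations do. One option is to keep (action, source) pairs and use None where there is no source. This is a sketch; bulk_body is a hypothetical helper, not part of requests or any Elasticsearch library. It also ends the body with a newline, since the bulk API expects the final line to be newline-terminated.

```python
import json
from typing import Iterable, Optional, Tuple

# Each pair is (action metadata, document source); source is None for "delete"
Pair = Tuple[dict, Optional[dict]]


def bulk_body(pairs: Iterable[Pair]) -> str:
    """Build a newline-delimited bulk body (hypothetical helper, not a library API)."""
    lines = []
    for action, source in pairs:
        lines.append(json.dumps(action))  # action line, compact JSON
        if source is not None:
            lines.append(json.dumps(source))  # source line follows its action
    # Every line, including the last, is terminated with "\n"
    return "".join(line + "\n" for line in lines)


pairs = [
    ({"index": {"_index": "test", "_type": "type1", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_type": "type1", "_id": "2"}}, None),
    ({"create": {"_index": "test", "_type": "type1", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_index": "test", "_type": "type1", "_id": "1"}}, {"doc": {"field2": "value2"}}),
]

print(bulk_body(pairs), end="")
```

The result can be passed as data= to requests.post exactly like data_to_post.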

However, as pointed out in the comments, the Elasticsearch Python client is likely to be more useful.
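
With the official client (the elasticsearch package on PyPI), elasticsearch.helpers.bulk accepts an iterable of action dicts and does the newline-delimited serialization for you. A rough sketch, assuming a node at http://localhost:9200 and a client/server version where mapping types are no longer used (so _type is omitted). send_with_client is a hypothetical name, and the import sits inside it only so the action list can be inspected without the package installed.

```python
# Action dicts in the format accepted by elasticsearch.helpers.bulk:
# "_op_type" selects the operation, "_source" carries the document
ACTIONS = [
    {"_op_type": "index", "_index": "test", "_id": "1", "_source": {"field1": "value1"}},
    {"_op_type": "delete", "_index": "test", "_id": "2"},
    {"_op_type": "create", "_index": "test", "_id": "3", "_source": {"field1": "value3"}},
    # For updates, the partial document goes under "doc"
    {"_op_type": "update", "_index": "test", "_id": "1", "doc": {"field2": "value2"}},
]


def send_with_client(url: str = "http://localhost:9200"):
    """Send ACTIONS through the official client (assumes `pip install elasticsearch`)."""
    # Imported here so this module still loads when the package is absent
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(url)
    # Returns (number of successful actions, list of errors); by default it
    # raises an exception if any action fails
    return helpers.bulk(client, ACTIONS)
```

The helper also splits large action lists into chunks, which hand-built request bodies would have to handle themselves.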

Answered By: Daniel Roseman

As a follow-up to Daniel's answer above, I had to append an extra '\n' to the end of data_to_post and add a Content-Type: application/x-ndjson header to get it to work in Elasticsearch 6.3.

data_to_post = '\n'.join(json.dumps(d) for d in data) + '\n'
headers = {"Content-Type": "application/x-ndjson"}
r = requests.post("http://localhost:9200/_bulk?pretty", data=data_to_post, headers=headers)

Otherwise, I get the error:
"The bulk request must be terminated by a newline [\n]"
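
Both fixes together, as a minimal sketch with requests. prepare_bulk is a hypothetical helper name and http://localhost:9200 is an assumed node address. Building the request with requests.Request(...).prepare() lets you inspect the exact body and headers before anything is sent.

```python
import json

import requests


def prepare_bulk(url: str, docs: list) -> requests.PreparedRequest:
    """Build (but do not send) a bulk POST; hypothetical helper name."""
    # One compact JSON document per line, plus a final "\n" after the last line
    body = "".join(json.dumps(d) + "\n" for d in docs)
    headers = {"Content-Type": "application/x-ndjson"}
    return requests.Request("POST", url, data=body, headers=headers).prepare()


docs = [
    {"index": {"_index": "test", "_type": "type1", "_id": "1"}},
    {"field1": "value1"},
]
prepared = prepare_bulk("http://localhost:9200/_bulk?pretty", docs)
print(prepared.headers["Content-Type"])
print(repr(prepared.body))

# To actually send it:
# with requests.Session() as session:
#     response = session.send(prepared)
```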

Answered By: Kai Peng

You can use the Python ndjson library for this:
https://pypi.org/project/ndjson/

It provides encoder and decoder classes that plug into other libraries such as requests. For example, to decode a newline-delimited JSON response:

import ndjson
import requests

response = requests.get('https://example.com/api/data')
items = response.json(cls=ndjson.Decoder)
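
If you would rather not add a dependency, decoding NDJSON takes only a couple of lines with the standard json module. parse_ndjson is a hypothetical helper name; it skips blank lines, such as the empty line after a trailing newline.

```python
import json


def parse_ndjson(text: str) -> list:
    """Decode newline-delimited JSON into a list of objects (hypothetical helper)."""
    # Each non-blank line is one complete JSON document
    return [json.loads(line) for line in text.splitlines() if line.strip()]


sample = '{"a": 1}\n{"b": [2, 3]}\n'
print(parse_ndjson(sample))  # [{'a': 1}, {'b': [2, 3]}]
```

With requests, this would be called as parse_ndjson(response.text).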

Answered By: Abhishek T

I'm facing a similar issue with newlines inside a string in the request body, and I can't get them to show up.
What am I doing wrong?

email_endpoint = cfg['email_endpoint']
string_to_post = 'Successful Datasets Load on ' + date + ': \n' + '\n'.join(info.datasets) + '\n'
myobj = {'subject': '[SUCCESS] ' + cfg['mail_header'] + " " + date, 'TO': info.mail_dl,
         'body': string_to_post}
# print(f"Obj to send email: {myobj}")
# Implement support for Content-Type application/x-ndjson for newline-delimited JSON objects to be submitted
headers = {'content-type': 'application/x-ndjson'}
# print(cfg['success_mail_header'])
resp = requests.post(url=email_endpoint, headers=headers, data=json.dumps(myobj))

In my email, this shows up as:

Successful Datasets Load on 20230302_165436: ds1 ds2

But I need something "pretty", like:

Successful Datasets Load on 20230302_165436: 
ds1 
ds2
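
One way to narrow this down is to check what is actually sent. This payload is a single JSON object rather than newline-delimited JSON, so application/json (which requests sets automatically when you pass json=) fits better than application/x-ndjson. json.dumps encodes each newline as the two-character escape \n, which decodes back to a real newline on the receiving side. If the newlines are present in the payload but missing from the email, a likely cause is that the email service renders the body as HTML, where plain newlines collapse into spaces; that is an assumption about the service, not something the question confirms. The date and dataset names below are placeholders.

```python
import json

# Placeholder values standing in for date and info.datasets from the question
date = "20230302_165436"
datasets = ["ds1", "ds2"]

# "\n" (backslash-n) is a real newline; a bare "n" is just the letter n
body = "Successful Datasets Load on " + date + ":\n" + "\n".join(datasets) + "\n"
payload = {"subject": "[SUCCESS] " + date, "body": body}

encoded = json.dumps(payload)
print(encoded)  # newlines appear as the escape sequence \n
print(json.loads(encoded)["body"])  # decoding restores the real newlines

# Only if the service renders the body as HTML (an assumption): use explicit breaks
html_body = "<br>".join(body.splitlines())

# Sending: json= serializes the dict and sets Content-Type: application/json
# requests.post(email_endpoint, json=payload)
```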

Answered By: neverMind