How can I properly extract a JSON object returned in an HTTP response in Python?

Question:

How can I properly extract a JSON object returned in an HTTP response in Python?

Example of HTTP response:

response.headers:

{'Access-Control-Allow-Headers': 'Authorization,Content-Type,X-Api-Key,User-Agent,If-Modified-Since,Prefer,location,retry-after', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,OPTIONS,PATCH', 'Access-Control-Allow-Origin': '*', 'Access-Control-Expose-Headers': 'location, retry-after', 'Content-Type': 'multipart/mixed; boundary=Boundary_135_252454503_1565822450438;charset=UTF-8', 'Date': 'Wed, 14 Aug 2019 22:40:50 GMT', 'MIME-Version': '1.0', 'Server': 'openresty', 'x-request-id': 'hTZwyZTptqrV66x9zS9QUzP111dFAKK', 'Content-Length': '1755', 'Connection': 'keep-alive'}

response.text:

--Boundary_1179_100931884_1565537434788    
Content-Type: application/json    
Content-Disposition: form-data; name="contentAnalyzerResponse"   
 
{"statuses":[{"engine":"sentence-compression:Service-eddfab1fa6033f7","invocations":[{"outputs":{"parse_output":{"dc:format":"application/json","sensei:multipart_field_name":"outfile"}},"message":null,"status":"200"}]}],"request_id":"YkFpUviICewtW6smvapHKxj"}

--Boundary_1179_100931884_1565537434788    
Content-Type: application/octet-stream    
Content-Disposition: form-data; name="outfile"     

{"Sent00001":{"NLP_Input":"Remove the blue cell phone next_to the table","NLP_CompGraph":{"Command":{"action1":"Re"},"Intersect":{"intersect-id":"ix1","Locate":{"noun-expr":"cell phone","modifiers":[{"type":"Color","HSV":[0.618181818181818,0.9865470852017931,0.8745098039215681],"value":"blue"}],"noun-hypernyms":[["cell_phone.n.01","radiotelephone.n.02","telephone.n.01","electronic_equipment.n.01","equipment.n.01"]],"noun-nxid":"nx1","noun-synonyms":[["cellular_telephone.n.01.cellular_telephone","cellular_telephone.n.01.cellular_phone","cellular_telephone.n.01.cellphone","cellular_telephone.n.01.cell","cellular_telephone.n.01.mobile_phone"]]},"Relate":{"Locate":{"noun-expr":"table","noun-hypernyms":[["table.n.01","array.n.01","arrangement.n.02","group.n.01","abstraction.n.06"],["table.n.02","furniture.n.01","furnishing.n.02","instrumentality.n.03","artifact.n.01"]],"noun-nxid":"nx2","noun-synonyms":[["table.n.01.table","table.n.01.tabular_array"],["table.n.02.table"]]},"Predicate":{"relationships":[{"type":"spatial","value":"next_to"}]}}}}}}

--Boundary_1179_100931884_1565537434788--

I just want to extract the JSON object:

{"Sent00001":{"NLP_Input":"Remove the blue cell phone next_to the table","NLP_CompGraph":{"Command":{"action1":"Re"},"Intersect":{"intersect-id":"ix1","Locate":{"noun-expr":"cell phone","modifiers":[{"type":"Color","HSV":[0.618181818181818,0.9865470852017931,0.8745098039215681],"value":"blue"}],"noun-hypernyms":[["cell_phone.n.01","radiotelephone.n.02","telephone.n.01","electronic_equipment.n.01","equipment.n.01"]],"noun-nxid":"nx1","noun-synonyms":[["cellular_telephone.n.01.cellular_telephone","cellular_telephone.n.01.cellular_phone","cellular_telephone.n.01.cellphone","cellular_telephone.n.01.cell","cellular_telephone.n.01.mobile_phone"]]},"Relate":{"Locate":{"noun-expr":"table","noun-hypernyms":[["table.n.01","array.n.01","arrangement.n.02","group.n.01","abstraction.n.06"],["table.n.02","furniture.n.01","furnishing.n.02","instrumentality.n.03","artifact.n.01"]],"noun-nxid":"nx2","noun-synonyms":[["table.n.01.table","table.n.01.tabular_array"],["table.n.02.table"]]},"Predicate":{"relationships":[{"type":"spatial","value":"next_to"}]}}}}}}

Code used to obtain this HTTP response:

# Using python 3.6
import requests
[...]
response = requests.post(service_url,
                         headers=headersD,
                         files=filesD)
print("response.headers:")
print(response.headers)
print("response.text:")
print(response.text)

Using print(response.json()) yields the error:

Traceback (most recent call last):
  File "C:\Users\Francky\Documents\GitHub\services\call_selection_parse.py", line 166, in <module>
    print('response.json(): {0}'.format(response.json()))
  File "C:\Anaconda\envs\py36\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Anaconda\envs\py36\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Anaconda\envs\py36\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Anaconda\envs\py36\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
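
The traceback shows that response.json() hands the whole body (self.text) to the JSON decoder, and a multipart body starts with a boundary line rather than a JSON value, hence the failure at char 0. A minimal sketch reproducing this with only the standard library, assuming a shortened, hypothetical body (the boundary name and field values are made up for illustration):

```python
import json

# Shortened, hypothetical multipart body: it begins with a boundary line, not JSON.
multipart_text = (
    "--Boundary_1\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"request_id": "abc"}\r\n'
    "--Boundary_1--\r\n"
)

try:
    json.loads(multipart_text)
    error_message = None
except json.JSONDecodeError as exc:
    # The decoder gives up on the very first character of the body.
    error_message = str(exc)

print(error_message)
```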

Is there any standard way to do so, given that this is a standard HTTP response and that multipart/form-data is clearly defined in RFC 7578? Or am I supposed to parse the HTTP response just like any other string?


Example of ad hoc parsing below. I would prefer a more principled way to obtain the JSON object.

import json

def parse_response(self, text):
    outfileFlag = False
    jsonD = None
    compGraphJsonString = ""
    for line in text.split("\n"):
        # print("    " + line)
        # The part header naming the "outfile" field precedes its JSON payload.
        if 'name="outfile"' in line:
            outfileFlag = True
        if outfileFlag:
            # The first line that looks like a JSON object or array is the payload.
            if line.startswith(("{", "[")):
                compGraphJsonString = line
                break
    if compGraphJsonString:
        jsonD = json.loads(compGraphJsonString)
    return jsonD
Asked By: Franck Dernoncourt

||

Answers:

You can use http.request: send the request and the JSON will be in response.body. Alternatively, use http.client, for example:

import http.client

# TI, shablon and people are not defined in this snippet; see the full code linked below.
con = http.client.HTTPConnection(TI['adr'], TI['port'])
headers = {"Content-Type": "text/xml; charset=utf-8"}
request = shablon.format(**people)
con.request("POST", TI['url'], request.encode('utf-8'),
            headers=headers)
result = con.getresponse().read()
result = result.decode('utf-8')
print(result)

Full code: https://github.com/ProstakovAlexey/1019_mongo/blob/master/for_testing/one_test.py

Answered By: Prostakov Alexey

You can use the Python package requests-toolbelt to parse the multipart/form-data part of an HTTP response:

# Tested with python 3.6 and requests-toolbelt==0.9.1
import requests_toolbelt.multipart # pip install requests-toolbelt
import pprint
import json
...
multipart_data = requests_toolbelt.multipart.decoder.MultipartDecoder.from_response(response)
for part in multipart_data.parts:
    # Skip every part except the one named "outfile".
    if part.headers[b'Content-Disposition'] != b'form-data; name="outfile"':
        continue
    # json.loads also handles JSON literals (null, true, false) that ast.literal_eval rejects.
    pprint.pprint(json.loads(part.content.decode('utf-8')), indent=2)
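
If adding a dependency is not an option, the standard library's email package can usually parse the same body, since multipart HTTP bodies follow MIME syntax. The following is a minimal sketch, not a tested drop-in replacement: extract_json_part is a hypothetical helper name, and the sample boundary and payloads are shortened stand-ins for the response in the question. It assumes the parts carry no Content-Transfer-Encoding surprises and that the payload is UTF-8 JSON.

```python
import json
from email import message_from_bytes


def extract_json_part(body, content_type, field_name):
    """Return the parsed JSON payload of the part named field_name, or None."""
    # The email parser needs the Content-Type header (it carries the boundary),
    # so prepend it to the raw body to form a complete MIME document.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    for part in message.walk():
        # get_param returns None when the header or the parameter is missing.
        if part.get_param("name", header="content-disposition") == field_name:
            return json.loads(part.get_payload(decode=True).decode("utf-8"))
    return None


# Shortened, hypothetical stand-in for the response body shown in the question.
sample_content_type = "multipart/mixed; boundary=Boundary_1;charset=UTF-8"
sample_body = (
    b"--Boundary_1\r\n"
    b"Content-Type: application/json\r\n"
    b'Content-Disposition: form-data; name="contentAnalyzerResponse"\r\n'
    b"\r\n"
    b'{"request_id": "abc"}\r\n'
    b"--Boundary_1\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b'Content-Disposition: form-data; name="outfile"\r\n'
    b"\r\n"
    b'{"Sent00001": {"NLP_Input": "Remove the blue cell phone"}}\r\n'
    b"--Boundary_1--\r\n"
)

outfile_json = extract_json_part(sample_body, sample_content_type, "outfile")
print(outfile_json)
```

With a real requests response, the call would presumably look like extract_json_part(response.content, response.headers["Content-Type"], "outfile"). Matching on the name parameter rather than on the full header string avoids depending on the exact spacing of the Content-Disposition value.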
Answered By: Franck Dernoncourt