How can I extract a single value from a nested data structure (such as from parsing JSON)?

Question:

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:

>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}

Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.

How can I write code that directly gets this value?

I don’t need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can’t figure out exactly how it works for a complex case.

More generally, how can I figure out what the "path" is to the data, and write the code for it?

Asked By: knu2xs


Answers:

For reference, let’s see what the original JSON would look like, with pretty formatting:

>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}

That lets us see the structure of the data more clearly.

In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.

To get the desired value, we simply put those accesses one after another:

my_json['value']['queryInfo']['creationTime'] # 1349724919000
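
One way to work out such a path, sketched here on an abbreviated copy of the data above (only the keys used are kept), is to step down one level at a time and check the type and keys at each level. A dict is indexed by key and a list by integer position, so a path can mix both. The variable names other than my_json are only illustrative.

```python
# Abbreviated copy of the parsed response, keeping only the keys used here.
my_json = {
    'name': 'ns1:timeSeriesResponseType',
    'value': {
        'queryInfo': {
            'creationTime': 1349724919000,
            'note': [
                {'value': '[ALL:103232434]', 'title': 'filter:sites'},
                {'value': 'sdas01', 'title': 'server'},
            ],
        },
    },
}

# Step down one level at a time, inspecting what each level contains.
print(type(my_json).__name__, list(my_json))    # dict ['name', 'value']
value = my_json['value']
print(type(value).__name__, list(value))        # dict ['queryInfo']
query_info = value['queryInfo']
print(list(query_info))                         # ['creationTime', 'note']

# Chaining the same steps gives the full path.
creation_time = my_json['value']['queryInfo']['creationTime']

# Lists are indexed by position, so a path may mix keys and integers.
first_note_title = my_json['value']['queryInfo']['note'][0]['title']
print(creation_time, first_note_title)          # 1349724919000 filter:sites
```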
Answered By: dm03514

"I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way."

If you access the API again, the new data might not match the code’s expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:

name = my_json.get('name') # will return None if 'name' doesn't exist

Another way is to test for a key explicitly:

if 'name' in my_json:
    name = my_json['name']
else:
    name = None  # or handle the missing key some other way

However, these approaches may fail if further accesses are required. A placeholder result of None isn’t a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it’s easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:

try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
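
If the same pattern is needed for several paths, the try/except can be wrapped in a small helper. This is only a sketch: get_path and the sample data are hypothetical names made up for illustration, not part of any library. It also catches IndexError, which is what a list raises for an out-of-range position.

```python
def get_path(data, path, default=None):
    """Follow a sequence of keys/indexes into nested dicts and lists.

    Returns `default` if any step is missing or has the wrong type.
    """
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Small stand-in for the parsed response.
sample = {
    'value': {
        'queryInfo': {
            'creationTime': 1349724919000,
            'note': [{'title': 'filter:sites'}],
        },
    },
}

print(get_path(sample, ['value', 'queryInfo', 'creationTime']))           # 1349724919000
print(get_path(sample, ['value', 'queryInfo', 'note', 0, 'title']))       # filter:sites
print(get_path(sample, ['value', 'missing', 'creationTime'], default=0))  # 0
print(get_path(sample, ['value', 'queryInfo', 'note', 5, 'title']))       # None
```

The trade-off is that a default hides which step failed; when that matters, catching the exception at the call site, as in the answer above, keeps the information.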
Answered By: ButtersB

Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:

import json

# start with a Python dict
data = {"test1": "1", "test2": "2", "test3": "3"}

# serialize the dict to a JSON-formatted string
json_str = json.dumps(data)

# parse the JSON string back into a Python dict
resp = json.loads(json_str)

# print the resp
print(resp)

# extract an element in the response
print(resp['test1'])
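
The same round trip works for nested data. A minimal sketch, using a made-up JSON string (json_text and its contents are illustrative and do not come from any real API):

```python
import json

# A JSON document as text: an object containing a list of objects.
json_text = '{"result": [{"name": "test1", "value": 2.5}, {"name": "test2", "value": 1.9}], "ok": true}'

# json.loads turns JSON objects into dicts, arrays into lists, true into True.
parsed = json.loads(json_text)

# Key, then list position, then key again.
second_value = parsed['result'][1]['value']
print(second_value)  # 1.9
print(parsed['ok'])  # True
```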
Answered By: Sireesh Yarlagadda

Try this.

Here, I fetch only the statecode of each entry in the 'statewise' array returned by the COVID API.

import requests

r = requests.get('https://api.covid19india.org/data.json')

x = r.json()['statewise']

for i in x:
  print(i['statecode'])
Answered By: Sanket Chauhan

Try this:

from functools import reduce
import re


def deep_get_imps(data, key: str):
    # raw string, so the backslashes reach the regex engine unchanged
    split_keys = re.split(r"[\[\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            # empty piece left over after a "]"; skip it so "a[0][1]" also works
            continue
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data


def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)

Then you can use it like below:

res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [{
        "name": "test1",
        "value": 2.5
    }, {
        "name": "test2",
        "value": 1.9
    },{
        "name": "test1",
        "value": 3.1
    }]
}

>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
Answered By: Heedo Lee

I would like to achieve a very similar goal: from the JSON file, I would like to produce a new JSON file that includes only a subset of the original. Let me use the example above.
With the provided code:

import requests

r = requests.get('https://api.covid19india.org/data.json')

x = r.json()['statewise']

for i in x:
    print(i['statecode'])

I get exactly the values I want. But the output is not a valid JSON file; it doesn't have valid JSON syntax. Can you help me change the code so that it outputs a valid JSON file?
The generated JSON file is attached here: Json file
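
One possible approach, sketched under assumptions: instead of printing each value, collect the wanted fields into a list and serialize that list with json.dumps, which always produces valid JSON. To keep the sketch runnable offline, statewise below is a hypothetical stand-in for r.json()['statewise'], with made-up entries, and the output file name in the comment is likewise only an example.

```python
import json

# Stand-in for r.json()['statewise']; assumed shape, made-up values.
statewise = [
    {'state': 'Total', 'statecode': 'TT'},
    {'state': 'Maharashtra', 'statecode': 'MH'},
    {'state': 'Kerala', 'statecode': 'KL'},
]

# Keep only the field of interest from each entry.
subset = [{'statecode': entry['statecode']} for entry in statewise]

# Serialize the whole list at once, so the result is one valid JSON document.
json_text = json.dumps(subset, indent=2)
print(json_text)

# To write a file instead of printing:
# with open('statecodes.json', 'w') as f:
#     json.dump(subset, f, indent=2)
```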

Answered By: Rok Rogelj