Reading from nested json and getting None type Error -> try/except

Question:

I am reading data from nested json with this code:

data = json.loads(json_file.json)
for nodesUni in data["data"]["queryUnits"]['nodes']:
        try:
            tm = (nodesUni['sql']['busData'][0]['engine']['engType'])
        except:
            tm = ''
        try:
            to = (nodesUni['sql']['carData'][0]['engineData']['producer']['engName'])
        except:
            to = ''
        json_output_for_one_GU_owner = {
            "EngineType": tm,
            "EngineName": to,
        } 

I am getting a NoneType error (e.g. nodesUni['sql']['busData'][0]['engine']['engType'] doesn’t exist at all because there is no data), so I am using try/except. But my code is more complex, and having a try/except for every value is crazy. Is there any other way to deal with this?

Error: "TypeError: 'NoneType' object is not subscriptable"

Asked By: Piranha


Answers:

This is non-trivial: you want to traverse the nested dictionaries without errors and end up with an empty string when something is missing, all in an expression as simple as chaining [] operators.

First method

My approach is to add a hook when loading the JSON, so that every JSON object becomes a defaultdict that creates further defaultdicts on demand, to any depth:

import collections
import json

def superdefaultdict():
    # a defaultdict whose missing keys are themselves superdefaultdicts
    return collections.defaultdict(superdefaultdict)

def hook(s):
    # called by json.loads for every decoded JSON object (dict)
    c = superdefaultdict()
    c.update(s)
    return c

data = json.loads('{"foo":"bar"}', object_hook=hook)

print(data["x"][0]["zzz"])   # doesn't exist
print(data["foo"])  # exists

prints:

defaultdict(<function superdefaultdict at 0x000001ECEFA47160>, {})
bar

When you access some combination of keys that don’t exist (at any level), superdefaultdict recursively creates a defaultdict of itself (this is a nice pattern; you can read more about it in "Is there a standard class for an infinitely nested defaultdict?"), allowing any number of non-existent key levels.

Now the only drawback is that it returns a defaultdict(<function superdefaultdict at 0x000001ECEFA47160>, {}) which is ugly. So

print(data["x"][0]["zzz"] or "")

prints an empty string if the dictionary is empty. That should suffice for your purpose.

Use it like this in your context:

import collections
import json

def superdefaultdict():
    return collections.defaultdict(superdefaultdict)

def hook(s):
    c = superdefaultdict()
    c.update(s)
    return c

# json_file.json: same input as in the question
data = json.loads(json_file.json, object_hook=hook)
for nodesUni in data["data"]["queryUnits"]['nodes']:
    tm = nodesUni['sql']['busData'][0]['engine']['engType'] or ""
    to = nodesUni['sql']['carData'][0]['engineData']['producer']['engName'] or ""

Drawbacks:

  • It creates a lot of empty dictionaries in your data object. That shouldn’t be a problem (unless you’re very low on memory) as long as the object isn’t dumped to a file afterwards (where the non-existent values would appear)
  • If a value already exists but is not a dictionary (a string, a number, a list, or null, which is loaded as None), trying to subscript it further still raises an exception, because the hook only converts JSON objects
  • Also, if some value is 0 or an empty list, the or operator will pick "". This can be worked around with another wrapper that tests whether the object is an empty superdefaultdict instead. Less elegant but doable.

Second method

Express the chain of accesses as a string (for instance, just put your expression in double quotes, like "['sql']['busData'][0]['engine']['engType']"), parse it, and loop over the keys to get the data. If there’s an exception, stop and return an empty string.

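import json
import re

def get(key, data):
    # split "['a'][0]['b']" into ['a', 0, 'b']: quoted parts are keys, the rest are indices
    key_parts = [x.strip("'") if x.startswith("'") else int(x) for x in re.findall(r"\[([^\]]*)\]", key)]
    try:
        for k in key_parts:
            data = data[k]
        return data
    except (KeyError, IndexError, TypeError):
        # missing key, index out of range, or a non-subscriptable value such as None
        return ""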

Testing with some simple data:

data = json.loads('{"foo":"bar","hello":{"a":12}}')

print(get("['sql']['busData'][0]['engine']['engType']",data))
print(get("['hello']['a']",data))
print(get("['hello']['a']['e']",data))

We get an empty string (some keys are missing), 12 (the path is valid), and an empty string again (we tried to traverse an existing value that is not a dict).
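
Applied to the loop from the question, this could look like the sketch below. The sample JSON is made up for illustration (the real data isn't shown in the question): the first node has null and an empty list where data is expected, the second one is fully populated.

```python
import json
import re

def get(key, data):
    # Same idea as the get() of this answer: "['a'][0]['b']" -> ['a', 0, 'b']
    key_parts = [x.strip("'") if x.startswith("'") else int(x)
                 for x in re.findall(r"\[([^\]]*)\]", key)]
    try:
        for k in key_parts:
            data = data[k]
        return data
    except (KeyError, IndexError, TypeError):
        return ""

# Hypothetical sample shaped like the question's data
raw = '''
{"data": {"queryUnits": {"nodes": [
  {"sql": {"busData": null, "carData": []}},
  {"sql": {"busData": [{"engine": {"engType": "diesel"}}],
           "carData": [{"engineData": {"producer": {"engName": "V8"}}}]}}
]}}}
'''
data = json.loads(raw)

rows = []
for nodesUni in data["data"]["queryUnits"]["nodes"]:
    rows.append({
        "EngineType": get("['sql']['busData'][0]['engine']['engType']", nodesUni),
        "EngineName": get("['sql']['carData'][0]['engineData']['producer']['engName']", nodesUni),
    })

print(rows)
```

The first node yields two empty strings (None[0] raises TypeError, [][0] raises IndexError, both are caught), and the second yields the real values, with no try/except in the loop itself.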

The syntax could be simplified (e.g. "sql"."busData".0."engine"."engType"), but it would still have to retain a way to differentiate keys (strings) from indices (integers).
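
One possible way to do that, as a rough sketch: use a plain dotted path and treat a segment as an index only when the value reached so far is a list. The get_path name and the dotted format are assumptions for illustration, not something from the question.

```python
def get_path(data, path, default=""):
    # Hypothetical dotted syntax: "sql.busData.0.engine.engType".
    # A segment is converted to an integer index only when the
    # current value is a list; otherwise it is used as a string key.
    for part in path.split("."):
        try:
            if isinstance(data, list):
                data = data[int(part)]
            else:
                data = data[part]
        except (KeyError, IndexError, TypeError, ValueError):
            return default
    return data

data = {"sql": {"busData": [{"engine": {"engType": "diesel"}}], "carData": None}}

print(get_path(data, "sql.busData.0.engine.engType"))     # valid path
print(repr(get_path(data, "sql.carData.0.engineData")))   # carData is None
print(repr(get_path(data, "sql.busData.5.engine")))       # index out of range
```

The obvious limitation is that keys containing a dot cannot be expressed this way.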

The second approach is probably the most flexible one.
