Iterate over a large JSON file using Python and create a reduced JSON output with only specific values

Question:

How do I iterate over JSON with child nodes using Python?

Asked By: paliknight

Answers:

This should solve your initial problem (removing the cpe_match entries where 'vulnerable' is false). I used the sample JSON data you provided in the question.

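import json

# Load the whole file into memory as nested dicts and lists.
with open('data.json', 'r') as f:
    data = json.load(f)

# As in the sample data: the cpe_match list of the first node of the first CVE item.
nodes = data.get('CVE_Items')[0].get('configurations').get('nodes')[0].get('cpe_match')

# Filter in place with slice assignment. Calling pop() inside an enumerate()
# loop would skip the entry that follows each removed one.
nodes[:] = [node for node in nodes if node.get('vulnerable')]

# Write the filtered data to a new file.
with open('new_data.json', 'w') as f:
    json.dump(data, f)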
Answered By: E Joseph
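
One thing to watch for when removing entries from a list in place: calling pop(index) inside a for ... in enumerate(...) loop shifts the remaining items left, so the entry right after each removed one is never checked. Here is a minimal sketch comparing that with slice assignment. The data is made up for illustration (the "cpe" values are placeholders, not from the question's file):

```python
# Made-up entries; two non-vulnerable ones sit next to each other.
sample = [
    {"cpe": "a", "vulnerable": False},
    {"cpe": "b", "vulnerable": False},
    {"cpe": "c", "vulnerable": True},
]

# Popping while enumerating: after index 0 is removed, "b" slides into
# index 0 and the loop moves on to index 1, so "b" is never examined.
popped = [dict(m) for m in sample]
for index, match in enumerate(popped):
    if not match.get("vulnerable"):
        popped.pop(index)

# Slice assignment rebuilds the same list object from a filtered copy.
filtered = [dict(m) for m in sample]
filtered[:] = [m for m in filtered if m.get("vulnerable")]

print([m["cpe"] for m in popped])    # ['b', 'c']
print([m["cpe"] for m in filtered])  # ['c']
```

Because slice assignment keeps the same list object, anything else that references the list (such as the enclosing parsed JSON dict) sees the filtered result.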

Here is a way to read in the data, clean it (remove the cpe_match entries where vulnerable is false), sort it by base score, and then write it out as a new file.

import json

def base_score(item): # key function used by .sort()
    # https://stackoverflow.com/questions/3121979/how-to-sort-a-list-tuple-of-lists-tuples-by-the-element-at-a-given-index
    if 'baseMetricV3' not in item['impact']:
        return (0, item['cve']['CVE_data_meta']['ID']) # items without a CVSS v3 score get 0, so they sort among themselves by ID
    return (item['impact']['baseMetricV3']['cvssV3']['baseScore'], item['cve']['CVE_data_meta']['ID']) # ties on score are broken by ID

with open('nvdcve-1.1-2022.json', 'r') as file: # read in the file and load it as a json format (similar to python dictionaries)
    dict_data = json.load(file)

for CVE_Item in dict_data['CVE_Items']:
    for node in CVE_Item['configurations']['nodes']:
        # https://stackoverflow.com/questions/1207406/how-to-remove-items-from-a-list-while-iterating
        node['cpe_match'][:] = [item for item in node['cpe_match'] if item['vulnerable']] # replace the list contents in place, keeping only vulnerable items
        for child_node in node.get('children', []): # do the same one level down; .get() tolerates a missing 'children' key
            child_node['cpe_match'][:] = [item for item in child_node['cpe_match'] if item['vulnerable']]

dict_data['CVE_Items'].sort(reverse=True, key=base_score) # sort the data and have it in descending order.

with open('cleaned_nvdcve-1.1-2022.json','w') as f: # write the file to the current working directory.
    f.write(json.dumps(dict_data))
Answered By: Andrew Ryan
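
The loops above clean the top-level nodes and one level of children. If children can themselves contain children, a recursive helper may be simpler. This is only a sketch, assuming every node uses the same 'cpe_match' and 'children' keys as above; prune_node is a hypothetical name and the sample document is made up:

```python
import json

def prune_node(node):
    """Drop non-vulnerable cpe_match entries from a node and all of its descendants."""
    matches = node.get("cpe_match", [])
    matches[:] = [m for m in matches if m.get("vulnerable")]
    for child in node.get("children", []):
        prune_node(child)  # recurse so any nesting depth is handled

# Tiny made-up document shaped like the feed used above.
doc = {
    "CVE_Items": [{
        "configurations": {"nodes": [{
            "cpe_match": [],
            "children": [{
                "cpe_match": [{"cpe23Uri": "x", "vulnerable": True},
                              {"cpe23Uri": "y", "vulnerable": False}],
                "children": [{
                    "cpe_match": [{"cpe23Uri": "z", "vulnerable": False}],
                    "children": [],
                }],
            }],
        }]},
    }],
}

for cve_item in doc["CVE_Items"]:
    for node in cve_item["configurations"]["nodes"]:
        prune_node(node)

print(json.dumps(doc["CVE_Items"][0]["configurations"]["nodes"][0]["children"][0]))
```

To use it on the real file, load the data with json.load as above and call prune_node on each entry of CVE_Item['configurations']['nodes'] in place of the two nested list comprehensions.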