Delete entire objects in JSON on the condition it doesn't include keys/values

Question:

I’m trying to delete entire objects in a JSON file if they do not include ALL of the following keys: "transaction_date", "asset_description", "asset_type", "type" and "amount".

Below is my JSON file (it’s been cut for this example):

{
    "first_name": {
        "0": "Thomas",
        "1": "John"
    },
    "transactions": {
        "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000"
            }
        ],
        "1": [
            {
                "scanned_pdf": true,
                "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                "date_recieved": "01/30/2013"
            }
        ]
    }
}

I need to delete the entire "1" entry from both transactions and first_name. The original file has more than these two entries, so the code needs to work for any number of them rather than relying on [0], [1], etc. My code below tries to keep only the data in "transactions" that does not include "scanned_pdf", "ptr_link" and "date_recieved", and then saves the JSON with the updated data (my method is inverted: instead of deleting objects that lack x, it picks out the objects that don’t include y and updates the JSON):

import json

with open("xxxtester.json", "r") as f_in:
    data = json.load(f_in)

to_delete = {"scanned_pdf", "ptr_link", "date_recieved"}

for k in data["transactions"]:
    data["transactions"][k] = [
        {kk: vv for kk, vv in d.items() if kk not in to_delete}
        for d in data["transactions"][k]]


with open("xxxtester.json", "w") as f_out:
    json.dump(data, f_out, indent=4)

However, my output still shows the "1" entry, just with an empty object ("{}") inside it. Should I use different logic for this, or is it possible to extend the existing script to make it work?

Below is my desired output:

{
    "first_name": {
        "0": "Thomas"
    },
    "transactions": {
        "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000"
            }
        ]
    }
}
Asked By: kie_codes


Answers:

This code deletes the whole "1" entry from "transactions":

import json

with open("xxxtester.json", "r") as f_in:
    data = json.load(f_in)


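# Remove the entry before opening the file for writing, so that a missing
# key cannot leave the file truncated.
del data["transactions"]["1"]

with open("xxxtester.json", "w") as f:
    json.dump(data, f)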
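
The snippet above hard-codes the key "1". As a rough sketch of how the same del-based approach might be made generic, the keys to remove can be collected first and then deleted from both sections. The helper name drop_incomplete and the in-memory sample data are assumptions made for illustration; note that this version only drops an index when none of its transactions is complete, and it leaves mixed lists untouched.

```python
REQUIRED = {"transaction_date", "asset_description", "asset_type", "type", "amount"}


def drop_incomplete(data, required=REQUIRED):
    """Delete, in place, every index that has no complete transaction.

    Hypothetical helper: assumes data has "transactions" and "first_name"
    dicts that share the same keys.
    """
    # Collect the keys first; deleting from a dict while iterating over it
    # raises RuntimeError.
    stale = [
        k
        for k, transactions in data["transactions"].items()
        if not any(required.issubset(t) for t in transactions)
    ]
    for k in stale:
        del data["transactions"][k]
        data["first_name"].pop(k, None)  # tolerate a missing name entry
    return data


data = {
    "first_name": {"0": "Thomas", "1": "John"},
    "transactions": {
        "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000",
            }
        ],
        "1": [
            {
                "scanned_pdf": True,
                "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                "date_recieved": "01/30/2013",
            }
        ],
    },
}

drop_incomplete(data)
print(data["first_name"])  # {'0': 'Thomas'}
```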
Answered By: taktischer

If we reverse your logic (so we’re selecting items we want to keep, rather than the other way around) and add a second comprehension to filter out empty values, we end up with this:

import json

with open("xxxtester.json", "r") as f_in:
    data = json.load(f_in)

required = set(
    ("transaction_date", "asset_description", "asset_type", "type", "amount")
)

data["transactions"] = {
    k: [transaction for transaction in v if all(k in transaction for k in required)]
    for k, v in data['transactions'].items()
}

data["transactions"] = {
    k: v for k, v in data['transactions'].items() if v
}

# Update data["first_name"] so that it only contains keys that also exist
# in data["transactions"].
data["first_name"] = {k: v for k, v in data["first_name"].items() if k in data["transactions"]}

print(json.dumps(data, indent=4))

Given input like this:

{
    "first_name": {
        "0": "Thomas",
        "1": "John"
    },
    "transactions": {
       "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000"
            },
            {
                "scanned_pdf": true,
                "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                "date_recieved": "01/30/2013"
            }
          ],
       "1": [
            {
                "scanned_pdf": true,
                "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                "date_recieved": "01/30/2013"
            }
          ]
     }
}

The above code produces:

{
    "first_name": {
        "0": "Thomas"
    },
    "transactions": {
        "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000"
            }
        ]
    }
}

The first dictionary comprehension…

data["transactions"] = {
    k: [transaction for transaction in v if all(k in transaction for k in required)]
    for k, v in data['transactions'].items()
}

…produces:

...
    "transactions": {
        "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000"
            }
        ],
        "1": []
    }
...

The second comprehension filters out keys that have empty lists as values.
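
That filter relies on truthiness: an empty list is falsy, so `if v` skips its key. A tiny illustration, using a trimmed-down stand-in for the intermediate result (the variable names here are made up for the example):

```python
# Stand-in for the intermediate result: "1" is left with an empty list.
transactions = {
    "0": [{"transaction_date": "11/29/2022"}],
    "1": [],
}

# An empty list is falsy, so "1" does not make it into the new dict.
non_empty = {k: v for k, v in transactions.items() if v}

print(list(non_empty))  # ['0']
```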

The third comprehension removes items from data["first_name"] that don’t exist in data["transactions"].
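
For reuse, the three steps could be wrapped in one function and the result written back to disk, which is what the question ultimately asks for. The following is only a sketch under some assumptions: the function name keep_complete and the output file name filtered.json are hypothetical, required.issubset(transaction) is used as an equivalent of the all(...) check, and only "first_name" is kept in sync with "transactions" (any other top-level keys are copied through unchanged).

```python
import json
import os
import tempfile

REQUIRED = frozenset(
    ("transaction_date", "asset_description", "asset_type", "type", "amount")
)


def keep_complete(data, required=REQUIRED):
    """Return a filtered copy of data; the input dict is not modified."""
    # Steps 1 and 2: keep complete transactions, skip lists that end up empty.
    transactions = {}
    for k, items in data["transactions"].items():
        complete = [t for t in items if required.issubset(t)]
        if complete:
            transactions[k] = complete
    # Step 3: keep only the names whose key survived.
    first_name = {k: v for k, v in data["first_name"].items() if k in transactions}
    return {**data, "first_name": first_name, "transactions": transactions}


good = {
    "transaction_date": "11/29/2022",
    "asset_description": "FireEye, Inc.",
    "asset_type": "Stock",
    "type": "Sale (Partial)",
    "amount": "$1,001 - $15,000",
}
scanned = {"scanned_pdf": True, "date_recieved": "01/30/2013"}

data = {
    "first_name": {"0": "Thomas", "1": "John"},
    "transactions": {"0": [good, scanned], "1": [scanned]},
}
result = keep_complete(data)

# Write the result out and read it back; a temporary directory keeps the
# example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filtered.json")
    with open(path, "w") as f_out:
        json.dump(result, f_out, indent=4)
    with open(path) as f_in:
        reloaded = json.load(f_in)

print(result["first_name"])  # {'0': 'Thomas'}
```

Returning a new dict rather than mutating the input makes it easy to compare the data before and after filtering.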

Answered By: larsks