Remove JSON list string items based on a list of strings

Question:

The following is my sample JSON file:

    {
        "test": [{
            "Location": "Singapore",
            "Values": [{
                    "Name": "00",
                    "subvalues": [
                        "5782115e1",
                        "688ddd16e",
                        "3e91dc966",
                        "5add96256",
                        "10352cf0f",
                        "17f233d31",
                        "130c09135",
                        "2f49eb2a6",
                        "2ae1ad9e0",
                        "23fd76115"
                    ]
                },
                {
                    "Name": "01",
                    "subvalues": [
                        "b43678dfe",
                        "202c7f508",
                        "73afcaf7c"
                    ]
                }
            ]
        }]
    }

I’m trying to remove the strings in the following list from the "subvalues" arrays in the JSON file: ["130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"]

Desired end result:

    {
        "test": [{
            "Location": "Singapore",
            "Values": [{
                    "Name": "00",
                    "subvalues": [
                        "688ddd16e",
                        "3e91dc966",
                        "5add96256",
                        "10352cf0f",
                        "17f233d31",
                        "2ae1ad9e0",
                        "23fd76115"
                    ]
                },
                {
                    "Name": "01",
                    "subvalues": [
                        "202c7f508",
                        "73afcaf7c"
                    ]
                }
            ]
        }]
    }

I know that a plain text replace would break the JSON structure. I’m new to JSON, so any help would be appreciated.

Asked By: Knight

Answers:

You can use the following code snippet:

import json

# Strings to strip out of every "subvalues" list
toRemoveList = ["130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"]

with open('data.json', 'r') as file:
    jsonData = json.load(file)

# Rebuild each "subvalues" list without the unwanted strings
for value in jsonData["test"][0]["Values"]:
    value["subvalues"] = [x for x in value["subvalues"] if x not in toRemoveList]

with open('newData.json', 'w') as file:
    json.dump(jsonData, file, indent=4)

Note: JSON keys are case-sensitive, so the key must be spelled the same way in every entry. If the file mixes ‘Subvalues’ and ‘subvalues’, the lookup raises a KeyError for the entries that use the other spelling.
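
The snippet above only touches the first entry of "test". As a minimal sketch, assuming the same layout as the sample file and the key spelling "subvalues", the filtering could be wrapped in a function that walks every entry. The function name remove_subvalues, its key parameter, and the shortened in-memory sample are illustrative assumptions, not part of the original answer.

```python
import json


def remove_subvalues(data, to_remove, key="subvalues"):
    """Drop every string in to_remove from each `key` list under data["test"]."""
    to_remove = set(to_remove)  # a set makes each membership test cheap
    for entry in data.get("test", []):
        for value in entry.get("Values", []):
            if key in value:  # skip entries that lack the key instead of failing
                value[key] = [x for x in value[key] if x not in to_remove]
    return data


# Shortened, hypothetical version of the sample file, kept in memory
sample = json.loads("""
{"test": [{"Location": "Singapore",
           "Values": [{"Name": "00", "subvalues": ["5782115e1", "688ddd16e", "130c09135"]},
                      {"Name": "01", "subvalues": ["b43678dfe", "202c7f508"]}]}]}
""")

result = remove_subvalues(sample, ["130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"])
print(json.dumps(result, indent=4))
```

Checking for the key before filtering means an entry with a differently spelled key is skipped rather than raising a KeyError; whether silently skipping is preferable depends on the data.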

Answered By: Refet

Here is a generalised approach that does not rely on the names of keys or on nesting depth. The only assumption is that any list comprised entirely of strings, wherever it appears in the data, should be rebuilt without the values in the EXCLUSIONS set:

import json

FILENAME = '/Volumes/G-Drive/foo.json'
EXCLUSIONS = {"130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"}

def process(d):
    # Recurse through dicts and lists; filter any list made up only of strings
    if isinstance(d, dict):
        for v in d.values():
            process(v)
    elif isinstance(d, list):
        if all(isinstance(v, str) for v in d):
            # Slice assignment edits the list in place, so the parent sees the change
            d[:] = [v for v in d if v not in EXCLUSIONS]
        else:
            for v in d:
                process(v)
    return d


with open(FILENAME) as data:
    print(json.dumps(process(json.load(data)), indent=2))

Output:

{
  "test": [
    {
      "Location": "Singapore",
      "Values": [
        {
          "Name": "00",
          "subvalues": [
            "688ddd16e",
            "3e91dc966",
            "5add96256",
            "10352cf0f",
            "17f233d31",
            "2ae1ad9e0",
            "23fd76115"
          ]
        },
        {
          "Name": "01",
          "subvalues": [
            "202c7f508",
            "73afcaf7c"
          ]
        }
      ]
    }
  ]
}
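
To see what the "comprised entirely of strings" assumption means in practice, here is a small self-contained check on in-memory data. The sample dictionary and its "mixed" and "nested" keys are made up for illustration, and the function is repeated from the answer only so that the block runs by itself.

```python
EXCLUSIONS = {"130c09135", "b43678dfe"}


def process(d):
    # Same traversal as in the answer: filter all-string lists, recurse otherwise
    if isinstance(d, dict):
        for v in d.values():
            process(v)
    elif isinstance(d, list):
        if all(isinstance(v, str) for v in d):
            d[:] = [v for v in d if v not in EXCLUSIONS]
        else:
            for v in d:
                process(v)
    return d


# Hypothetical data: one all-string list, one mixed list, one deeper nested list
sample = {
    "subvalues": ["130c09135", "688ddd16e"],
    "mixed": ["b43678dfe", 1],
    "nested": [{"deep": ["b43678dfe", "202c7f508"]}],
}

result = process(sample)
print(result["subvalues"])          # all strings, so the list is filtered
print(result["mixed"])              # contains an int, so it is left untouched
print(result["nested"][0]["deep"])  # all-string lists are filtered at any depth
```

A list that mixes strings with other types keeps its excluded strings, so if the data can contain such lists, the filter would need to be applied per element rather than per list.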

Answered By: Pingu