Filter JSON with IDs contained in a CSV sheet using Python

Question:

I have a CSV file with some "id" values. I imported a JSON file, and I need to keep only the entries of this JSON whose ids are in the worksheet.
Does anyone know how to do that? I have no idea, as I am very new to Python. I am using Jupyter Notebook.

How can I filter the data using the ids stored in the variable var_filter?

import json
import pandas as pd
from IPython.display import display

# read csv with ids
var_filter = pd.read_csv('file.csv')
display(var_filter)


# Load json
with open('file.json') as f:
  data = json.load(f)
print(data)

The json structure is:

[
    {
        "id": "179328741654819",
        "t_values": [
            {
                "t_id": "963852456741",
                "value": "499.66",
                "date_timestamp": "2020-09-22T15:18:17",
                "type": "in"
            },
            {
                "t_id": "852951753456",
                "value": "1386.78",
                "date_timestamp": "2020-10-31T14:46:44",
                "type": "in"
            }
        ]
    },
    {
        "id": "823971648264792",
        "t_values": [
            {
                "t_id": "753958561456",
                "value": "672.06",
                "date_timestamp": "2020-03-16T22:41:16",
                "type": "in"
            },
            {
                "t_id": "321147951753",
                "value": "773.88",
                "date_timestamp": "2020-05-08T18:29:31",
                "type": "out"
            },
            {
                "t_id": "258951753852",
                "value": "733.13",
                "date_timestamp": null,
                "type": "in"
            }
        ]
    }
]
Asked By: Wong Chloe

||

Answers:

You can iterate over the elements in the data variable and check whether each element’s id value is in the dataframe’s id column. A simple method is shown below; see this article for other methods.

Note that I convert the JSON’s id value to an int, as that is the value type pandas uses for the column.

code

import json
from pprint import pprint
import pandas as pd


var_filter = pd.read_csv("id.csv")

# Load json
with open("data.json") as f:
    data = json.load(f)


result = []
for elem in data:
    if int(elem["id"]) in var_filter["id"].values:
        result.append(elem)
pprint(result)

id.csv

id
823971648264792

output

[{'id': '823971648264792',
  't_values': [{'date_timestamp': '2020-03-16T22:41:16',
                't_id': '753958561456',
                'type': 'in',
                'value': '672.06'},
               {'date_timestamp': '2020-05-08T18:29:31',
                't_id': '321147951753',
                'type': 'out',
                'value': '773.88'},
               {'date_timestamp': None,
                't_id': '258951753852',
                'type': 'in',
                'value': '733.13'}]}]
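
As a possible variation on the loop above, the ids can be kept as strings on both sides so that no int conversion is needed. This is only a sketch: it assumes the CSV has a single "id" column, and it uses small in-memory stand-ins (csv_text and json_text, both hypothetical) instead of the real id.csv and data.json so that it runs on its own. Passing dtype=str to read_csv stops pandas from parsing the ids as integers, and a set makes each membership check fast.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory stand-ins for id.csv and data.json
csv_text = "id\n823971648264792\n"
json_text = """
[
    {"id": "179328741654819", "t_values": []},
    {"id": "823971648264792", "t_values": []}
]
"""

# dtype=str keeps the ids as strings, matching the JSON "id" values
var_filter = pd.read_csv(io.StringIO(csv_text), dtype=str)
wanted_ids = set(var_filter["id"])

data = json.loads(json_text)

# Keep only the elements whose id appears in the CSV
result = [elem for elem in data if elem["id"] in wanted_ids]
print(result)
```

With real files, io.StringIO(csv_text) would be replaced by the CSV path and json.loads(json_text) by json.load(f), as in the code above.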
Answered By: Edo Akse