Getting results from a doubly nested JSON into a pandas df

Question:

I am trying to get Facebook data for a business I run and put it into a pandas DataFrame. Some posts have comments and others do not, and I want a single DataFrame that covers both cases.

The JSON I have is this:

{'data': [{'id': 'user_id_post_id1'},
  {'id': 'user_id_post_id2'},
  {'id': 'user_id_post_id3'},
  {'comments': {'data': [{'created_time': '2022-11-09T00:15:29+0000',
      'message': 'comment_id',
      'id': 'user_who_commented_the_id_comment_id'}]},
   'id': 'user_id_post_id4'},
  {'id': 'user_id_post_id5'}...]}

I am trying to get a pandas df that looks like this:

df = pd.DataFrame(data = data)
    
print(df)
     User ID and Post ID   comment        Commenter_id
0    user_id_post_id       0 or N/A       0 or N/A
1    user_id_post_id1      0 or N/A       0 or N/A
2    user_id_post_id2      0 or N/A       0 or N/A
3    user_id_post_id3      Comment_id     user_who_commented_the_id_comment_id
4    user_id_post_id3      Comment_id*    user_who_commented_the_id_comment_id
5    user_id_post_id4      0 or N/A       0 or N/A

* means another comment under the same User ID and Post ID 

And so on

I know how to do this when the JSON is not doubly nested, but I am having trouble flattening the nested comments into the same DataFrame. I have tried this command, to no avail:

df = pd.json_normalize(data=JSON_Name["data"]["comments"])
 
and get this error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_1/FileName.py in <module>
----> 1 df = pd.json_normalize(data=basic_insight["data"]["comments"])

TypeError: list indices must be integers or slices, not str

Any help would be appreciated!

Asked By: Ursa Major

||

Answers:

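Try collecting the comment list for each post, then using explode() to get one row per comment:

import pandas as pd

data = {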
    "data": [
        {"id": "user_id_post_id1"},
        {"id": "user_id_post_id2"},
        {"id": "user_id_post_id3"},
        {
            "comments": {
                "data": [
                    {
                        "created_time": "2022-11-09T00:15:29+0000",
                        "message": "comment_id",
                        "id": "user_who_commented_the_id_comment_id",
                    }
                ]
            },
            "id": "user_id_post_id4",
        },
        {"id": "user_id_post_id5"},
    ]
}

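# One entry per post; "Commenter_id" temporarily holds the post's list of
# comment dicts, or None when the post has no comments
tmp = [
    {
        "User ID and Post ID": d["id"],
        "Commenter_id": d.get("comments", {}).get("data"),
    }
    for d in data["data"]
]

# explode() gives each comment dict its own row; posts without comments keep one row
df = pd.DataFrame(tmp).explode("Commenter_id")
# .str[...] looks up a key in each comment dict; rows without a comment stay null
df["comment"] = df["Commenter_id"].str["message"]
df["Commenter_id"] = df["Commenter_id"].str["id"]
print(df)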

Prints:

  User ID and Post ID                          Commenter_id     comment
0    user_id_post_id1                                  None        None
1    user_id_post_id2                                  None        None
2    user_id_post_id3                                  None        None
3    user_id_post_id4  user_who_commented_the_id_comment_id  comment_id
4    user_id_post_id5                                  None        None
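
As for why the attempt in the question fails: data["data"] is a list of post dictionaries, so indexing it with the string "comments" raises the TypeError shown above. The following is a minimal sketch of that, using a shortened copy of the sample data; the names sample, posts and comment_counts are only illustrative and do not come from the question.

```python
# Shortened copy of the question's sample data (two posts, one with a comment)
sample = {
    "data": [
        {"id": "user_id_post_id1"},
        {
            "comments": {
                "data": [
                    {
                        "created_time": "2022-11-09T00:15:29+0000",
                        "message": "comment_id",
                        "id": "user_who_commented_the_id_comment_id",
                    }
                ]
            },
            "id": "user_id_post_id4",
        },
    ]
}

posts = sample["data"]
print(type(posts).__name__)  # the value under "data" is a list, not a dict

# Indexing the list with a string reproduces the error from the question
error_message = None
try:
    posts["comments"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Each post has to be visited individually, and "comments" is optional per post
comment_counts = [len(post.get("comments", {}).get("data", [])) for post in posts]
print(comment_counts)
```

This is why the code above reads "comments" per post with d.get(...) instead of indexing data["data"] directly.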
Answered By: Andrej Kesely
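
The sample data only has one comment on one post, so it does not show the case the question marks with *: several comments under the same post. A plain loop that builds one row per comment is another way to cover it. This is a sketch under assumptions, not part of the answer above: the helper posts_to_rows is hypothetical, and the second comment in the example data ("second_comment", "another_user_comment_id2") is invented for illustration.

```python
import pandas as pd


def posts_to_rows(payload):
    """Return one row per comment, or one empty row for a post without comments."""
    rows = []
    for post in payload["data"]:
        # "comments" may be missing entirely, so fall back to an empty list
        comments = post.get("comments", {}).get("data", [])
        if not comments:
            rows.append(
                {"User ID and Post ID": post["id"], "comment": None, "Commenter_id": None}
            )
            continue
        for comment in comments:
            rows.append(
                {
                    "User ID and Post ID": post["id"],
                    "comment": comment.get("message"),
                    "Commenter_id": comment.get("id"),
                }
            )
    return rows


# Illustrative data: the second comment is made up to show the multi-comment case
example = {
    "data": [
        {"id": "user_id_post_id3"},
        {
            "comments": {
                "data": [
                    {"message": "comment_id", "id": "user_who_commented_the_id_comment_id"},
                    {"message": "second_comment", "id": "another_user_comment_id2"},
                ]
            },
            "id": "user_id_post_id4",
        },
    ]
}

df_rows = pd.DataFrame(posts_to_rows(example))
print(df_rows)
```

The post with two comments appears on two rows with the same "User ID and Post ID", and the post without comments keeps a single row with null values, which can be replaced afterwards with fillna() if 0 or "N/A" is preferred.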