Selectively convert dict of lists with dicts into a dataframe

Question:

I would like to selectively convert a dict of lists of dicts into a DataFrame. From 'results', I only want the publisher name and the title, and only for entries whose publisher name is Benzinga:


{'results': [{'id': 'knNyIzsECbl3YYPAKIQsEoaO4_roXDftV-auy9lSB-w',
   'publisher': {'name': 'Benzinga',
    'homepage_url': 'https://www.benzinga.com/'},
   'title': 'Earnings Scheduled For May 11, 2021'},
{'id': 'KNDx8p0PytFULh33UWse-BkT7XxpxLZtGLij22tiZMM',
   'publisher': {'name': 'The Motley Fool',
    'homepage_url': 'https://www.fool.com/',
   'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'}]}

Expected output:

publisher   title
Benzinga    Earnings Scheduled For May 11, 2021

If I convert the dict to a pandas DataFrame first, the nested lists and dicts end up as cell values instead of being flattened into columns.
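A minimal sketch of that problem, assuming the missing closing brace in the data above is restored (the copy below is trimmed: ids dropped). Building the frame straight from data['results'] leaves every publisher cell as a nested dict:

```python
import pandas as pd

# Trimmed copy of the question's data (ids dropped, missing brace restored)
data = {'results': [
    {'publisher': {'name': 'Benzinga',
                   'homepage_url': 'https://www.benzinga.com/'},
     'title': 'Earnings Scheduled For May 11, 2021'},
    {'publisher': {'name': 'The Motley Fool',
                   'homepage_url': 'https://www.fool.com/'},
     'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'}]}

# One row per result, but 'publisher' is not flattened
raw = pd.DataFrame(data['results'])
print(raw.loc[0, 'publisher'])
# {'name': 'Benzinga', 'homepage_url': 'https://www.benzinga.com/'}
```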

Asked By: helloimgeorgia

||

Answers:

Is the initial dict supposed to be the following? (In the question, the Motley Fool 'publisher' dict is missing its closing brace.)

data = {'results': 
  [{'id': 'knNyIzsECbl3YYPAKIQsEoaO4_roXDftV-auy9lSB-w',
    'publisher': {'name': 'Benzinga', 
                  'homepage_url': 'https://www.benzinga.com/'},
    'title': 'Earnings Scheduled For May 11, 2021'},
   {'id': 'KNDx8p0PytFULh33UWse-BkT7XxpxLZtGLij22tiZMM',
    'publisher': {'name': 'The Motley Fool',
                  'homepage_url': 'https://www.fool.com/'},
    'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'}]}

If so, you could start with a dict of empty lists and append the selected fields to it (which you could then convert to a DataFrame):

import pandas as pd

a_dict = {'publisher': [], 'title': []}
for i in data['results']:
    # keep only articles published by Benzinga
    if i['publisher']['name'] == 'Benzinga':
        a_dict['publisher'].append(i['publisher']['name'])
        a_dict['title'].append(i['title'])

print(a_dict)
# {'publisher': ['Benzinga'], 'title': ['Earnings Scheduled For May 11, 2021']}
df = pd.DataFrame(a_dict)
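The same filter can be wrapped in a small reusable function that builds the DataFrame from a list of records instead of a dict of lists. This is a sketch only: articles_from is a hypothetical name, and the data is a trimmed copy of the dict above (ids dropped).

```python
import pandas as pd

# Trimmed copy of the corrected data (ids dropped for brevity)
data = {'results': [
    {'publisher': {'name': 'Benzinga',
                   'homepage_url': 'https://www.benzinga.com/'},
     'title': 'Earnings Scheduled For May 11, 2021'},
    {'publisher': {'name': 'The Motley Fool',
                   'homepage_url': 'https://www.fool.com/'},
     'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'}]}


def articles_from(data, publisher_name):
    """Hypothetical helper: one row per article from publisher_name."""
    rows = [
        {'publisher': r['publisher']['name'], 'title': r['title']}
        for r in data['results']
        if r['publisher']['name'] == publisher_name
    ]
    # Passing columns keeps the header even when nothing matches
    return pd.DataFrame(rows, columns=['publisher', 'title'])


df = articles_from(data, 'Benzinga')
print(df)
```

Passing the column list explicitly means a publisher with no articles yields an empty frame with the expected header rather than a frame with no columns at all.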
Answered By: wutangforever

Flatten the nested dicts with a generator expression, then build a new DataFrame from the resulting records (dct is the dict from the question and pd is pandas):

pd.DataFrame({'publisher': d['publisher']['name'], 'title': d['title']} for d in dct['results'])

Or you can also try json_normalize:

pd.json_normalize(dct['results'])[['publisher.name', 'title']].rename(columns={'publisher.name': 'publisher'})

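Result of the comprehension version (it still contains every publisher, so the Benzinga filter has to be applied afterwards):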

         publisher                                                                title
0         Benzinga                                  Earnings Scheduled For May 11, 2021
1  The Motley Fool  Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript
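To get only the Benzinga row the question asks for, the flattened frame can be filtered with a boolean mask. A minimal sketch, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize) and a trimmed copy of the question's dict:

```python
import pandas as pd

# Trimmed copy of the question's data (ids dropped, missing brace restored)
dct = {'results': [
    {'publisher': {'name': 'Benzinga',
                   'homepage_url': 'https://www.benzinga.com/'},
     'title': 'Earnings Scheduled For May 11, 2021'},
    {'publisher': {'name': 'The Motley Fool',
                   'homepage_url': 'https://www.fool.com/'},
     'title': 'Taysha Gene Therapies, Inc. (TSHA) Q1 2021 Earnings Call Transcript'}]}

# json_normalize turns nested keys into dotted column names,
# e.g. 'publisher.name' and 'publisher.homepage_url'
flat = pd.json_normalize(dct['results'])

# Keep Benzinga rows only, select the two columns, tidy the header
benzinga = (
    flat.loc[flat['publisher.name'] == 'Benzinga', ['publisher.name', 'title']]
    .rename(columns={'publisher.name': 'publisher'})
    .reset_index(drop=True)
)
print(benzinga)
```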
Answered By: Shubham Sharma