Extract values from a dict inside a list in a pandas dataframe
Question:
I have this pandas column containing a dict inside a list:
1 []
2 [{'mal_id': 23, 'type': 'anime', 'name': 'Bandai Visual', 'url': 'https://myanimelist.net/anime/producer/23/Bandai_Visual'}, {'mal_id': 703, 'type': 'anime', 'name': 'Notes', 'url': 'https://myanimelist.net/anime/producer/703/Notes'}]
3 [{'mal_id': 1003, 'type': 'anime', 'name': 'Nippon Television Network', 'url': 'https://myanimelist.net/anime/producer/1003/Nippon_Television_Network'}]
I need to extract each name and store the names as a list in a new column, like so:
1 []
2 [Bandai Visual,Notes]
3 [Nippon Television Network]
I have tried:
df_trimmed['producer_name'] = df_trimmed['producers'].apply(lambda x: x[0]['name'] if x else None)
TypeError: string indices must be integers
I have also tried:
prods = df_trimmed['producers'].str[0].str['name']
prods
NaN
NaN
NaN
Another attempt:
def extract_names(row):
    names = []
    for producer in row:
        names.append(producer['name'])
    return names
df_trimmed['producer_names'] = df_trimmed['producers'].apply(extract_names)
TypeError: string indices must be integers
As you can see, the first and third approaches raise an error, while the second approach returns null values. Any help would be appreciated!
Edit:
I mistakenly added quotation marks (" ") around the values. Each cell is supposed to be a list, not a string. I have edited the question to mirror the actual data types.
Answers:
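Assuming col is the column of interest, you can use:
from ast import literal_eval

(df['col']
   #.apply(literal_eval)              # uncomment if the cells are strings rather than lists
   .explode()                         # one row per dict; an empty list becomes NaN
   .str['name']                       # pull the 'name' value out of each dict
   .groupby(level=0)                  # regroup by the original row index
   .agg(lambda x: list(x.dropna()))   # rebuild the lists, dropping the NaN left by empty rows
)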
Output:
0 []
1 [Bandai Visual, Notes]
2 [Nippon Television Network]
Name: col, dtype: object
Used input (the cells are strings here, so the literal_eval line must be uncommented):
df = pd.DataFrame({'col': ['[]', "[{'mal_id': 23, 'type': 'anime', 'name': 'Bandai Visual', 'url': 'https://myanimelist.net/anime/producer/23/Bandai_Visual'}, {'mal_id': 703, 'type': 'anime', 'name': 'Notes', 'url': 'https://myanimelist.net/anime/producer/703/Notes'}]", "[{'mal_id': 1003, 'type': 'anime', 'name': 'Nippon Television Network', 'url': 'https://myanimelist.net/anime/producer/1003/Nippon_Television_Network'}]"]})
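When it is unclear whether the cells hold strings or real lists, one option is to parse only the string cells before extracting the names. The following is a minimal sketch, not the answerer's code: the helper name to_list is hypothetical, and the sample data is shortened to the mal_id and name keys for readability.

```python
from ast import literal_eval

import pandas as pd

# Shortened sample data (assumption): each cell is the string form of a list of dicts.
df = pd.DataFrame({
    'col': [
        '[]',
        "[{'mal_id': 23, 'name': 'Bandai Visual'}, {'mal_id': 703, 'name': 'Notes'}]",
        "[{'mal_id': 1003, 'name': 'Nippon Television Network'}]",
    ]
})


def to_list(value):
    """Hypothetical helper: parse string cells, pass real lists through unchanged."""
    return literal_eval(value) if isinstance(value, str) else value


# Step 1: make sure every cell is a real Python list.
parsed = df['col'].apply(to_list)

# Step 2: collect the 'name' of every dict in each list.
names = parsed.apply(lambda producers: [d['name'] for d in producers])

print(names.tolist())
```

Because to_list leaves lists untouched, the same two steps should also work on a column that already contains lists.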
Another approach using a list comprehension:
df['col'] = [[d['name'] for d in l
              if isinstance(d, dict)
              and 'name' in d]
             for l in df['col']]
Extract the needed attribute from a list of dicts:
df['col'].apply(lambda x: [d['name'] for d in x])
0 []
1 [Bandai Visual, Notes]
2 [Nippon Television Network]
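The "string indices must be integers" error in the question is what Python raises when the cell is a string: iterating a string yields single characters, and a character cannot be indexed with 'name'. A quick way to check what a column really contains is to look at the Python type of each cell. This is a small sketch on made-up data; the names col and cell_types are only illustrative.

```python
import pandas as pd

# Hypothetical column mixing a string cell and a real list cell.
col = pd.Series(["[{'name': 'Notes'}]", [{'name': 'Notes'}]])

# Name of the Python type stored in each cell.
cell_types = [type(value).__name__ for value in col]
print(cell_types)

# Iterating the string cell yields characters, so d['name'] fails.
error = None
try:
    [d['name'] for d in col.iloc[0]]
except TypeError as exc:
    error = str(exc)
print(error)

# The list cell works as expected.
ok = [d['name'] for d in col.iloc[1]]
print(ok)
```

If the check reports str for some cells, those cells need to be parsed (for example with ast.literal_eval) before any of the approaches above will return names.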