How to apply multiple operations on multiple columns based on a single column in pandas?

Question:

I have a sample dataframe that looks like this:

     primaryName    averageRating                 primaryProfession    knownForTitles runtimeMinutes
1   Fred Astaire            7.0      soundtrack,actor,miscellaneous      tt0072308            165
2   Fred Astaire            6.9      soundtrack,actor,miscellaneous      tt0031983             93
3   Fred Astaire            7.0      soundtrack,actor,miscellaneous      tt0050419            103
4   Fred Astaire            7.1      soundtrack,actor,miscellaneous      tt0053137            134

So basically, I want to group by the primaryName column and, for each name, take the average of the averageRating column, extract "actor" or "actress" from the primaryProfession column, count the knownForTitles column, and sum the runtimeMinutes column.
The output dataframe should look like this:

     primaryName    averageRating      primaryProfession    knownForTitles   runtimeMinutes
1   Fred Astaire            28                    actor            4            495

Any ideas how I can achieve this? Thanks in advance for the help.

Asked By: astroluv


Answers:

Try this:

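# Collapse the comma-separated profession list to a single label (na=False skips missing values).
df.loc[df['primaryProfession'].str.contains('actor', na=False), 'primaryProfession'] = 'actor'
df.loc[df['primaryProfession'].str.contains('actress', na=False), 'primaryProfession'] = 'actress'

# Aggregate each column with its own function, one row per name and profession.
result = df.groupby(['primaryName', 'primaryProfession'], as_index=False).agg({'averageRating':'mean', 'knownForTitles':'count', 'runtimeMinutes':'sum'})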
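
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the four sample rows from the question and swaps in two variations that are not in the snippet above: `str.extract` with a regular expression to pull out "actor" or "actress" in one step, and named aggregation in place of the dict. The variable name `result` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "primaryName": ["Fred Astaire"] * 4,
    "averageRating": [7.0, 6.9, 7.0, 7.1],
    "primaryProfession": ["soundtrack,actor,miscellaneous"] * 4,
    "knownForTitles": ["tt0072308", "tt0031983", "tt0050419", "tt0053137"],
    "runtimeMinutes": [165, 93, 103, 134],
})

# Keep only "actor" or "actress" from the comma-separated profession list.
# Rows that contain neither become NaN.
df["primaryProfession"] = df["primaryProfession"].str.extract(
    r"\b(actor|actress)\b", expand=False
)

# One row per (name, profession), with a different aggregation per column.
result = df.groupby(["primaryName", "primaryProfession"], as_index=False).agg(
    averageRating=("averageRating", "mean"),
    knownForTitles=("knownForTitles", "count"),
    runtimeMinutes=("runtimeMinutes", "sum"),
)
print(result)
```

Two things to watch for. The expected output in the question shows 28 for averageRating, which is the sum of the four ratings (7.0 + 6.9 + 7.0 + 7.1); if that is what you want, use "sum" instead of "mean" for that column. Also, groupby drops rows whose key is NaN by default, so names with neither "actor" nor "actress" disappear from the result unless you pass dropna=False.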
Answered By: Costas