Pandas – Merging rows based on certain columns and combine certain columns

Question:

I have a dataframe similar to:

State Organization Date    Tag
MD    ABC      01/10/2021  901
MD    ABC      01/10/2021  801
NJ    DEF      02/10/2021  701
NJ    DEF      02/10/2021  601
NJ    DEF      02/10/2021  701

I want to combine all rows where the state, organization, and date are the same. However, I want to take the tag column for each and make a list out of all the tags from the original rows in the new merged rows. So like:

State Organization Date    Tag
MD    ABC      01/10/2021  901, 801
NJ    DEF      02/10/2021  701, 601, 701

I’m thinking there definitely has to be an easy way to do this since as of now I’m doing a lot of work to achieve that using iterrows along with some other stuff. Suggestions?

Asked By: Frederick Almond

||

Answers:

Try this:

df.astype({'Tag':'str'}).groupby(['State','Organization','Date'])['Tag'].agg(', '.join).reset_index()

Old Answer:

df.groupby(['State','Organization']).agg({'Date':'first','Tag':lambda x: ','.join(x.astype(str))})
Answered By: rhug123

Thanks to rhug123, with a slight modification I get the desired effect:

df.groupby(['State','Organization', 'Date']).agg({'Tag':lambda x: ','.join(x.astype(str))})
Answered By: Frederick Almond
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.