Pandas – Merging rows based on certain columns and combine certain columns
Question:
I have a dataframe similar to:
State Organization Date Tag
MD ABC 01/10/2021 901
MD ABC 01/10/2021 801
NJ DEF 02/10/2021 701
NJ DEF 02/10/2021 601
NJ DEF 02/10/2021 701
I want to combine all rows where the state, organization, and date are the same. However, I want to take the tag column for each and make a list out of all the tags from the original rows in the new merged rows. So like:
State Organization Date Tag
MD ABC 01/10/2021 901, 801
NJ DEF 02/10/2021 701, 601, 701
I’m thinking there definitely has to be an easy way to do this since as of now I’m doing a lot of work to achieve that using iterrows along with some other stuff. Suggestions?
Answers:
Try this:
df.astype({'Tag':'str'}).groupby(['State','Organization','Date'])['Tag'].agg(', '.join).reset_index()
Old Answer:
df.groupby(['State','Organization']).agg({'Date':'first','Tag':lambda x: ','.join(x.astype(str))})
Thanks to rhug123, with a slight modification I get the desired effect:
df.groupby(['State','Organization', 'Date']).agg({'Tag':lambda x: ','.join(x.astype(str))})
I have a dataframe similar to:
State Organization Date Tag
MD ABC 01/10/2021 901
MD ABC 01/10/2021 801
NJ DEF 02/10/2021 701
NJ DEF 02/10/2021 601
NJ DEF 02/10/2021 701
I want to combine all rows where the state, organization, and date are the same. However, I want to take the tag column for each and make a list out of all the tags from the original rows in the new merged rows. So like:
State Organization Date Tag
MD ABC 01/10/2021 901, 801
NJ DEF 02/10/2021 701, 601, 701
I’m thinking there definitely has to be an easy way to do this since as of now I’m doing a lot of work to achieve that using iterrows along with some other stuff. Suggestions?
Try this:
df.astype({'Tag':'str'}).groupby(['State','Organization','Date'])['Tag'].agg(', '.join).reset_index()
Old Answer:
df.groupby(['State','Organization']).agg({'Date':'first','Tag':lambda x: ','.join(x.astype(str))})
Thanks to rhug123, with a slight modification I get the desired effect:
df.groupby(['State','Organization', 'Date']).agg({'Tag':lambda x: ','.join(x.astype(str))})