Grouping a dataframe and conserving the same number of rows

Question:

I’m trying to make the kind of transformation shown in the image below:

[image: the desired transformation]

I wrote the code below, but unfortunately I’m not getting the result I’m looking for:

import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple']})

out = (df['Values']
      .astype(str)
      .groupby(df['Id'])
      .agg('|'.join)
      .reset_index())

Do you have any suggestions, please?

Asked By: L'Artiste


Answers:

You’re close. You just need to look up the grouped result in out for each row and assign it back to df (it’s better not to call reset_index() here, so that out stays indexed by Id):

import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Values': ['red', 'brown','white','blue', 'green', 'yellow', 'rose', 'purple']})

out = (df['Values']
      .astype(str)
      .groupby(df['Id'])
      .agg('|'.join))

# Number of rows per Id, looked up for every row of df
counts = df['Id'].value_counts()
df['Id_occurrences'] = df['Id'].map(counts)
# out is indexed by Id, so map() gives each row its group's joined string
df['Values_grouped'] = df['Id'].map(out)
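
As a possible alternative, groupby(...).transform returns a result aligned with the original rows, so the row count is preserved without a separate lookup step. The following is a minimal sketch, assuming the same df as in the question; the variable name grouped is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
    'Values': ['red', 'brown', 'white', 'blue', 'green', 'yellow', 'rose', 'purple'],
})

# Group the string values by Id once and reuse the grouping
grouped = df['Values'].astype(str).groupby(df['Id'])

# transform() broadcasts each group's result back to every row of that group
df['Id_occurrences'] = grouped.transform('size')
df['Values_grouped'] = grouped.transform(lambda s: '|'.join(s))

print(df)
```

Because transform keeps the original index, df still has eight rows afterwards, and each row carries its group's size and joined values.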
Answered By: Colin