How to separate grouped data into different columns of a dataframe?

Question:

How do I turn this

object  Name    Color
Fruit   Banana  Yellow
Fruit   Apple   Red
Fruit   Melon   Green
Car     Fiat    White
Car     BMW     Black
Car     NaN     NaN

into this?

object  Name1   Name2  Name3  Color1  Color2  Color3
Fruit   Banana  Apple  Melon  Yellow  Red     Green
Car     Fiat    BMW    NaN    White   Black   NaN

I’ve read the pandas documentation and tried several different groupby methods, but couldn’t find a solution.

Asked By: Gabriel Makhoul


Answers:

This feels inefficient, but you can first add a column that tracks each row's position within its group, melt to long form, build the new column names from that position, then pivot back to wide form.

import pandas as pd
import numpy as np

#original df
df = pd.DataFrame({
    'object': ['Fruit', 'Fruit', 'Fruit', 'Car', 'Car', 'Car'],
    'Name': ['Banana', 'Apple', 'Melon', 'Fiat', 'BMW', np.nan],
    'Color': ['Yellow', 'Red', 'Green', 'White', 'Black', np.nan],
})

#add an 'object_count' column to df
df['object_count'] = df.groupby('object').cumcount().add(1)

#melt df to long form
long_df = df.melt(id_vars=['object','object_count'])

#append 'object_count' to the variable column
long_df['variable'] += long_df['object_count'].astype(str)

#pivot the table back to wide form
final_df = long_df.pivot(
    index='object',
    columns='variable',
    values='value',
).reset_index()

final_df.columns.name = None  # drop the 'variable' label left over from the pivot

# note: the output isn't ordered the same as your expected output;
# pivot sorts both the rows and the columns alphabetically.
# If the order matters, you can reorder afterwards (e.g. with reindex).

print(final_df)

Output

  object  Color1 Color2 Color3   Name1  Name2  Name3
0    Car   White  Black    NaN    Fiat    BMW    NaN
1  Fruit  Yellow    Red  Green  Banana  Apple  Melon
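
If the row and column order matters, one possible way to restore it is to reindex the pivoted table before resetting the index. This is only a sketch built on the steps above: `row_order` and `col_order` are names made up for illustration, and it assumes the value columns are exactly `Name` and `Color`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'object': ['Fruit', 'Fruit', 'Fruit', 'Car', 'Car', 'Car'],
    'Name': ['Banana', 'Apple', 'Melon', 'Fiat', 'BMW', np.nan],
    'Color': ['Yellow', 'Red', 'Green', 'White', 'Black', np.nan],
})

# same melt/pivot steps as above, but without reset_index yet
df['object_count'] = df.groupby('object').cumcount().add(1)
long_df = df.melt(id_vars=['object', 'object_count'])
long_df['variable'] += long_df['object_count'].astype(str)
wide_df = long_df.pivot(index='object', columns='variable', values='value')

# rows in order of first appearance (hypothetical helper name)
row_order = pd.Index(df['object'].unique(), name='object')

# columns as Name1..NameN followed by Color1..ColorN (hypothetical helper name)
n_max = int(df['object_count'].max())
col_order = [f'{col}{i}' for col in ['Name', 'Color'] for i in range(1, n_max + 1)]

final_df = wide_df.reindex(index=row_order, columns=col_order).reset_index()
final_df.columns.name = None

print(final_df)
```

With the sample data this should put the Fruit row first and list the Name columns before the Color columns, as in the question.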
Answered By: mitoRibo

Inspired by the comment from @mitoRibo, here is another answer:

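# uses the original df from the question (columns: object, Name, Color)
# number the rows within each group and turn the numbers into labels Name1, Name2, ...
df["N"] = df.assign(N=1).groupby("object")["N"].cumsum().map("Name{}".format)
df["C"] = df.assign(C=1).groupby("object")["C"].cumsum().map("Color{}".format)
# pivot on both label columns; the resulting columns are (value, N, C) tuples
out = df.pivot(index=["object"], columns=["N", "C"], values=["Name", "Color"])
# flatten: keep the N label for Name values and the C label for Color values
out.columns = [t[1] if t[0] == "Name" else t[2] for t in out.columns]
print(out)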

         Name1  Name2  Name3  Color1 Color2 Color3
object                                            
Car       Fiat    BMW    NaN   White  Black    NaN
Fruit   Banana  Apple  Melon  Yellow    Red  Green
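
For comparison, the same reshaping can also be sketched with `set_index` and `unstack` instead of `pivot`. This is an alternative illustration, not the method used in either answer; the helper column `n` and the names `wide` and `row_order` are assumptions introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "object": ["Fruit", "Fruit", "Fruit", "Car", "Car", "Car"],
    "Name": ["Banana", "Apple", "Melon", "Fiat", "BMW", np.nan],
    "Color": ["Yellow", "Red", "Green", "White", "Black", np.nan],
})

# n = position of each row within its group (1, 2, 3, ...)
wide = (
    df.assign(n=df.groupby("object").cumcount() + 1)
      .set_index(["object", "n"])[["Name", "Color"]]
      .unstack("n")  # move n into the columns: (Name, 1), (Name, 2), ...
)

# flatten the two-level columns into Name1, Name2, ..., Color1, ...
wide.columns = [f"{col}{i}" for col, i in wide.columns]

# unstack sorts the row index, so restore the order of first appearance
row_order = pd.Index(df["object"].unique(), name="object")
wide = wide.reindex(row_order).reset_index()

print(wide)
```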
Answered By: SomeDude