How to separate the groupedby data in diferent columns of a dataframe?
Question:
How do i turn this
object
Name
Color
Fruit
Banana
Yellow
Fruit
Apple
Red
Fruit
Melon
Green
Car
Fiat
White
Car
BMW
Black
Car
NaN
NaN
In to this?
object
Name1
Name2
Name3
Color1
Color2
Color3
Fruit
Banana
Apple
Melon
Yellow
Red
Green
Car
Fiat
BMW
NaN
White
Black
NaN
I’ve searched the pandas documentation, but couldn’t find a solution to this
Read the pandas documentation, tried some diferent methods of groupby
Answers:
Feels inefficient, but you can first create a new column to keep track of the number of times each item is listed before melt
ing, creating the new column names, then pivot
ing back.
import pandas as pd
import numpy as np
#original df
df = pd.DataFrame({
'object': ['Fruit', 'Fruit', 'Fruit', 'Car', 'Car', 'Car'],
'Name': ['Banana', 'Apple', 'Melon', 'Fiat', 'BMW', np.nan],
'Color': ['Yellow', 'Red', 'Green', 'White', 'Black', np.nan],
})
#add an 'object_count' column to df
df['object_count'] = df.groupby('object').cumcount().add(1)
#melt df to long form
long_df = df.melt(id_vars=['object','object_count'])
#append 'object_count' to the variable column
long_df['variable'] += long_df['object_count'].astype(str)
#pivot the table back to wide form
final_df = long_df.pivot(
index='object',
columns='variable',
values='value',
).reset_index()
final_df.columns.name = None #get rid of the 'variable' text at the top right of the table
#note, the output table isn't sorted by row or col the same as your expected output
#(it's sorted alphabetically for both)
#but you can do this or find help if it's important
print(final_df)
Output
object Color1 Color2 Color3 Name1 Name2 Name3
0 Car White Black NaN Fiat BMW NaN
1 Fruit Yellow Red Green Banana Apple Melon
With inspiration from comment by @mitoRibo, here is an answer:
df["N"] = df.assign(N=1).groupby("object")["N"].cumsum().map("Name{}".format)
df["C"] = df.assign(C=1).groupby("object")["C"].cumsum().map("Color{}".format)
out = df.pivot(index=["object"], columns=["N", "C"], values=["Name", "Color"])
out.columns = [t[1] if t[0] == "Name" else t[2] for t in out.columns]
print(out)
Name1 Name2 Name3 Color1 Color2 Color3
object
Car Fiat BMW NaN White Black NaN
Fruit Banana Apple Melon Yellow Red Green
How do i turn this
object | Name | Color |
---|---|---|
Fruit | Banana | Yellow |
Fruit | Apple | Red |
Fruit | Melon | Green |
Car | Fiat | White |
Car | BMW | Black |
Car | NaN | NaN |
In to this?
object | Name1 | Name2 | Name3 | Color1 | Color2 | Color3 |
---|---|---|---|---|---|---|
Fruit | Banana | Apple | Melon | Yellow | Red | Green |
Car | Fiat | BMW | NaN | White | Black | NaN |
I’ve searched the pandas documentation, but couldn’t find a solution to this
Read the pandas documentation, tried some diferent methods of groupby
Feels inefficient, but you can first create a new column to keep track of the number of times each item is listed before melt
ing, creating the new column names, then pivot
ing back.
import pandas as pd
import numpy as np
#original df
df = pd.DataFrame({
'object': ['Fruit', 'Fruit', 'Fruit', 'Car', 'Car', 'Car'],
'Name': ['Banana', 'Apple', 'Melon', 'Fiat', 'BMW', np.nan],
'Color': ['Yellow', 'Red', 'Green', 'White', 'Black', np.nan],
})
#add an 'object_count' column to df
df['object_count'] = df.groupby('object').cumcount().add(1)
#melt df to long form
long_df = df.melt(id_vars=['object','object_count'])
#append 'object_count' to the variable column
long_df['variable'] += long_df['object_count'].astype(str)
#pivot the table back to wide form
final_df = long_df.pivot(
index='object',
columns='variable',
values='value',
).reset_index()
final_df.columns.name = None #get rid of the 'variable' text at the top right of the table
#note, the output table isn't sorted by row or col the same as your expected output
#(it's sorted alphabetically for both)
#but you can do this or find help if it's important
print(final_df)
Output
object Color1 Color2 Color3 Name1 Name2 Name3
0 Car White Black NaN Fiat BMW NaN
1 Fruit Yellow Red Green Banana Apple Melon
With inspiration from comment by @mitoRibo, here is an answer:
df["N"] = df.assign(N=1).groupby("object")["N"].cumsum().map("Name{}".format)
df["C"] = df.assign(C=1).groupby("object")["C"].cumsum().map("Color{}".format)
out = df.pivot(index=["object"], columns=["N", "C"], values=["Name", "Color"])
out.columns = [t[1] if t[0] == "Name" else t[2] for t in out.columns]
print(out)
Name1 Name2 Name3 Color1 Color2 Color3
object
Car Fiat BMW NaN White Black NaN
Fruit Banana Apple Melon Yellow Red Green