Get the groupby column back in pandas
Question:
I am doing a groupby on column a followed by ffill, but after the groupby, column a is gone. The resulting df has only columns b and c. Is there a way to get column a back after groupby and ffill? I am assuming that the values will shuffle in the process.
How to get back the groupby column in pandas?
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2],
                   'b': [12, np.nan, 14, 13],
                   'c': [1, 2, np.nan, np.nan]})
df
df.groupby('a').ffill()  # result contains only columns b and c
Answers:
This will work…
df.groupby('a').ffill().join(df.a)[['a', 'b', 'c']]
I’m not sure why the column disappears when groupby returns a like-indexed (transform) result. For example, cumsum has the same issue. I thought it might be related to the group_keys argument of groupby, but I didn’t have any luck when setting that to True.
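As a rough illustration of the point above, the sketch below reuses the question's data and checks that cumsum also returns a like-indexed result without the grouping column. The names summed and restored are chosen here for illustration only. Because the result keeps the original index, the key column can be re-attached by index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [12, np.nan, 14, 13],
    "c": [1, 2, np.nan, np.nan],
})

# cumsum is a like-indexed operation: one output row per input row.
summed = df.groupby("a").cumsum()

# The grouping column is not part of the result ...
print(list(summed.columns))

# ... but the index is the same as the original frame's index.
print(summed.index.equals(df.index))

# Since the index is preserved, the key can be joined back by index.
restored = df[["a"]].join(summed)
print(restored)
```

The join here is index-based, so it relies on the like-indexed result and not on row order.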
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2],
                   'b': [12, np.nan, 14, 13],
                   'c': [1, 2, np.nan, np.nan]})
# Assign the filled values back in place; column a is left untouched
df[['b', 'c']] = df.groupby('a').ffill()
print(df)
   a     b    c
0  1  12.0  1.0
1  1  12.0  2.0
2  2  14.0  NaN
3  2  13.0  NaN
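If the original frame should stay unmodified, one possible variant (not from the answer above, just a sketch) is to concatenate the key column with the filled result along the columns. The names filled and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [12, np.nan, 14, 13],
    "c": [1, 2, np.nan, np.nan],
})

# Forward-fill within each group; the result has columns b and c only.
filled = df.groupby("a").ffill()

# Put the key column back in front; concat aligns the pieces on the index.
out = pd.concat([df[["a"]], filled], axis=1)
print(out)
```

Unlike the in-place assignment, this leaves df with its original NaN values and returns a new frame.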
You can force all columns to be kept with:
df.groupby('a')[list(df)].ffill()
# or
df.groupby('a')[df.columns].ffill()
# or
df.groupby(df['a'].to_numpy()).ffill()
Output:
   a     b    c
0  1  12.0  1.0
1  1  12.0  2.0
2  2  14.0  NaN
3  2  13.0  NaN
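On the question's worry that values might shuffle: a small sketch with interleaved groups suggests they do not, since the result keeps the original index and row order. The frame df2 below is made-up example data, and the grouping uses the to_numpy variant shown above so that column a stays in the output.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the two groups are interleaved.
df2 = pd.DataFrame({
    "a": [1, 2, 1, 2],
    "b": [10.0, 20.0, np.nan, np.nan],
})

# Grouping by an array instead of the column name keeps a as a regular column.
filled2 = df2.groupby(df2["a"].to_numpy()).ffill()
print(filled2)
```

Each NaN in b is filled from the previous row of the same group, and the rows stay in their original positions.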