Get the groupby column back in pandas

Question:

I am doing a groupby on column a followed by ffill, but after the groupby, column a is gone. The resulting df has only columns b and c. Is there a way to get column a back after the groupby and ffill? I am assuming the rows may get shuffled in the process.

How to get back the groupby column in pandas?

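import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2],
                   'b': [12, np.nan, 14, 13],
                   'c': [1, 2, np.nan, np.nan]})
df

# column a is missing from this result; only b and c come back
df.groupby('a').ffill()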
Asked By: learner

Answers:

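This will work. The result of ffill keeps the original index, so joining on the index puts each value of a back on its own row: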

df.groupby('a').ffill().join(df.a)[['a', 'b', 'c']]

I’m not sure why the column disappears when groupby returns a like-indexed (transform) result. For example, cumsum has the same issue. I thought it might be related to the group_keys argument of groupby, but I didn’t have any luck when setting that to True.
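
On the worry about rows being shuffled: a like-indexed result keeps the index and row order of the original frame, so the join above lines up row by row. A minimal sketch to check this, assuming a made-up frame with interleaved groups and a non-default index (neither is from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: groups are interleaved and the index is not 0..n-1.
df = pd.DataFrame(
    {"a": [1, 2, 1, 2],
     "b": [12, 14, np.nan, np.nan],
     "c": [1, np.nan, 2, np.nan]},
    index=[10, 20, 30, 40],
)

# Forward-fill within each group; the grouping column is left out of the result.
filled = df.groupby("a").ffill()

# The result carries the same index, in the same order, as the input.
same_index = filled.index.equals(df.index)

# Joining on the index restores column a on the correct rows.
restored = filled.join(df["a"])[["a", "b", "c"]]
print(same_index)
print(restored)
```

Because the index is unchanged, either a join or a plain column assignment should realign correctly.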

Answered By: Attila the Fun

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2],
                   'b': [12, np.nan, 14, 13],
                   'c': [1, 2, np.nan, np.nan]})

df[['b', 'c']] = df.groupby('a').ffill()

print(df)
   a     b    c
0  1  12.0  1.0
1  1  12.0  2.0
2  2  14.0  NaN
3  2  13.0  NaN
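
The assignment above modifies df in place. If the original frame should stay untouched, one possible variant is to work on a copy. This is only a sketch: the names value_cols and out are illustrative and do not come from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2],
                   "b": [12, np.nan, 14, 13],
                   "c": [1, 2, np.nan, np.nan]})

# Every column except the grouping key (value_cols is an illustrative name).
value_cols = [col for col in df.columns if col != "a"]

# Fill on a copy so that df keeps its missing values.
out = df.copy()
out[value_cols] = df.groupby("a")[value_cols].ffill()

print(out)
```

Building the column list from df.columns avoids hard-coding 'b' and 'c' when the frame has many value columns.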
Answered By: Laurent B.

You can force all columns, including the grouping column, to be kept with:

df.groupby('a')[list(df)].ffill()

# or
df.groupby('a')[df.columns].ffill()

# or
df.groupby(df['a'].to_numpy()).ffill()

Output:

   a     b    c
0  1  12.0  1.0
1  1  12.0  2.0
2  2  14.0  NaN
3  2  13.0  NaN
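
The third form appears to work because the key is passed as a plain array instead of a column name, so pandas has no grouping column to exclude from the result. A small sketch comparing the two calls on the frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2],
                   "b": [12, np.nan, 14, 13],
                   "c": [1, 2, np.nan, np.nan]})

# Grouping by column name: "a" is the key and is left out of the result.
by_name = df.groupby("a").ffill()

# Grouping by an array holding the same values: "a" is an ordinary column here.
by_array = df.groupby(df["a"].to_numpy()).ffill()

print(list(by_name.columns))
print(list(by_array.columns))
```

Note that in the array form column a is itself passed through ffill, which is harmless here because a has no missing values.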
Answered By: mozway