Pandas: Forward fill missing value by each group of two columns

Question:

I have data with the columns ['color', 'fruit', 'date', 'value'].

import numpy as np
import pandas as pd

data = pd.DataFrame({'color': ['Green', 'Green', 'Green', 'Green', 'Red', 'Red'],
                     'fruit': ['Banana', 'Banana', 'Apple', 'Apple', 'Banana', 'Apple'],
                     'date': ['2011-01-01', '2011-01-02', '2011-01-01', '2011-01-02', '2011-02-01', '2011-02-01'],
                     'value': [1, np.nan, np.nan, 2, 3, np.nan]})


Output:

   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    NaN
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN

I need to forward fill 'value' wherever a date has no data. The fill must stay within each ['color', 'fruit'] group.

I am trying to fill down with:
df = df.groupby(['color', 'fruit', 'date'])['value'].mean().replace(to_replace=0, method='ffill')
but this spills values over into the next [color, fruit] group.
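
The spill can be reproduced on the sample data with a simplified sketch. Note that a plain `.ffill()` stands in here for the `replace(..., method='ffill')` call above; that substitution is an assumption about the intent, not the exact original attempt.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "color": ["Green", "Green", "Green", "Green", "Red", "Red"],
    "fruit": ["Banana", "Banana", "Apple", "Apple", "Banana", "Apple"],
    "date": ["2011-01-01", "2011-01-02", "2011-01-01",
             "2011-01-02", "2011-02-01", "2011-02-01"],
    "value": [1, np.nan, np.nan, 2, 3, np.nan],
})

# One row per (color, fruit, date); groupby sorts the keys by default.
agg = data.groupby(["color", "fruit", "date"])["value"].mean()

# An ungrouped fill walks straight down the Series and ignores group boundaries.
spilled = agg.ffill()

# (Red, Apple) has no data of its own, yet it inherits the value from the
# row above it, which belongs to (Green, Banana).
leaked = spilled.loc[("Red", "Apple", "2011-02-01")]
print(leaked)
```

The value that ends up in the (Red, Apple) row comes from a different color and fruit, which is the behaviour I want to avoid.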

Expected Output:

   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    1.0
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
Asked By: Stan


Answers:

You can use GroupBy.cumcount with pandas.Series.ffill:

m = data.groupby(["color", "fruit"]).cumcount().astype(bool)

data["value"] = data["value"].ffill().where(m, data["value"])
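
How the mask works: `cumcount()` numbers the rows inside each group from 0, so `.astype(bool)` is False on the first row of every group and True elsewhere. `where(m, ...)` then keeps the globally filled value only where the mask is True. The sketch below uses a small hypothetical frame (`demo` is not from the question) to show a limitation worth knowing about: the mask only protects the first row of each group, so a group that starts with two or more missing values can still pick up a value from the previous group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the (Red, Apple) group has no values at all.
demo = pd.DataFrame({
    "color": ["Green", "Green", "Red", "Red"],
    "fruit": ["Banana", "Banana", "Apple", "Apple"],
    "value": [1.0, np.nan, np.nan, np.nan],
})
keys = ["color", "fruit"]

# False on the first row of each group, True on every later row.
m = demo.groupby(keys).cumcount().astype(bool)

# Global forward fill, with the original value restored wherever m is False.
masked = demo["value"].ffill().where(m, demo["value"])

# Per-group forward fill for comparison.
grouped = demo.groupby(keys)["value"].ffill()

print(pd.DataFrame({"mask": m, "masked": masked, "grouped": grouped}))
```

In this frame the last row stays NaN with the per-group fill but receives the value from (Green, Banana) with the mask approach. On the question's data both give the same result, because no group starts with more than one missing row.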

Or, as mentioned by @Mustafa Aydin, simply use GroupBy.ffill:

data["value"] = data.groupby(["color", "fruit"])["value"].ffill()

Output:

print(data)

   color   fruit        date  value
0  Green  Banana  2011-01-01    1.0
1  Green  Banana  2011-01-02    1.0
2  Green   Apple  2011-01-01    NaN
3  Green   Apple  2011-01-02    2.0
4    Red  Banana  2011-02-01    3.0
5    Red   Apple  2011-02-01    NaN
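
A forward fill depends on row order, so if the rows are not already in date order within each group, it may help to sort first. The following is a minimal self-contained sketch under that assumption; the `pd.to_datetime` and sorting steps are additions for illustration and are not required for the sample data, which is already ordered.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "color": ["Green", "Green", "Green", "Green", "Red", "Red"],
    "fruit": ["Banana", "Banana", "Apple", "Apple", "Banana", "Apple"],
    "date": ["2011-01-01", "2011-01-02", "2011-01-01",
             "2011-01-02", "2011-02-01", "2011-02-01"],
    "value": [1, np.nan, np.nan, 2, 3, np.nan],
})

# Assumption: dates should be compared chronologically, not as strings.
data["date"] = pd.to_datetime(data["date"])

# Put each group's rows in date order so the fill carries the latest earlier value.
data = data.sort_values(["color", "fruit", "date"])

# Forward fill within each (color, fruit) group only.
data["value"] = data.groupby(["color", "fruit"])["value"].ffill()

# Restore the original row order.
data = data.sort_index()
print(data)
```

Groups whose first rows are missing, such as (Green, Apple) on 2011-01-01 and (Red, Apple), stay NaN because there is no earlier value in the same group to carry forward.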
Answered By: Timeless