Modify rows between two flags (values) in dataframe columns

Question:

I want to create a new dataframe with the same shape based on two existing dataframes. I have one dataframe that represents the flags and another one with the values I want to replace.

The flag dataframe has only 1, -1 and NaNs, and always after a 1 I’ll have a -1. So basically its a "changing state" kind of dataframe. What I want to do is: in between the interval of 1 and -1, I need to fill in the average of the same interval in the second dataframe, PR.

flag = pd.DataFrame({'col1': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                   'col2': [np.nan,1,-1,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,-1],
                   'col3': [np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,-1],
                   'col4': [np.nan,np.nan,np.nan,np.nan,np.nan,1,np.nan,-1,np.nan,np.nan,np.nan]
                   })

PR = pd.DataFrame({'col1': [81,81.3,80.7,81.5,81,80.4,80.3,81,79.5,80.7],
                   'col2': [80.9,81.6,81.2,81.7,80.9,79.7,79.3,79.1,79,77.5],
                   'col3': [81.1,81.3,81,81.6,80.8,79.5,79.2,78.8,78.8,77.4],
                   'col4': [80.1,80.6,79.9,80.4,80.4,79.3,79,78.8,78.4,77]
                   })

This would have to give me:

   col1   col2   col3   col4
0   NaN    NaN    NaN    NaN
1   NaN  81.40    NaN    NaN
2   NaN  81.40    NaN    NaN
3   NaN    NaN  79.44    NaN
4   NaN  79.25  79.44    NaN
5   NaN  79.25  79.44  79.03
6   NaN  79.25  79.44  79.03
7   NaN  79.25  79.44  79.03
8   NaN  79.25  79.44    NaN
9   NaN  79.25  79.44    NaN

Any help is much appreciated!

Asked By: Pedro de Sá

||

Answers:

I would use a custom function:

def process(s, ref=flag):
    f = ref[s.name] # get matching flag

    # create group and mask data outside of 1 -> -1
    m = (f.map({1: True, -1: False}).ffill()
         | f.eq(-1)
        )
    group = f.eq(1).cumsum().where(m)

    # transform to mean
    return s.groupby(group).transform('mean') 

out = PR.apply(process, ref=flag).round(2)

Output:

   col1   col2   col3   col4
0   NaN    NaN    NaN    NaN
1   NaN  81.40    NaN    NaN
2   NaN  81.40    NaN    NaN
3   NaN    NaN  79.44    NaN
4   NaN  79.25  79.44    NaN
5   NaN  79.25  79.44  79.03
6   NaN  79.25  79.44  79.03
7   NaN  79.25  79.44  79.03
8   NaN  79.25  79.44    NaN
9   NaN  79.25  79.44    NaN
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.