How to keep a cumulative count of changes across row elements, ignoring NaNs, and create a separate column with the results

Question:

I have a data frame that looks like this:

Identification  Date (day/month/year)  X    Y
123             01/01/2022             NaN  abc
123             02/01/2022             200  acb
123             03/01/2022             200  ary
124             01/01/2022             200  abc
124             02/01/2022             NaN  abc
124             03/01/2022             NaN  NaN

I am trying to create two separate "change" columns, one for X and one for Y, each keeping a running count of how many times the value has changed over time within an Identification. I would like my output to look something like this, where NaN -> NaN is not counted as a change but NaN -> some element (or some element -> NaN) is counted:

Identification  Date (day/month/year)  X    Y    Change X  Change Y
123             01/01/2022             NaN  abc  0         0
123             02/01/2022             200  acb  1         1
123             03/01/2022             200  ary  1         2
124             01/01/2022             200  abc  0         0
124             02/01/2022             NaN  abc  1         0
124             03/01/2022             NaN  NaN  1         1

Thanks 🙂

Asked By: Mitchell


Answers:

You can compare each value with the previous one in its group (obtained with groupby.shift) and accumulate the result with a groupby.cumsum. However, NaN never compares equal to another NaN, so a NaN -> NaN step would be counted as a change. To overcome this, we can first fillna with a sentinel that is not part of the dataset. Here I chose object; it could be -1 if your data is strictly positive.

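def change(s):
    # replace NaN with a sentinel so that consecutive NaNs compare equal
    s = s.fillna(object)
    # flag rows that differ from the previous row of the same group and
    # accumulate per group; sub(1) offsets the always-flagged first row
    return (s.ne(s.groupby(df['Identification']).shift())
            .groupby(df['Identification']).cumsum().sub(1)
            )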

out = df.join(df[['X', 'Y']].apply(change).add_prefix('Change '))

print(out)

Output:

   Identification Date (day/month/year)      X    Y  Change X  Change Y
0             123            01/01/2022    NaN  abc         0         0
1             123            02/01/2022  200.0  acb         1         1
2             123            03/01/2022  200.0  ary         1         2
3             124            01/01/2022  200.0  abc         0         0
4             124            02/01/2022    NaN  abc         1         0
5             124            03/01/2022    NaN  NaN         1         1
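
For anyone who would rather not fill in a sentinel, the same counts can also be obtained by masking out the NaN -> NaN steps explicitly. The following is only a minimal sketch under a few assumptions: the frame is rebuilt by hand from the question's sample, rows are already sorted by date within each Identification, and the names count_changes and keys are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question
df = pd.DataFrame({
    "Identification": [123, 123, 123, 124, 124, 124],
    "Date (day/month/year)": ["01/01/2022", "02/01/2022", "03/01/2022"] * 2,
    "X": [np.nan, 200, 200, 200, np.nan, np.nan],
    "Y": ["abc", "acb", "ary", "abc", "abc", np.nan],
})

def count_changes(s, keys):
    """Running count of value changes per group (hypothetical helper)."""
    prev = s.groupby(keys).shift()        # previous value within each group
    first = ~keys.duplicated()            # True on the first row of each group
    both_nan = s.isna() & prev.isna()     # NaN -> NaN must not count
    changed = s.ne(prev) & ~both_nan & ~first
    return changed.groupby(keys).cumsum()

out = df.join(
    df[["X", "Y"]]
    .apply(count_changes, keys=df["Identification"])
    .add_prefix("Change ")
)
print(out)
```

This variant leaves the values and dtypes of X and Y untouched, and because the first row of each group is excluded explicitly, no sub(1) is needed.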
Answered By: mozway