Assign a unique value to consecutive null values until a non-null value

Question:

I want to compute a cumulative count of null values in which each run of consecutive nulls is counted only once.

The closest solution I came to was this:

import pandas as pd
import numpy as np

# create the column
col = pd.Series([1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan, 5])

col.isnull().cumsum()

But the counter increases on every null value, which is not the output I want:

0    0
1    0
2    1
3    2
4    2
5    2
6    3
7    4
8    4
dtype: int32

I want the output to be the following: [0, 0, 1, 1, 1, 1, 2, 2, 2].

How do I achieve this?

Asked By: Joe


Answers:

You seem to want to count only the first NA per stretch:

m = col.isna()
out = (m & ~m.shift(fill_value=False)).cumsum()

Shortcut:

m = col.isna()
out = (m & m.diff()).cumsum()

Output:

0    0
1    0
2    1
3    1
4    1
5    1
6    2
7    2
8    2
dtype: int64

Intermediates:

   col      m  ~m.shift(fill_value=False)      &  cumsum
0  1.0  False                        True  False       0
1  2.0  False                        True  False       0
2  NaN   True                        True   True       1
3  NaN   True                       False  False       1
4  3.0  False                       False  False       1
5  4.0  False                        True  False       1
6  NaN   True                        True   True       2
7  NaN   True                       False  False       2
8  5.0  False                       False  False       2
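
The intermediate columns above can be reproduced with a short, self-contained sketch. The names `prev`, `starts`, and `steps` are illustrative only and are not part of the original answer:

```python
import numpy as np
import pandas as pd

col = pd.Series([1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan, 5])

m = col.isna()                      # True where the value is null
prev = m.shift(fill_value=False)    # the same mask moved down one row
starts = m & ~prev                  # True only on the first null of each run

# Collect every step side by side for inspection
steps = pd.DataFrame({
    "col": col,
    "m": m,
    "~m.shift(fill_value=False)": ~prev,
    "&": starts,
    "cumsum": starts.cumsum(),
})
print(steps)
```

Passing `fill_value=False` to `shift` keeps the shifted mask boolean, so `~` acts as a logical NOT on it rather than on an object column containing NaN.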

Variant:

out = col.isna().astype(int).diff().eq(1).cumsum()
Answered By: mozway

You can use:

# Increment when the previous row is not n/a AND the current row is n/a
out = (col.shift().notna() & col.isna()).cumsum()
print(out)

# Output
0    0
1    0
2    1
3    1
4    1
5    1
6    2
7    2
8    2
dtype: int64
Answered By: Corralien
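
One difference between the approaches above only shows up when the series begins with nulls, which is not the case in the question's data. The sketch below uses a hypothetical series `s` (not from the question) to compare them. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical input that starts with a run of nulls
s = pd.Series([np.nan, np.nan, 1, np.nan, 2])

m = s.isna()

# shift(fill_value=False) treats the row "before" the first one as non-null,
# so a leading run of nulls is counted as run number 1
counts_leading = (m & ~m.shift(fill_value=False)).cumsum()

# shift() without fill_value puts NaN in the first row, so notna() is False
# there and a leading run of nulls stays at 0
skips_leading = (s.shift().notna() & s.isna()).cumsum()

# diff() also yields NaN in the first row, so it skips a leading run as well
skips_leading_variant = s.isna().astype(int).diff().eq(1).cumsum()

print(counts_leading.tolist())         # [1, 1, 1, 2, 2]
print(skips_leading.tolist())          # [0, 0, 0, 1, 1]
print(skips_leading_variant.tolist())  # [0, 0, 0, 1, 1]
```

Which behavior is right depends on whether a leading run of nulls should get its own label. For the data in the question, all three expressions give the same result.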