Special case of counting empty cells "before" an occupied cell in Pandas
Question:
Pandas question here.
I have a dataset in which we sample subjective ratings several times per second. The data is laid out as shown below. What I need is a way to count the number of blank cells before every "second" (i.e. each "1" in the seconds column, which occurs at regular intervals), so I can feed that value into a greatest common factor equation and create something like a linear extrapolation in milliseconds. In the example below that number would be "2", and I would feed that into the GCF formula. The end goal is a more accurate and usable timestamp. Sampling rates may vary by dataset.
index | rating | seconds |
---|---|---|
1 | 26 | |
2 | 28 | |
3 | 30 | 1 |
4 | 33 | |
5 | 40 | |
6 | 45 | 1 |
7 | 50 | |
8 | 48 | |
9 | 49 | 1 |
Answers:
If you just want to count the number of NaNs before the first 1:
df['seconds'].isna().cummin().sum()
If the blanks are stored as another value (e.g. an empty string):
df['seconds'].eq('').cummin().sum()
Output: 2
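To see why this works, here is a minimal sketch on data rebuilt from the question's table. The variable names (`seconds`, `leading`, and so on) are illustrative, and it assumes the blanks are either NaN or empty strings. `isna()` marks blank cells, and `cummin()` keeps the mask True only until the first non-blank cell, so the sum counts just the leading blanks.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: blanks are NaN)
seconds = pd.Series([np.nan, np.nan, 1, np.nan, np.nan, 1, np.nan, np.nan, 1])

mask = seconds.isna()      # True wherever the cell is blank
leading = mask.cummin()    # turns False at the first non-blank cell and stays False
leading_blank_count = int(leading.sum())

# Same idea when the blanks are empty strings instead of NaN
seconds_str = pd.Series(["", "", 1, "", "", 1, "", "", 1])
leading_blank_count_str = int(seconds_str.eq("").cummin().sum())

print(leading_blank_count, leading_blank_count_str)
```

Without the `cummin()` step, the sum would count every blank in the column rather than only those before the first marker.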
Or, if you have a default RangeIndex (starting at 0), the label of the first non-null row equals the count:
df['seconds'].first_valid_index()
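The snippets above only count the blanks before the first 1. Since the question asks for the count before every marker, and sampling rates may vary, one possible extension is to group the rows into blocks that each end at a marker. This is a sketch, not part of the original answer: the names `block` and `blanks_per_block` are made up for illustration, and it assumes the blanks are NaN.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: blanks are NaN)
df = pd.DataFrame({
    "rating": [26, 28, 30, 33, 40, 45, 50, 48, 49],
    "seconds": [np.nan, np.nan, 1, np.nan, np.nan, 1, np.nan, np.nan, 1],
})

is_marker = df["seconds"].notna()
is_blank = df["seconds"].isna()

# Label each row by how many markers occur strictly before it,
# so every block consists of the blanks plus the marker that closes them
block = is_marker.cumsum().shift(fill_value=0)

# Count the blanks in each block
blanks_per_block = is_blank.groupby(block).sum()

# Drop a trailing block of blanks that is not closed by a marker, if any
has_marker = is_marker.groupby(block).any()
blanks_per_block = blanks_per_block[has_marker]

print(blanks_per_block.tolist())
```

On the sample data every block contains two blanks. If the counts differ between blocks, the sampling rate is not constant within the dataset.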