How to only return consecutive values in a pandas dataframe

Question:

I have filtered a dataframe to keep only the rows whose value is over a certain threshold. Now I want to filter this new dataframe further so that it only includes rows that are consecutive. For example, given this dataframe, with the index followed by the Timestamp and Value columns:

                       Timestamp     Value
316 2022-03-01T16:28:50.4580078Z    36.000
344 2022-03-01T17:26:25.8170166Z    37.500
345 2022-03-01T17:26:45.8480072Z    41.000
346 2022-03-01T17:27:05.8800048Z    35.250
374 2022-03-01T18:23:10.9630126Z    36.000
387 2022-03-01T18:36:12.1380004Z    42.000
388 2022-03-01T18:36:32.1690063Z    37.000
391 2022-03-01T18:37:52.2950134Z    34.500

This should return a new dataframe with only rows 344, 345, 346, 387, and 388, because these are the only rows that have a consecutive neighbor in the index:

                       Timestamp     Value
344 2022-03-01T17:26:25.8170166Z    37.500
345 2022-03-01T17:26:45.8480072Z    41.000
346 2022-03-01T17:27:05.8800048Z    35.250
387 2022-03-01T18:36:12.1380004Z    42.000
388 2022-03-01T18:36:32.1690063Z    37.000

I have tried iterating and appending to a new list, using df.iloc[i] and df.iloc[i-1], and still can't seem to get it working properly.

Asked By: Bash

||

Answers:

Based on the time

You can compute the successive time differences and compare them to a threshold (here 1 minute). Keep the rows for which the mask is True, or for which the mask of the row below is True:

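import pandas as pd

thresh = '1min'

# True where the gap to the previous row is below the threshold
m = (pd.to_datetime(df['Timestamp'])
       .diff().lt(thresh)
    )

# keep rows flagged in m, or whose next row is flagged
out = df[m|m.shift(-1)]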
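
For a self-contained check of the time-based approach, here is a minimal sketch that rebuilds the sample data from the question and applies the same mask. The way `df` is constructed and the names `close_to_prev` and `close_to_next` are assumptions made for illustration. It also passes `fill_value=False` to `shift`, an optional variation that keeps the shifted mask strictly boolean instead of leaving a NaN in the last row.

```python
import pandas as pd

# Sample data copied from the question; constructing it this way is an assumption.
df = pd.DataFrame(
    {
        "Timestamp": [
            "2022-03-01T16:28:50.4580078Z",
            "2022-03-01T17:26:25.8170166Z",
            "2022-03-01T17:26:45.8480072Z",
            "2022-03-01T17:27:05.8800048Z",
            "2022-03-01T18:23:10.9630126Z",
            "2022-03-01T18:36:12.1380004Z",
            "2022-03-01T18:36:32.1690063Z",
            "2022-03-01T18:37:52.2950134Z",
        ],
        "Value": [36.0, 37.5, 41.0, 35.25, 36.0, 42.0, 37.0, 34.5],
    },
    index=[316, 344, 345, 346, 374, 387, 388, 391],
)

thresh = pd.Timedelta("1min")

# True where the gap to the previous row is below the threshold
close_to_prev = pd.to_datetime(df["Timestamp"]).diff().lt(thresh)

# Same flags moved up one row: True where the next row is close to this one
close_to_next = close_to_prev.shift(-1, fill_value=False)

out = df[close_to_prev | close_to_next]
print(out.index.tolist())  # [344, 345, 346, 387, 388]
```

Row 391 is dropped here because it comes 80 seconds after row 388, which is over the 1 minute threshold.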

Based on the index

Same logic, but checking for a successive difference of exactly 1 in the index.

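# True where the index is exactly 1 greater than the previous row's index
m = df.index.to_series().diff().eq(1)

# keep rows flagged in m, or whose next row is flagged
out = df[m|m.shift(-1)]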

Output (both approaches select the same rows; Timestamp is shown after conversion with pd.to_datetime):

                              Timestamp  Value
344 2022-03-01 17:26:25.817016600+00:00  37.50
345 2022-03-01 17:26:45.848007200+00:00  41.00
346 2022-03-01 17:27:05.880004800+00:00  35.25
387 2022-03-01 18:36:12.138000400+00:00  42.00
388 2022-03-01 18:36:32.169006300+00:00  37.00

Intermediates:

                              Timestamp  Value      m m.shift(-1)
316 2022-03-01 16:28:50.458007800+00:00  36.00  False       False
344 2022-03-01 17:26:25.817016600+00:00  37.50  False        True
345 2022-03-01 17:26:45.848007200+00:00  41.00   True        True
346 2022-03-01 17:27:05.880004800+00:00  35.25   True       False
374 2022-03-01 18:23:10.963012600+00:00  36.00  False       False
387 2022-03-01 18:36:12.138000400+00:00  42.00  False        True
388 2022-03-01 18:36:32.169006300+00:00  37.00   True       False
391 2022-03-01 18:37:52.295013400+00:00  34.50  False         NaN
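
The index-based variant can also be wrapped in a small reusable function. This is only a sketch: `keep_consecutive` is a hypothetical helper name, not part of pandas, and it assumes the index holds sorted integers. It uses `fill_value=False` in `shift` so the last row gets False rather than the NaN shown in the table.

```python
import pandas as pd


def keep_consecutive(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose index label is exactly 1 away from a neighboring row.

    Hypothetical helper for illustration; assumes a sorted integer index.
    """
    idx = df.index.to_series()
    # True where this label is exactly 1 greater than the previous label
    follows_prev = idx.diff().eq(1)
    # Same flags moved up one row: True where the next label is this label + 1
    precedes_next = follows_prev.shift(-1, fill_value=False)
    return df[follows_prev | precedes_next]


# Index and values copied from the question; the Timestamp column is omitted
# because this variant only looks at the index.
df = pd.DataFrame(
    {"Value": [36.0, 37.5, 41.0, 35.25, 36.0, 42.0, 37.0, 34.5]},
    index=[316, 344, 345, 346, 374, 387, 388, 391],
)

out = keep_consecutive(df)
print(out.index.tolist())  # [344, 345, 346, 387, 388]
```
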
Answered By: mozway