# Find index of longest consecutive timespan in pandas time series

## Question:

I have a time series with gaps (NaN) and I want to find the start and stop index of the longest consecutive sequence in which no NaN occurs. I have no clue how to do that.

``````import numpy as np
from pandas import Timestamp

values = [5468.0,
5946.0,
np.nan,
6019.0,
5797.0,
5879.0,
np.nan,
5706.0,
5986.0,
6228.0,
6285.0,
np.nan,
5667.0,
5886.0,
6380.0,
5988.0,
6290.0,
5899.0,
6429.0,
6177.0]

dates = [Timestamp('2018-10-17 13:30:00'),
Timestamp('2018-10-17 14:00:00'),
Timestamp('2018-10-17 14:30:00'),
Timestamp('2018-10-17 15:00:00'),
Timestamp('2018-10-17 15:30:00'),
Timestamp('2018-10-17 16:00:00'),
Timestamp('2018-10-17 16:30:00'),
Timestamp('2018-10-17 17:00:00'),
Timestamp('2018-10-17 17:30:00'),
Timestamp('2018-10-17 18:00:00'),
Timestamp('2018-10-17 18:30:00'),
Timestamp('2018-10-17 19:00:00'),
Timestamp('2018-10-17 19:30:00'),
Timestamp('2018-10-17 20:00:00'),
Timestamp('2018-10-17 20:30:00'),
Timestamp('2018-10-17 21:00:00'),
Timestamp('2018-10-17 21:30:00'),
Timestamp('2018-10-17 22:00:00'),
Timestamp('2018-10-17 22:30:00'),
Timestamp('2018-10-17 23:00:00')]
``````

I found a lot of solutions here on Stack Overflow, but they all work with daily data and count with ±1, which doesn't work with my 30-minute frequency.

I know that I can get True/False with `isnull()` and then `groupby()`, or use `dates.diff()[1:]`, but I don't have enough knowledge to put a solution together.

## Answer:

You are almost there. Create the groups, then take the difference between each group's second timestamp (the first one holds the NaN) and its last:

``````>>> (df.groupby(df['value'].isna().cumsum())['date']
.apply(lambda x: x.iloc[-1] - x.iloc[1])  # iloc[1] skips the group's leading NaN
.max())

Timedelta('0 days 03:30:00')
``````
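Note that `x.iloc[1]` assumes every group starts with a NaN row; the very first group does not, so its first valid value is skipped. A more defensive variant (a sketch, not part of the original answer) drops the NaN rows before measuring each run:

```python
import numpy as np
import pandas as pd

values = [5468.0, 5946.0, np.nan, 6019.0, 5797.0, 5879.0, np.nan,
          5706.0, 5986.0, 6228.0, 6285.0, np.nan, 5667.0, 5886.0,
          6380.0, 5988.0, 6290.0, 5899.0, 6429.0, 6177.0]
dates = pd.date_range('2018-10-17 13:30', periods=20, freq='30min')
df = pd.DataFrame({'date': dates, 'value': values})

# The run label increments at every NaN
groups = df['value'].isna().cumsum()

# Duration of each gap-free run, measured on valid rows only;
# grouping by the full-length `groups` Series aligns on the index
durations = (df.dropna(subset=['value'])
               .groupby(groups)['date']
               .apply(lambda x: x.iloc[-1] - x.iloc[0]))
print(durations.max())  # Timedelta('0 days 03:30:00')
```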

Input dataframe:

``````>>> df
date   value
0  2018-10-17 13:30:00  5468.0  # Group 0
1  2018-10-17 14:00:00  5946.0  # Group 0
2  2018-10-17 14:30:00     NaN  # Group 1
3  2018-10-17 15:00:00  6019.0  # Group 1
4  2018-10-17 15:30:00  5797.0  # Group 1
5  2018-10-17 16:00:00  5879.0  # Group 1
6  2018-10-17 16:30:00     NaN  # Group 2
7  2018-10-17 17:00:00  5706.0  # Group 2
8  2018-10-17 17:30:00  5986.0  # Group 2
9  2018-10-17 18:00:00  6228.0  # Group 2
10 2018-10-17 18:30:00  6285.0  # Group 2
11 2018-10-17 19:00:00     NaN  # Group 3
12 2018-10-17 19:30:00  5667.0  # Group 3
13 2018-10-17 20:00:00  5886.0  # Group 3
14 2018-10-17 20:30:00  6380.0  # Group 3
15 2018-10-17 21:00:00  5988.0  # Group 3
16 2018-10-17 21:30:00  6290.0  # Group 3
17 2018-10-17 22:00:00  5899.0  # Group 3
18 2018-10-17 22:30:00  6429.0  # Group 3
19 2018-10-17 23:00:00  6177.0  # Group 3
``````
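The group labels in the comments above come from cumulative-summing the NaN mask, so the counter ticks up at every gap. A minimal sketch of that trick:

```python
import numpy as np
import pandas as pd

# Each NaN bumps the cumulative sum, starting a new group
s = pd.Series([5468.0, 5946.0, np.nan, 6019.0, np.nan, 5706.0])
print(s.isna().cumsum().tolist())  # [0, 0, 1, 1, 2, 2]
```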

**Update**

To extract the longest series:

``````groups = df['value'].isna().cumsum()
best_group = df.groupby(groups)['date'].apply(lambda x: x.iloc[-1] - x.iloc[1]).idxmax()
out = df[groups == best_group]
print(out)

# Output
date   value
11 2018-10-17 19:00:00     NaN
12 2018-10-17 19:30:00  5667.0
13 2018-10-17 20:00:00  5886.0
14 2018-10-17 20:30:00  6380.0
15 2018-10-17 21:00:00  5988.0
16 2018-10-17 21:30:00  6290.0
17 2018-10-17 22:00:00  5899.0
18 2018-10-17 22:30:00  6429.0
19 2018-10-17 23:00:00  6177.0
``````
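The question asked for the start and stop index of the run. A sketch that drops the leading NaN marker and reads the endpoints off the remaining rows (`run`, `start`, and `stop` are illustrative names, not from the original answer):

```python
import numpy as np
import pandas as pd

values = [5468.0, 5946.0, np.nan, 6019.0, 5797.0, 5879.0, np.nan,
          5706.0, 5986.0, 6228.0, 6285.0, np.nan, 5667.0, 5886.0,
          6380.0, 5988.0, 6290.0, 5899.0, 6429.0, 6177.0]
dates = pd.date_range('2018-10-17 13:30', periods=20, freq='30min')
df = pd.DataFrame({'date': dates, 'value': values})

groups = df['value'].isna().cumsum()
best = df.groupby(groups)['date'].apply(lambda x: x.iloc[-1] - x.iloc[1]).idxmax()

# Keep only the valid rows of the winning run
run = df[groups == best].dropna(subset=['value'])

start, stop = run.index[0], run.index[-1]
print(start, stop)  # 12 19
print(run['date'].iloc[0], run['date'].iloc[-1])
```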