Pandas check time series continuity

Question

I have a DataFrame with a monthly index. I want to check whether the time index is continuous at monthly frequency and, if possible, find the spots where it becomes discontinuous, e.g. where there are “gap months” between two months that are adjacent in the index.

Example: the following time series data

1964-07-31    100.00
1964-08-31     98.81
1964-09-30    101.21
1964-11-30    101.42
1964-12-31    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26

is missing 1964/10 and 1965/[1, 2, 7].

Asked By: Vim

Answer 1

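Assuming a DataFrame like your input, with the dates in the first column (labelled 0), you could do the following:

import pandas as pd

# All month-ends from first to last date ('M' is spelled 'ME' in pandas 2.2+)
all_months = pd.Series(pd.date_range(start=df[0].min(), end=df[0].max(), freq='M'))
mask = all_months.isin(df[0].values)  # True where the month is present
print(all_months[~mask])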

Output

3    1964-10-31
6    1965-01-31
7    1965-02-28
12   1965-07-31
dtype: datetime64[ns]

The idea is to create a date range with monthly frequency starting from the first date until the last date, and then check those values against your first column.
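
A self-contained sketch of the same idea, using monthly periods rather than month-end timestamps so the comparison does not depend on the day of the month. The column names `date` and `value` and the sample frame are assumptions for illustration, not part of the original input.

```python
import pandas as pd

# Hypothetical sample: the question's data, with dates in a column named "date".
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
        "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
    ]),
    "value": [100.00, 98.81, 101.21, 101.42, 101.45,
              91.49, 90.33, 85.23, 86.10, 84.26],
})

# Months that actually occur in the data, as monthly periods.
observed = pd.DatetimeIndex(df["date"]).to_period("M")

# Every month between the first and the last observed month.
expected = pd.period_range(observed.min(), observed.max(), freq="M")

# Expected months that never occur in the data.
missing = expected.difference(observed)
print(list(missing.astype(str)))
# ['1964-10', '1965-01', '1965-02', '1965-07']
```

An empty `missing` index would mean the months are continuous.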

Answered By: Dani Mesejo

Answer 2

Use asfreq at monthly frequency to add the missing datetimes, filter the resulting NaN rows into a new Series and, if necessary, group by year to build a list of the missing months:

# 'M' is the month-end frequency (spelled 'ME' in pandas 2.2+)
s = s.asfreq('M')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0   1964-10-31
1   1965-01-31
2   1965-02-28
3   1965-07-31
Name: 0, dtype: datetime64[ns]

out = s1.dt.month.groupby(s1.dt.year).apply(list)
print (out)
0
1964         [10]
1965    [1, 2, 7]
Name: 0, dtype: object

Setup:

s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0, 
               pd.Timestamp('1964-08-31 00:00:00'): 98.81, 
               pd.Timestamp('1964-09-30 00:00:00'): 101.21, 
               pd.Timestamp('1964-11-30 00:00:00'): 101.42, 
               pd.Timestamp('1964-12-31 00:00:00'): 101.45,
               pd.Timestamp('1965-03-31 00:00:00'): 91.49, 
               pd.Timestamp('1965-04-30 00:00:00'): 90.33, 
               pd.Timestamp('1965-05-31 00:00:00'): 85.23, 
               pd.Timestamp('1965-06-30 00:00:00'): 86.1, 
               pd.Timestamp('1965-08-31 00:00:00'): 84.26})

print (s)
1964-07-31    100.00
1964-08-31     98.81
1964-09-30    101.21
1964-11-30    101.42
1964-12-31    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26
dtype: float64

EDIT:

If the datetimes are not always the last day of the month:

s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0, 
               pd.Timestamp('1964-08-31 00:00:00'): 98.81, 
               pd.Timestamp('1964-09-01 00:00:00'): 101.21, 
               pd.Timestamp('1964-11-02 00:00:00'): 101.42, 
               pd.Timestamp('1964-12-05 00:00:00'): 101.45,
               pd.Timestamp('1965-03-31 00:00:00'): 91.49, 
               pd.Timestamp('1965-04-30 00:00:00'): 90.33, 
               pd.Timestamp('1965-05-31 00:00:00'): 85.23, 
               pd.Timestamp('1965-06-30 00:00:00'): 86.1, 
               pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31    100.00
1964-08-31     98.81
1964-09-01    101.21
1964-11-02    101.42
1964-12-05    101.45
1965-03-31     91.49
1965-04-30     90.33
1965-05-31     85.23
1965-06-30     86.10
1965-08-31     84.26
dtype: float64

# convert every date to the first day of its month
s.index = s.index.to_period('M').to_timestamp()
# 'MS' is the month-start frequency
s = s.asfreq('MS')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0   1964-10-01
1   1965-01-01
2   1965-02-01
3   1965-07-01
dtype: datetime64[ns]

Answered By: jezrael

Answer 3

I often do that by calculating the gap between consecutive index values.

# difference between each timestamp and the previous one (the first is NaT)
times_gaps = df.index.to_series().diff()

Then you can plot those:

times_gaps.plot()

If there are gaps, you will quickly see where.
If there are no gaps, you will see a nearly flat horizontal line (for monthly data it wobbles slightly, between 28 and 31 days).

You can also select the gaps by doing:

# threshold is a pd.Timedelta, e.g. pd.Timedelta(days=31) for monthly data
times_gaps[times_gaps > threshold]
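
For monthly data, a gap can also be measured in months rather than days, which avoids having to pick a day threshold. A minimal sketch, assuming a DatetimeIndex built from the question's dates; the names `idx`, `month_number`, `step` and `gaps` are illustrative only.

```python
import pandas as pd

# Hypothetical sample: the month-end dates from the question.
idx = pd.DatetimeIndex([
    "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])

# Running month count, so a step of one calendar month is always exactly 1.
month_number = pd.Series(idx.year * 12 + idx.month, index=idx)

# Step from the previous row to this one (NaN for the first row).
step = month_number.diff()

# Rows where the step exceeds 1 come right after a gap.
gaps = pd.DataFrame({
    "previous": idx.to_series().shift(1),  # last date before the gap
    "missing_months": step - 1,            # how many months were skipped
})[step > 1]
print(gaps)
```

For the question's data this flags the rows dated 1964-11-30, 1965-03-31 and 1965-08-31, with 1, 2 and 1 missing months respectively.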

Answered By: Ludo Schmidt

Answer 4

import pandas as pd

# Create a sample time-series data
dates = pd.date_range('2022-01-01', periods=12, freq='M')
data = range(12)
df = pd.DataFrame({'date': dates, 'value': data})

# Check whether the time series has a value for every month
# (resample inserts NaN rows for months that have no data)
df_monthly = df.set_index('date').resample('M').mean()
if df_monthly.isnull().sum().sum() == 0:
    print("The time series is continuous for every month.")
else:
    print("The time series is NOT continuous for every month.")

Answered By: hongkail
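
The sample above has no gaps, so only the first branch runs. As a rough sketch, the same yes/no check can be written as a small helper and tried on an index with one month removed. `is_monthly_continuous` is a hypothetical name, and the helper assumes a non-empty DatetimeIndex.

```python
import pandas as pd

def is_monthly_continuous(index: pd.DatetimeIndex) -> bool:
    """Return True if every calendar month between the first and last date appears.

    Assumes a non-empty DatetimeIndex (illustrative helper, not a pandas API).
    """
    months = index.to_period("M").unique()
    expected = pd.period_range(months.min(), months.max(), freq="M")
    # months is unique and lies inside expected, so equal length means no gap.
    return len(months) == len(expected)

# Twelve consecutive month starts in 2022.
full = pd.period_range("2022-01", periods=12, freq="M").to_timestamp()
# The same index with June removed.
gappy = full.delete(5)

print(is_monthly_continuous(full))   # True
print(is_monthly_continuous(gappy))  # False
```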
