Pandas check time series continuity
Question:
I have a DataFrame with a monthly index. I want to check whether the time index is continuous at monthly frequency and, if possible, find the spots where it becomes discontinuous, e.g. where “gap months” lie between two months that are adjacent in the index.
Example: the following time series data
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
misses 1964/10, 1965/[1,2,7].
Answers:
Assuming a DataFrame like your input (the first column holds the dates), you could do the following:
# on pandas >= 2.2, use freq='ME' instead of 'M'
all_months = pd.Series(data=pd.date_range(start=df[0].min(), end=df[0].max(), freq='M'))
mask = all_months.isin(df[0].values)
print(all_months[~mask])
Output
3 1964-10-31
6 1965-01-31
7 1965-02-28
12 1965-07-31
dtype: datetime64[ns]
The idea is to create a date range with monthly frequency starting from the first date until the last date, and then check those values against your first column.
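As a variant of the same idea, the comparison can be done on monthly periods instead of month-end timestamps, which avoids depending on the day of the month. This is only a sketch under the assumption that the dates are available as a DatetimeIndex; the names observed, expected and missing are illustrative and not part of the answer above.

```python
import pandas as pd

# Dates from the question's example.
dates = pd.to_datetime([
    "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])

# Map each timestamp to its calendar month (the day of month is discarded).
observed = dates.to_period("M")

# Every month between the first and the last observation.
expected = pd.period_range(observed.min(), observed.max(), freq="M")

# Months that are expected but never observed.
missing = expected.difference(observed)
print([str(p) for p in missing])
# ['1964-10', '1965-01', '1965-02', '1965-07']
```

An empty result means the index is continuous at monthly frequency.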
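Use asfreq with a monthly frequency to add the missing datetimes, filter the resulting NaN rows into a new Series and, if necessary, group by year to build lists of the missing months:
# 'M' is the month-end frequency (spelled 'ME' on pandas >= 2.2)
s = s.asfreq('M')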
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-31
1 1965-01-31
2 1965-02-28
3 1965-07-31
Name: 0, dtype: datetime64[ns]
out = s1.dt.month.groupby(s1.dt.year).apply(list)
print (out)
0
1964 [10]
1965 [1, 2, 7]
Name: 0, dtype: object
Setup:
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-30 00:00:00'): 101.21,
pd.Timestamp('1964-11-30 00:00:00'): 101.42,
pd.Timestamp('1964-12-31 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-30 101.21
1964-11-30 101.42
1964-12-31 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
EDIT:
If the datetimes are not always the last day of the month:
s = pd.Series({pd.Timestamp('1964-07-31 00:00:00'): 100.0,
pd.Timestamp('1964-08-31 00:00:00'): 98.81,
pd.Timestamp('1964-09-01 00:00:00'): 101.21,
pd.Timestamp('1964-11-02 00:00:00'): 101.42,
pd.Timestamp('1964-12-05 00:00:00'): 101.45,
pd.Timestamp('1965-03-31 00:00:00'): 91.49,
pd.Timestamp('1965-04-30 00:00:00'): 90.33,
pd.Timestamp('1965-05-31 00:00:00'): 85.23,
pd.Timestamp('1965-06-30 00:00:00'): 86.1,
pd.Timestamp('1965-08-31 00:00:00'): 84.26})
print (s)
1964-07-31 100.00
1964-08-31 98.81
1964-09-01 101.21
1964-11-02 101.42
1964-12-05 101.45
1965-03-31 91.49
1965-04-30 90.33
1965-05-31 85.23
1965-06-30 86.10
1965-08-31 84.26
dtype: float64
# convert every date to the first day of its month
s.index = s.index.to_period('M').to_timestamp()
# 'MS' is the month-start frequency
s = s.asfreq('MS')
s1 = pd.Series(s[s.isnull()].index)
print (s1)
0 1964-10-01
1 1965-01-01
2 1965-02-01
3 1965-07-01
dtype: datetime64[ns]
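The month-start normalization and the grouping by year can be combined into one self-contained sketch. It assumes the question's example data, and it also assumes the series holds no NaN values of its own, because asfreq cannot distinguish those from months it had to add. The by_year dict is an illustrative choice, not something the answer prescribes.

```python
import pandas as pd

s = pd.Series(
    [100.0, 98.81, 101.21, 101.42, 101.45, 91.49, 90.33, 85.23, 86.10, 84.26],
    index=pd.to_datetime([
        "1964-07-31", "1964-08-31", "1964-09-01", "1964-11-02", "1964-12-05",
        "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
    ]),
)

# Snap every timestamp to the first day of its month so it lines up with 'MS'.
s.index = s.index.to_period("M").to_timestamp()

# Reindex onto a complete month-start grid; months without data become NaN.
full = s.asfreq("MS")
missing = full[full.isna()].index

# Collect the missing month numbers per year in a plain dict.
by_year = {}
for ts in missing:
    by_year.setdefault(ts.year, []).append(ts.month)
print(by_year)
# {1964: [10], 1965: [1, 2, 7]}
```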
I often do that by calculating the gap between consecutive index values:
times_gaps = df.index.to_series().diff()
Then you can plot those:
times_gaps.plot()
If there are gaps, you will quickly see where: they show up as spikes.
If there are no gaps, you will see a roughly flat horizontal line (month lengths vary between 28 and 31 days).
You can also select the gaps with a threshold, e.g. threshold = pd.Timedelta(days=31):
times_gaps[times_gaps > threshold]
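Because month lengths vary, a gap measured in days needs a threshold. One possible alternative is to measure the step between adjacent entries in whole months, which also shows between which two index entries each gap sits. The sketch below assumes the question's example dates; the year * 12 + month numbering is an illustrative device, not a pandas feature.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    "1964-07-31", "1964-08-31", "1964-09-30", "1964-11-30", "1964-12-31",
    "1965-03-31", "1965-04-30", "1965-05-31", "1965-06-30", "1965-08-31",
])

# Put every date on a continuous month scale: year * 12 + month.
month_no = idx.year.to_numpy() * 12 + idx.month.to_numpy()

# Number of months between each pair of adjacent index entries.
steps = np.diff(month_no)

# A step larger than 1 means (step - 1) months are missing in between.
gaps = [
    (str(prev.date()), str(nxt.date()), int(step) - 1)
    for prev, nxt, step in zip(idx[:-1], idx[1:], steps)
    if step > 1
]
for prev, nxt, n_missing in gaps:
    print(f"{n_missing} month(s) missing between {prev} and {nxt}")
```

For the example data this reports one missing month after 1964-09-30, two after 1964-12-31 and one after 1965-06-30.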
import pandas as pd
# Create sample monthly time-series data
# (on pandas >= 2.2, use freq='ME' instead of 'M')
dates = pd.date_range('2022-01-01', periods=12, freq='M')
data = range(12)
df = pd.DataFrame({'date': dates, 'value': data})
# Check whether every month between the first and last date has data
df_monthly = df.set_index('date').resample('M').mean()
if df_monthly.isnull().sum().sum() == 0:
    print("The time-series is continuous for every month.")
else:
    print("The time-series is NOT continuous for every month.")