check for any missing dates in the index
Question:
Is there any way to check for missing dates in a dataframe directly.
I want to check if there are a missing dates between 2013-01-19
to 2018-01-29
GWA_BTC GWA_ETH GWA_LTC GWA_XLM GWA_XRP
Date
2013-01-19 11,826.36 1,068.45 195.00 0.51 1.82
2013-01-20 13,062.68 1,158.71 207.58 0.52 1.75
...
2018-01-28 12,326.23 1,108.90 197.36 0.48 1.55
2018-01-29 11,397.52 1,038.21 184.92 0.47 1.43
I tried to check it manually but it took a lot of time.
Answers:
I can’t post a comment but you can probably traverse each value and add 24 hours to the previous value to see if the date matches?
import pandas as pd
a = [1,2,3,4,5]
b = [1,0.4,0.3,0.5,0.2]
df = pd.DataFrame({'a':a , 'b': b})
for i in range(len(df)):
prev = df.loc[i,'a']
if i is 0:
continue
else:
# Add 1 day to the current value and check with prev value
Example:
As a minimal example, take this:
>>> df
GWA_BTC GWA_ETH GWA_LTC GWA_XLM GWA_XRP
Date
2013-01-19 11,826.36 1,068.45 195.00 0.51 1.82
2013-01-20 13,062.68 1,158.71 207.58 0.52 1.75
2013-01-28 12,326.23 1,108.90 197.36 0.48 1.55
2013-01-29 11,397.52 1,038.21 184.92 0.47 1.43
And we can find the missing dates between 2013-01-19
and 2013-01-29
Method 1:
See @Vaishali’s answer
Use .difference
to find the difference between your datetime index and the set of all dates within your range:
pd.date_range('2013-01-19', '2013-01-29').difference(df.index)
Which returns:
DatetimeIndex(['2013-01-21', '2013-01-22', '2013-01-23', '2013-01-24',
'2013-01-25', '2013-01-26', '2013-01-27'],
dtype='datetime64[ns]', freq=None)
Method 2:
You can re-index your dataframe using all dates within your desired daterange, and find where reindex
has inserted NaN
s.
And to find missing dates between 2013-01-19
and 2013-01-29
:
>>> df.reindex(pd.date_range('2013-01-19', '2013-01-29')).isnull().all(1)
2013-01-19 False
2013-01-20 False
2013-01-21 True
2013-01-22 True
2013-01-23 True
2013-01-24 True
2013-01-25 True
2013-01-26 True
2013-01-27 True
2013-01-28 False
2013-01-29 False
Freq: D, dtype: bool
Those values with True
are the missing dates in your original dataframe
assuming data is daily non business dates:
df.index.to_series().diff().dt.days > 1
You can use DatetimeIndex.difference(other)
pd.date_range(start = '2013-01-19', end = '2018-01-29' ).difference(df.index)
It returns the elements not present in the other
You can use DatetimeIndex.difference and add freq param, so you can check for missing days, hours, minutes, depending on the frequency you are using:
pd.date_range(df.index.min(), df.index.max(), freq="1min").difference(df.index)
Is there any way to check for missing dates in a dataframe directly.
I want to check if there are a missing dates between 2013-01-19
to 2018-01-29
GWA_BTC GWA_ETH GWA_LTC GWA_XLM GWA_XRP
Date
2013-01-19 11,826.36 1,068.45 195.00 0.51 1.82
2013-01-20 13,062.68 1,158.71 207.58 0.52 1.75
...
2018-01-28 12,326.23 1,108.90 197.36 0.48 1.55
2018-01-29 11,397.52 1,038.21 184.92 0.47 1.43
I tried to check it manually but it took a lot of time.
I can’t post a comment but you can probably traverse each value and add 24 hours to the previous value to see if the date matches?
import pandas as pd
a = [1,2,3,4,5]
b = [1,0.4,0.3,0.5,0.2]
df = pd.DataFrame({'a':a , 'b': b})
for i in range(len(df)):
prev = df.loc[i,'a']
if i is 0:
continue
else:
# Add 1 day to the current value and check with prev value
Example:
As a minimal example, take this:
>>> df
GWA_BTC GWA_ETH GWA_LTC GWA_XLM GWA_XRP
Date
2013-01-19 11,826.36 1,068.45 195.00 0.51 1.82
2013-01-20 13,062.68 1,158.71 207.58 0.52 1.75
2013-01-28 12,326.23 1,108.90 197.36 0.48 1.55
2013-01-29 11,397.52 1,038.21 184.92 0.47 1.43
And we can find the missing dates between 2013-01-19
and 2013-01-29
Method 1:
See @Vaishali’s answer
Use .difference
to find the difference between your datetime index and the set of all dates within your range:
pd.date_range('2013-01-19', '2013-01-29').difference(df.index)
Which returns:
DatetimeIndex(['2013-01-21', '2013-01-22', '2013-01-23', '2013-01-24',
'2013-01-25', '2013-01-26', '2013-01-27'],
dtype='datetime64[ns]', freq=None)
Method 2:
You can re-index your dataframe using all dates within your desired daterange, and find where reindex
has inserted NaN
s.
And to find missing dates between 2013-01-19
and 2013-01-29
:
>>> df.reindex(pd.date_range('2013-01-19', '2013-01-29')).isnull().all(1)
2013-01-19 False
2013-01-20 False
2013-01-21 True
2013-01-22 True
2013-01-23 True
2013-01-24 True
2013-01-25 True
2013-01-26 True
2013-01-27 True
2013-01-28 False
2013-01-29 False
Freq: D, dtype: bool
Those values with True
are the missing dates in your original dataframe
assuming data is daily non business dates:
df.index.to_series().diff().dt.days > 1
You can use DatetimeIndex.difference(other)
pd.date_range(start = '2013-01-19', end = '2018-01-29' ).difference(df.index)
It returns the elements not present in the other
You can use DatetimeIndex.difference and add freq param, so you can check for missing days, hours, minutes, depending on the frequency you are using:
pd.date_range(df.index.min(), df.index.max(), freq="1min").difference(df.index)