pandas.DatetimeIndex frequency is None and can't be set

Question:

I created a DatetimeIndex from a “date” column:

sales.index = pd.DatetimeIndex(sales["date"])

Now the index looks as follows:

DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
                   '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
                   '2003-01-11', '2003-01-13',
                   ...
                   '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
                   '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
                   '2016-07-30', '2016-07-31'],
                  dtype='datetime64[ns]', name='date', length=4393, freq=None)

As you can see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly by passing freq="d" to the DatetimeIndex constructor, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
      1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
      3 sales_holdout = disentangle(df_holdout)
      4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
      2     # transform sales table to disentangle sales time series
      3     sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4     sales.index = pd.DatetimeIndex(sales["date"], freq="d")
      5     sales = sales.pivot_table(index=["store_id", "article_id", "date"])
      6     return sales

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    399                                          'dates does not conform to passed '
    400                                          'frequency {1}'
--> 401                                          .format(inferred, freq.freqstr))
    402 
    403         if freq_infer:

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D

So apparently a frequency has been inferred, but is stored neither in the freq nor inferred_freq attribute of the DatetimeIndex – both are None. Can someone clear up the confusion?

Asked By: clstaudt

Answers:

It seems to relate to missing dates, as 3kt notes. You might be able to “fix” this with asfreq('D'), as EdChum suggests, but that gives you a continuous index with missing data values. It works fine for some sample data I made up:

df=pd.DataFrame({ 'x':[1,2,4] }, 
   index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']) )

df
Out[756]: 
            x
2003-01-02  1
2003-01-03  2
2003-01-06  4

df.index
Out[757]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], 
          dtype='datetime64[ns]', freq=None)

Note that freq=None. If you apply asfreq('D'), this changes to freq='D':

df.asfreq('D')
Out[758]: 
              x
2003-01-02  1.0
2003-01-03  2.0
2003-01-04  NaN
2003-01-05  NaN
2003-01-06  4.0

df.asfreq('d').index
Out[759]: 
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05',
               '2003-01-06'],
              dtype='datetime64[ns]', freq='D')

More generally, and depending on what exactly you are trying to do, you might want to check out the following for other options like reindex & resample: Add missing dates to pandas dataframe
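
As a rough sketch of those other options, reindex and resample could be used like this on the same made-up sample data. Filling the gaps with 0 is my own assumption for illustration; whether 0, NaN or something else is appropriate depends on your data.

```python
import pandas as pd

# Same toy data as above: two calendar days are missing.
df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# Option 1: reindex against an explicit daily range, filling the gaps with 0.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
reindexed = df.reindex(full_range, fill_value=0)

# Option 2: resample into daily bins; the sum of an empty bin is 0.
resampled = df.resample("D").sum()

print(reindexed.index.freq)
print(resampled.index.freq)
```

Both results carry a daily frequency on the index, at the cost of adding rows for the dates that were missing.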

Answered By: JohnE

You have a couple options here:

  • pd.infer_freq
  • pd.tseries.frequencies.to_offset
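
To give a feel for what these two functions return, here is a minimal sketch. The two sample indexes are made up for illustration; the second one skips a day, much like the index in the question.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Four consecutive days: a regular daily pattern.
complete = pd.to_datetime(
    ["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-05"]
)
# Same idea, but 2003-01-05 is missing, so the spacing is irregular.
gappy = pd.to_datetime(
    ["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06", "2003-01-07"]
)

inferred_complete = pd.infer_freq(complete)  # a frequency string
inferred_gappy = pd.infer_freq(gappy)        # None: no regular pattern found
offset = to_offset("D")                      # frequency string -> offset object

print(inferred_complete, inferred_gappy, offset.freqstr)
```

When infer_freq returns None, there is no frequency to attach, which is consistent with the "Inferred frequency None" part of the error message in the question.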

“I suspect that errors down the road are caused by the missing freq.”

You are absolutely right. Here’s what I use often:

import pandas as pd


def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy.  If `freq` is None, it is inferred.
    """
    idx = idx.copy()
    if freq is None:
        if idx.freq is not None:
            # Already has a frequency; nothing to do.
            return idx
        # May return None if no regular pattern is found.
        freq = pd.infer_freq(idx)
    # to_offset(None) returns None, which the check below catches.
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found for `idx`.  Specify'
                             ' a frequency string with `freq`.')
    return idx

An example:

idx=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None

print(add_freq(idx))  # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')

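print(add_freq(idx, freq='D'))  # explicit
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')

Note that the explicit call relies on pandas accepting a frequency that the dates do not actually follow. Recent pandas releases validate the index when freq is assigned, so there the explicit 'D' call raises a ValueError instead, because these dates are not daily.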

Using asfreq will actually reindex (fill in) the missing dates, so be careful if that’s not what you want. From the pandas documentation:

“The primary function for changing frequencies is the asfreq function.
For a DatetimeIndex, this is basically just a thin, but convenient
wrapper around reindex which generates a date_range and calls reindex.”
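
A quick way to convince yourself of that description is to build the date_range by hand and compare the two results. This is only a sketch on made-up data, not a look at the actual pandas internals:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# What asfreq gives us directly.
via_asfreq = df.asfreq("D")

# The manual route: generate the daily range, then reindex onto it.
daily = pd.date_range(df.index.min(), df.index.max(), freq="D")
via_reindex = df.reindex(daily)

# Same timestamps and the same number of newly introduced NaN rows.
same_index = bool((via_asfreq.index == via_reindex.index).all())
print(same_index, len(df), len(via_asfreq))
```

The row count grows from 3 to 5 here, which is exactly the side effect to watch out for.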

Answered By: Brad Solomon

I’m not sure whether earlier versions support this (it depends on the pandas version rather than the Python version), but on my Python 3.6 setup this simple solution works:

# 'b' stands for business days
# 'w' for weekly, 'd' for daily, and you get the idea...
df.index.freq = 'b' 
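
A small sketch of what that assignment does, as far as I understand recent pandas releases: the setter checks the existing dates against the requested frequency, so it only succeeds when the index already conforms. The sample dates below are made up for illustration.

```python
import pandas as pd

# Thursday, Friday, Monday: consecutive business days, no freq attached yet.
idx = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])

# A frequency that matches the dates is accepted.
idx.freq = "B"
print(idx.freq)

# A frequency that does not match the dates is rejected.
other = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])
try:
    other.freq = "D"
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

So this approach labels an index that is already regular; it does not repair one with missing dates.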

Answered By: mrbTT

I was getting the same error. The suggestions posted above did not resolve my issue, but the solution in the following question did:

Pandas DatetimeIndex + seasonal_decompose = missing frequency.

Best Regards

Answered By: Riz.Khan

This can happen if, for example, the dates you are passing aren’t sorted.

Look at this example:

import numpy as np
import pandas as pd

example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[-1:],
                                               example_ts.index[:-1]]), freq='D')

The code above raises the same error as yours, because the dates are not in sequential order.

example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[:-1],
                                               example_ts.index[-1:]]), freq='D')

This version, by contrast, runs correctly because the dates stay in order.
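
If unsorted dates turn out to be the cause, one possible fix is to sort first and attach the frequency afterwards. This is a sketch on made-up data; it uses Index.append instead of np.hstack to build the out-of-order index, and it assumes that no dates are actually missing.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", "2020-01-10", freq="D")
# Move the last date to the front so the index is no longer sorted.
rotated = dates[-1:].append(dates[:-1])
ts = pd.Series(range(10), index=rotated)

# Quick diagnostics: the index is out of order and no frequency is inferred.
is_sorted = ts.index.is_monotonic_increasing
inferred = pd.infer_freq(ts.index)
print(is_sorted, inferred)

# Sort, then attach the daily frequency. Since no dates are missing,
# asfreq adds no rows here.
fixed = ts.sort_index().asfreq("D")
print(fixed.index.freq, len(fixed))
```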

Answered By: RobertoDM

Similar to some of the other answers here, my problem was that my data had missing dates.

Instead of dealing with this issue in Python, I opted to change my SQL query that I was using to source the data. So instead of skipping dates, I wrote the query such that it would fill in missing dates with the value 0.

Answered By: Luca Guarro

It seems to be an issue with missing values in the index. I simply rebuilt the index from the first and last entries of the original index, using the frequency I needed:

df.index = pd.date_range(start=df.index[0], end=df.index[-1], freq="h")
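
One caveat with this approach: the generated range must have exactly as many entries as the DataFrame has rows, otherwise pandas raises a length-mismatch error on assignment, and if the lengths happen to match but the timestamps differ, rows end up labelled with the wrong dates. A guarded variant might look like the sketch below. The helper name rebuild_index and the sample data are hypothetical, and a daily frequency is used purely for illustration.

```python
import pandas as pd


def rebuild_index(df, freq):
    """Hypothetical helper: replace the index with a regular date_range,
    but only if that range matches the existing timestamps exactly."""
    new_index = pd.date_range(start=df.index[0], end=df.index[-1], freq=freq)
    # Compare lengths first; the element-wise comparison needs equal lengths.
    if len(new_index) != len(df.index) or not (new_index == df.index).all():
        raise ValueError("index has gaps or irregular timestamps")
    out = df.copy()
    out.index = new_index  # same timestamps, now carrying a freq
    return out


complete = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
)
gappy = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04"]),
)

rebuilt = rebuild_index(complete, "D")
print(rebuilt.index.freq)

try:
    rebuild_index(gappy, "D")
    gap_detected = False
except ValueError:
    gap_detected = True
print(gap_detected)
```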

Answered By: Cord Kaldemeyer