Iterating through dictionaries – FutureWarning: Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is deprecated

Question:

I am transitioning back into Python after much time in R and I can’t remember the best methods for storing multiple participants’ time series. Some searching suggested dictionaries are good, but as I am iterating through dictionaries I have received the following warning a few times:
indexing.py:1069: FutureWarning: Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is deprecated and will raise a KeyError in a future Version.
  return self._getitem_tuple_same_dim(tup)

What should I be doing differently? Thank you so much!

df_motion =

datetime x y z
2020-07-10 13:49:11.429 0.213234 -0.069581 -10.066122
2020-07-10 13:49:11.440 0.219219 -0.047585 -10.085126
2020-07-10 13:49:11.450 0.319219 -0.057585 -10.185357
df_motion_dict[1] = df_motion # and other pandas dataframes of motion signals where the keys are a list with missing numbers, for example 1,3,4,10. 
audience = {}
for k,v in df_motion_dict.items():
    df_motion = df_motion_dict[k]
    df_motion = df_motion.loc['2020-07-10 13:55' : '2020-07-10 14:25.5', :]
    if len(df_motion)>0: # remove any dataframes not containing data
        audience[k]=df_motion    
Asked By: Dana


Answers:

I’ve had the same problem.
I searched the pandas documentation and couldn’t find an alternative way of doing this, or any explanation of why it is being deprecated.
Anyway, I could get rid of the warning by sorting the index before slicing.

Try this:

df_motion = df_motion.sort_index().loc['2020-07-10 13:55' : '2020-07-10 14:25.5', :] 

I hope this works for you as well.
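
To see why sorting helps, here is a minimal sketch with made-up rows that are deliberately out of time order (the values and the slice bounds are assumptions for illustration, not the asker's real data). It checks whether the index is monotonic, sorts it, and then slices with bounds that do not appear in the index:

```python
import pandas as pd

# Hypothetical sample data: the rows are deliberately out of time order.
idx = pd.to_datetime([
    "2020-07-10 13:49:11.450",
    "2020-07-10 13:49:11.429",
    "2020-07-10 13:49:11.440",
])
df_motion = pd.DataFrame(
    {
        "x": [0.319219, 0.213234, 0.219219],
        "y": [-0.057585, -0.069581, -0.047585],
        "z": [-10.185357, -10.066122, -10.085126],
    },
    index=idx,
)

# An unsorted index is what "non-monotonic" refers to in the warning.
is_sorted_before = df_motion.index.is_monotonic_increasing

# Sort once, then slice with bounds that need not exist in the index.
df_sorted = df_motion.sort_index()
window = df_sorted.loc["2020-07-10 13:49:11.435":"2020-07-10 13:49:11.445"]

print(is_sorted_before)
print(window)
```

If the data arrives already sorted, `sort_index()` costs little, so calling it before slicing is a cheap safeguard.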

Answered By: Roberto Narciso

I see you already resolved this issue, but I want to clarify for posterity that the root cause of this warning is an index that is not monotonic (its timestamps are not in sorted order), combined with slicing on keys that do not exist in the index.
Besides sorting the index, there are two main remedies:

  1. Use df_motion.resample() and resample to the frequency that’s appropriate (possibly 10ms in this case), providing an aggregation function of some kind, then fill in the blanks as necessary, depending on the change in frequency and the data itself. The resampled index is regular and sorted, which addresses the "non-monotonic" aspect of the warning and allows you to continue slicing with keys that don’t exist in the index.
    df_motion = df_motion.resample('10ms').mean()
  2. Slice with keys that exist exactly in the index, e.g. df_motion.loc['2020-07-10 13:49:11.429':'2020-07-10 13:49:11.450']. This will work the same way that loc normally does when dealing with a non-time-series dataframe, and will not raise this warning.
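
Both remedies can be sketched on a small frame. The three rows below reuse the timestamps from the question with a single column; the choice of `mean()` as the aggregation and `interpolate()` as the gap filler are assumptions, so pick whatever suits your signal:

```python
import pandas as pd

# Hypothetical samples with uneven spacing (11 ms, then 10 ms apart).
idx = pd.to_datetime([
    "2020-07-10 13:49:11.429",
    "2020-07-10 13:49:11.440",
    "2020-07-10 13:49:11.450",
])
df_motion = pd.DataFrame({"x": [0.213234, 0.219219, 0.319219]}, index=idx)

# Remedy 1: resample onto a regular 10 ms grid, averaging within each bin.
# Bins that receive no samples come back as NaN.
regular = df_motion.resample("10ms").mean()

# One possible way to fill the empty bins: linear interpolation.
filled = regular.interpolate()

# Remedy 2: slice with keys that exist exactly in the index.
exact = df_motion.loc["2020-07-10 13:49:11.429":"2020-07-10 13:49:11.450"]

print(regular)
print(filled)
print(exact)
```

Note that resampling changes the data (values are aggregated and timestamps move to bin edges), so it is only a good fit if a regular grid is what you want anyway.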

As an aside, you’re also creating a duplicate variable in your for loop. When you iterate over a dictionary’s items and assign them temporarily to k, v, k will hold each key, one by one, and v will hold each associated value. So inside of the for loop, you could use v instead of creating the df_motion variable:

audience = {}
for k,v in df_motion_dict.items():
    v = v.loc['2020-07-10 13:55' : '2020-07-10 14:25.5', :]
    if len(v)>0: # remove any dataframes not containing data
        audience[k]=v

or more succinctly:

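# Slice first, then keep only the non-empty results (the walrus operator needs Python 3.8+)
audience = {
    k: sliced
    for k, v in df_motion_dict.items()
    if not (sliced := v.loc['2020-07-10 13:55':'2020-07-10 14:25.5']).empty
}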
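
If the same windowing is needed in several places, the loop can be wrapped in a small helper. This is only a sketch: the function name `slice_window` and the sample frames are hypothetical, and it sorts each frame before slicing so the lookup does not depend on row order:

```python
import pandas as pd


def slice_window(frames, start, end):
    """Return {key: sliced frame}, dropping frames with no rows in the window."""
    out = {}
    for key, frame in frames.items():
        # Sorting first keeps the slice valid even if rows arrived out of order.
        sliced = frame.sort_index().loc[start:end]
        if not sliced.empty:
            out[key] = sliced
    return out


# Hypothetical participants; keys may skip numbers, as in the question.
inside = pd.DataFrame(
    {"x": [1.0, 2.0]},
    index=pd.to_datetime(["2020-07-10 14:00:00", "2020-07-10 13:56:00"]),
)
outside = pd.DataFrame(
    {"x": [3.0]},
    index=pd.to_datetime(["2020-07-10 15:00:00"]),
)
df_motion_dict = {1: inside, 3: outside}

audience = slice_window(df_motion_dict, "2020-07-10 13:55", "2020-07-10 14:25")
print(list(audience))
```

Participant 3 has no rows inside the window, so it is left out of the result.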
Answered By: Charles Bushrow

I feel your pain; this is a pretty annoying change to have to make. Here is another way to solve it using pd.date_range(), or pd.bdate_range() if you need to use business days.

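import pandas as pd

sd = '2020-01-01' # or dt.datetime(2020, 1, 1) 
ed = '2020-02-01' 
dates = pd.date_range(sd, ed)  # one timestamp per day, at midnight
# isin() keeps only rows whose index value equals one of the generated timestamps
df = df[df.index.isin(dates)]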
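
One caveat: `isin()` matches exact timestamps, so with intraday data like the question's, daily dates will match nothing. A boolean comparison mask is one alternative that does not depend on the index being sorted. A minimal sketch, with made-up rows and window bounds:

```python
import pandas as pd

# Hypothetical intraday rows, deliberately out of time order.
idx = pd.to_datetime([
    "2020-07-10 14:30:00",
    "2020-07-10 13:56:00",
    "2020-07-10 14:00:00",
])
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=idx)

# Daily dates only match rows stamped exactly at midnight.
days = pd.date_range("2020-07-10", "2020-07-11")
exact_matches = df.index.isin(days).sum()

# A comparison mask selects by value, regardless of row order.
start = pd.Timestamp("2020-07-10 13:55")
end = pd.Timestamp("2020-07-10 14:25")
mask = (df.index >= start) & (df.index <= end)
selected = df[mask]

print(exact_matches)
print(selected)
```

The mask keeps the rows in their original order; call `sort_index()` on the result if you need them chronological.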
Answered By: Jerald Achaibar