Slice multi-index pandas dataframe by date
Question:
Say I have the following multi-index dataframe:
import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo', 'foo']),
          pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'])]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)
print(df)
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
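For reference, the same frame can also be built with pd.MultiIndex.from_product, which takes the Cartesian product of the two levels and lets you name them up front. This is only a sketch: the level names "names" and "dates" are chosen for illustration and are not part of the question's frame, whose levels are unnamed.

```python
import numpy as np
import pandas as pd

# Same 8x4 zero frame, built from the Cartesian product of the two levels.
# The level names "names" and "dates" are illustrative assumptions;
# the question's frame has unnamed levels.
index = pd.MultiIndex.from_product(
    [["bar", "foo"], pd.date_range("2020-01-01", periods=4, freq="D")],
    names=["names", "dates"],
)
df = pd.DataFrame(np.zeros((8, 4)), index=index)
print(df)
```

The first level varies slowest, so the row order matches the question's arrays (all 'bar' dates first, then all 'foo' dates).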
How do I select only the part of this dataframe where the first index level is 'bar' and the date is after 2020-01-02, so that I can add 1 to that part?
To be clearer, the expected output would be:
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  1.0  1.0  1.0  1.0
    2020-01-04  1.0  1.0  1.0  1.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
I managed to slice it by the first index level:
df.loc['bar']
but then I am not able to apply the condition on the date.
Answers:
One option is to compare each index level separately with Index.get_level_values, combine the two boolean masks with &, and then set 1 in all columns (the : column selector) with DataFrame.loc:
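# Boolean mask: rows whose first level is 'bar'
m1 = df.index.get_level_values(0) == 'bar'
# Boolean mask: rows whose date is after 2020-01-02 (the string is parsed as a date)
m2 = df.index.get_level_values(1) > '2020-01-02'
# Set 1 in every column of the rows where both masks are True
df.loc[m1 & m2, :] = 1
print(df)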
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  1.0  1.0  1.0  1.0
    2020-01-04  1.0  1.0  1.0  1.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
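Since the question asks to add 1 rather than overwrite with 1, here is a sketch that increments with += and uses label slicing with pd.IndexSlice instead of boolean masks. It assumes a sorted index (label slicing on a MultiIndex generally requires one, hence the sort_index call) and one row per calendar day: .loc label slices include both endpoints, so "after 2020-01-02" is written as "from 2020-01-03 on". For sub-daily timestamps, the strict > mask above is the safer choice.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "bar", "bar", "foo", "foo", "foo", "foo"]),
    pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
                    "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)

# Label slicing on a MultiIndex expects a sorted index.
df = df.sort_index()

idx = pd.IndexSlice
# Label slices are inclusive, so "after 2020-01-02" becomes "from 2020-01-03 on".
# This assumes one row per calendar day.
df.loc[idx["bar", pd.Timestamp("2020-01-03"):], :] += 1
print(df)
```

Running the same line twice would leave 2.0 in those rows, which is the difference between adding and assigning.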
Alternatively, name the index levels and select the matching rows with DataFrame.query:
# Give the index levels names
df.index = df.index.set_names(["names", "dates"])
# Get the index labels of the rows that match the condition
index = df.query('names == "bar" and dates > "2020-01-02"').index
# Assign 1 to those rows in every column.
# pd.IndexSlice makes slicing MultiIndexes easier, though here it is arguably overkill
idx = pd.IndexSlice
df.loc[idx[index], :] = 1
                    0    1    2    3
names dates
bar   2020-01-01  0.0  0.0  0.0  0.0
      2020-01-02  0.0  0.0  0.0  0.0
      2020-01-03  1.0  1.0  1.0  1.0
      2020-01-04  1.0  1.0  1.0  1.0
foo   2020-01-01  0.0  0.0  0.0  0.0
      2020-01-02  0.0  0.0  0.0  0.0
      2020-01-03  0.0  0.0  0.0  0.0
      2020-01-04  0.0  0.0  0.0  0.0
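A variation on the query approach, as a sketch: the cutoff date is kept in a local variable and referenced with @ inside the query string, and because the .index returned by query already holds full (names, dates) keys, it can be passed to .loc directly without pd.IndexSlice. It also uses += so the selected rows are incremented, as the question asks. The variable name cutoff is illustrative.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "bar", "bar", "foo", "foo", "foo", "foo"]),
    pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
                    "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)
df.index = df.index.set_names(["names", "dates"])

# Illustrative local variable; "@cutoff" refers to it inside the query string,
# while "names" and "dates" refer to the index levels.
cutoff = pd.Timestamp("2020-01-02")
index = df.query("names == 'bar' and dates > @cutoff").index

# index is a MultiIndex of full (names, dates) keys, so .loc accepts it as-is.
df.loc[index, :] += 1
print(df)
```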