How to filter a dataframe with multiple-dimension index based on one dimension

Question:

I have a dataframe with multiple dimensional index using pandas. Say employee_id and date. Now I want to update records that’s prior to a specific date, say 2020-01-01. To be consistent with other codes, the update was made using np.where. So how can I add this date filter into this assignment, df['Sale'] = np.where(df.sale_actual>df.sale_expect, df.sale_actual, df.sale_expect). Thanks.

Asked By: Harry

||

Answers:

One solution is to create a boolean mask from the date part of the index and use it to mask out non-relevant rows during update

Sample input dataframe

df = pd.DataFrame([[0, '2019-01-01', 100, 200, 0], [1, '2019-02-01', 150, 100, 0], [0, '2021-12-12', 200, 100, 0]], columns=['id', 'date','sale_expect', 'sale_actual', 'sale'])
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['id', 'date'], drop=True)

                sale_expect sale_actual sale
id  date            
0   2019-01-01  100         200         0
1   2019-02-01  150         100         0
0   2021-12-12  200         100         0

Solution

mask = df.index.get_level_values(1) < np.datetime64('2020-01-01')
df.loc[mask, 'sale'] = np.where(df[mask].sale_actual>df[mask].sale_expect, df[mask].sale_actual, df[mask].sale_expect)

Result

                sale_expect sale_actual sale
id  date            
0   2019-01-01  100         200         200
1   2019-02-01  150         100         150
0   2021-12-12  200         100         0
Answered By: bui
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.