Add new column with last month sales avg groupby machine_id and item_id
Question:
data = {'machine_id': [1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,2000],
'item_id': [100,100,100,200,300,100,100,100,200,300,100,100,100,200,300,100,200],
'Date': ['2022-03-01','2022-03-02','2022-03-03','2022-03-04','2022-03-05',
'2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05',
'2022-05-01','2022-05-02','2022-05-03','2022-05-04','2022-05-05','2022-03-06','2022-03-10'],
'Sales_qty':[5,6,5,7,8,1,3,7,9,6,4,7,8,9,4,5,8]}
df = pd.DataFrame(data)
df['total']=df.groupby(['machine_id','item_id','Date'])['Sales_qty'].transform('sum')
df['sales_avg']=df['Sales_qty']/df['total']
in output column i want last month sales avg.
|machine_id|item_id|Date|Sales_qty|total|current_sales_avg| last_month_avg|
0 1000 100 2022-03-01 5 13 0.384615 NaN
1 1000 100 2022-03-01 6 13 0.461538 NaN
2 3000 100 2022-03-01 5 5 1.000000 NaN
3 2000 200 2022-03-01 7 10 0.700000 NaN
4 3000 300 2022-03-01 8 8 1.000000 NaN
5 1000 100 2022-04-01 1 4 0.250000 0.384615
6 1000 100 2022-04-01 3 4 0.750000 0.461538
7 3000 100 2022-04-01 7 7 1.000000 1.000000
8 2000 200 2022-04-01 9 9 1.000000 0.700000
9 3000 300 2022-04-01 6 6 1.000000 1.000000
10 1000 100 2022-05-01 4 11 0.363636 0.250000
11 1000 100 2022-05-01 7 11 0.636364 0.750000
12 3000 100 2022-05-01 8 8 1.000000 1.000000
13 2000 200 2022-05-01 9 9 1.000000 1.000000
14 3000 300 2022-05-01 4 4 1.000000 1.000000
15 1000 100 2022-03-06 2 13 0.153846 Nan
16 2000 200 2022-03-10 3 10 0.300000 Nan
Need code get the output. For March month the sales avg will be nan because we dont have feb month data. For april month we have last month data that is of march and for may month we have april month data.
Answers:
Between different month groups – the values of the same day will be aligned with each other – regardless of the differences in their number and order.
Requirement: days must be unique within each month.
I am using my approach from this question: Can .apply use information from other groups?.
df['Date'] = pd.to_datetime(df['Date'])
temp = df[['Date', 'sales_avg']] # temporary df
# Set index as: day of the month
# (it will be used to align the values different month groups by day)
temp.index = temp['Date'].dt.day
grouped = temp.groupby(temp['Date'].dt.to_period('M'))['sales_avg']
group_keys = list(grouped.groups.keys()) # group keys are: year_&_month
first_group_key = group_keys[0]
prev_keys = iter(group_keys)
def get_prev_avg(ser):
if ser.name == first_group_key:
return np.nan
return grouped.get_group(next(prev_keys)).reindex(ser.index)
prev_avg = grouped.transform(get_prev_avg) # new Series (with index: day)
prev_avg.index = df.index # now with the original index
df['prev_avg'] = prev_avg
With your current data:
print(df)
machine_id item_id Date Sales_qty total sales_avg prev_avg
0 1000 100 2022-03-01 5 5 1.0 NaN
1 1000 100 2022-03-02 6 6 1.0 NaN
2 3000 100 2022-03-03 5 5 1.0 NaN
3 2000 200 2022-03-04 7 7 1.0 NaN
4 3000 300 2022-03-05 8 8 1.0 NaN
5 1000 100 2022-04-01 1 1 1.0 1.0
6 1000 100 2022-04-02 3 3 1.0 1.0
7 3000 100 2022-04-03 7 7 1.0 1.0
8 2000 200 2022-04-04 9 9 1.0 1.0
9 3000 300 2022-04-05 6 6 1.0 1.0
10 1000 100 2022-05-01 4 4 1.0 1.0
11 1000 100 2022-05-02 7 7 1.0 1.0
12 3000 100 2022-05-03 8 8 1.0 1.0
13 2000 200 2022-05-04 9 9 1.0 1.0
14 3000 300 2022-05-05 4 4 1.0 1.0
15 1000 100 2022-03-06 5 5 1.0 NaN
16 2000 200 2022-03-10 8 8 1.0 NaN
You can first agg the groupby Date as list and then explode. But before that capture the index which is the first group’s index so that later you can fill that with np.nan
df2 = df.groupby('Date')['sales_avg'].agg(list).shift().to_frame()
na_index = df2['sales_avg'].isna().index[0]
df2 = df2.bfill().explode('sales_avg')
df2.loc[na_index] = np.nan
df['sales_avg'] = df2['sales_avg']
print(df):
machine_id item_id Sales_qty total sales_avg
Date
2022-03-01 1000 100 5 31 NaN
2022-03-01 1000 100 6 31 NaN
2022-03-01 3000 100 5 31 NaN
2022-03-01 2000 200 7 31 NaN
2022-03-01 3000 300 8 31 NaN
2022-04-01 1000 100 1 26 0.16129
2022-04-01 1000 100 3 26 0.193548
2022-04-01 3000 100 7 26 0.16129
2022-04-01 2000 200 9 26 0.225806
2022-04-01 3000 300 6 26 0.258065
2022-05-01 1000 100 4 32 0.038462
2022-05-01 1000 100 7 32 0.115385
2022-05-01 30000 100 8 32 0.269231
2022-05-01 2000 200 9 32 0.346154
2022-05-01 3000 300 4 32 0.230769
data = {'machine_id': [1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,2000],
'item_id': [100,100,100,200,300,100,100,100,200,300,100,100,100,200,300,100,200],
'Date': ['2022-03-01','2022-03-02','2022-03-03','2022-03-04','2022-03-05',
'2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05',
'2022-05-01','2022-05-02','2022-05-03','2022-05-04','2022-05-05','2022-03-06','2022-03-10'],
'Sales_qty':[5,6,5,7,8,1,3,7,9,6,4,7,8,9,4,5,8]}
df = pd.DataFrame(data)
df['total']=df.groupby(['machine_id','item_id','Date'])['Sales_qty'].transform('sum')
df['sales_avg']=df['Sales_qty']/df['total']
in output column i want last month sales avg.
|machine_id|item_id|Date|Sales_qty|total|current_sales_avg| last_month_avg|
0 1000 100 2022-03-01 5 13 0.384615 NaN
1 1000 100 2022-03-01 6 13 0.461538 NaN
2 3000 100 2022-03-01 5 5 1.000000 NaN
3 2000 200 2022-03-01 7 10 0.700000 NaN
4 3000 300 2022-03-01 8 8 1.000000 NaN
5 1000 100 2022-04-01 1 4 0.250000 0.384615
6 1000 100 2022-04-01 3 4 0.750000 0.461538
7 3000 100 2022-04-01 7 7 1.000000 1.000000
8 2000 200 2022-04-01 9 9 1.000000 0.700000
9 3000 300 2022-04-01 6 6 1.000000 1.000000
10 1000 100 2022-05-01 4 11 0.363636 0.250000
11 1000 100 2022-05-01 7 11 0.636364 0.750000
12 3000 100 2022-05-01 8 8 1.000000 1.000000
13 2000 200 2022-05-01 9 9 1.000000 1.000000
14 3000 300 2022-05-01 4 4 1.000000 1.000000
15 1000 100 2022-03-06 2 13 0.153846 Nan
16 2000 200 2022-03-10 3 10 0.300000 Nan
Need code get the output. For March month the sales avg will be nan because we dont have feb month data. For april month we have last month data that is of march and for may month we have april month data.
Between different month groups – the values of the same day will be aligned with each other – regardless of the differences in their number and order.
Requirement: days must be unique within each month.
I am using my approach from this question: Can .apply use information from other groups?.
df['Date'] = pd.to_datetime(df['Date'])
temp = df[['Date', 'sales_avg']] # temporary df
# Set index as: day of the month
# (it will be used to align the values different month groups by day)
temp.index = temp['Date'].dt.day
grouped = temp.groupby(temp['Date'].dt.to_period('M'))['sales_avg']
group_keys = list(grouped.groups.keys()) # group keys are: year_&_month
first_group_key = group_keys[0]
prev_keys = iter(group_keys)
def get_prev_avg(ser):
if ser.name == first_group_key:
return np.nan
return grouped.get_group(next(prev_keys)).reindex(ser.index)
prev_avg = grouped.transform(get_prev_avg) # new Series (with index: day)
prev_avg.index = df.index # now with the original index
df['prev_avg'] = prev_avg
With your current data:
print(df)
machine_id item_id Date Sales_qty total sales_avg prev_avg
0 1000 100 2022-03-01 5 5 1.0 NaN
1 1000 100 2022-03-02 6 6 1.0 NaN
2 3000 100 2022-03-03 5 5 1.0 NaN
3 2000 200 2022-03-04 7 7 1.0 NaN
4 3000 300 2022-03-05 8 8 1.0 NaN
5 1000 100 2022-04-01 1 1 1.0 1.0
6 1000 100 2022-04-02 3 3 1.0 1.0
7 3000 100 2022-04-03 7 7 1.0 1.0
8 2000 200 2022-04-04 9 9 1.0 1.0
9 3000 300 2022-04-05 6 6 1.0 1.0
10 1000 100 2022-05-01 4 4 1.0 1.0
11 1000 100 2022-05-02 7 7 1.0 1.0
12 3000 100 2022-05-03 8 8 1.0 1.0
13 2000 200 2022-05-04 9 9 1.0 1.0
14 3000 300 2022-05-05 4 4 1.0 1.0
15 1000 100 2022-03-06 5 5 1.0 NaN
16 2000 200 2022-03-10 8 8 1.0 NaN
You can first agg the groupby Date as list and then explode. But before that capture the index which is the first group’s index so that later you can fill that with np.nan
df2 = df.groupby('Date')['sales_avg'].agg(list).shift().to_frame()
na_index = df2['sales_avg'].isna().index[0]
df2 = df2.bfill().explode('sales_avg')
df2.loc[na_index] = np.nan
df['sales_avg'] = df2['sales_avg']
print(df):
machine_id item_id Sales_qty total sales_avg
Date
2022-03-01 1000 100 5 31 NaN
2022-03-01 1000 100 6 31 NaN
2022-03-01 3000 100 5 31 NaN
2022-03-01 2000 200 7 31 NaN
2022-03-01 3000 300 8 31 NaN
2022-04-01 1000 100 1 26 0.16129
2022-04-01 1000 100 3 26 0.193548
2022-04-01 3000 100 7 26 0.16129
2022-04-01 2000 200 9 26 0.225806
2022-04-01 3000 300 6 26 0.258065
2022-05-01 1000 100 4 32 0.038462
2022-05-01 1000 100 7 32 0.115385
2022-05-01 30000 100 8 32 0.269231
2022-05-01 2000 200 9 32 0.346154
2022-05-01 3000 300 4 32 0.230769