Add a new column with last month's sales avg, grouped by machine_id and item_id

Question:

import pandas as pd

data = {'machine_id': [1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,1000,3000,2000,3000,1000,2000], 
    'item_id': [100,100,100,200,300,100,100,100,200,300,100,100,100,200,300,100,200],
    'Date': ['2022-03-01','2022-03-02','2022-03-03','2022-03-04','2022-03-05',
    '2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05',
    '2022-05-01','2022-05-02','2022-05-03','2022-05-04','2022-05-05','2022-03-06','2022-03-10'],
    'Sales_qty':[5,6,5,7,8,1,3,7,9,6,4,7,8,9,4,5,8]}
df = pd.DataFrame(data)
# sum of Sales_qty per machine_id, item_id and Date
df['total'] = df.groupby(['machine_id','item_id','Date'])['Sales_qty'].transform('sum')
# each row's share of that total
df['sales_avg'] = df['Sales_qty'] / df['total']
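A quick check of what the code above computes: every Date in data is unique, so grouping on the exact date makes total equal to Sales_qty and sales_avg is 1.0 on every row (the first answer's output below shows exactly that). The expected table instead appears to total per calendar month (its March total of 13 for machine 1000 / item 100 is 5 + 6 + 2). A minimal sketch of month-level totals, assuming dt.to_period('M') as the month key:

```python
import pandas as pd

# The question's data dict, unchanged
data = {
    'machine_id': [1000, 1000, 3000, 2000, 3000, 1000, 1000, 3000, 2000, 3000,
                   1000, 1000, 3000, 2000, 3000, 1000, 2000],
    'item_id': [100, 100, 100, 200, 300, 100, 100, 100, 200, 300,
                100, 100, 100, 200, 300, 100, 200],
    'Date': ['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04', '2022-03-05',
             '2022-04-01', '2022-04-02', '2022-04-03', '2022-04-04', '2022-04-05',
             '2022-05-01', '2022-05-02', '2022-05-03', '2022-05-04', '2022-05-05',
             '2022-03-06', '2022-03-10'],
    'Sales_qty': [5, 6, 5, 7, 8, 1, 3, 7, 9, 6, 4, 7, 8, 9, 4, 5, 8],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

# Grouping on the exact date: each date occurs once, so every group is a single row
by_date = df.groupby(['machine_id', 'item_id', 'Date'])['Sales_qty'].transform('sum')
print((by_date == df['Sales_qty']).all())  # True

# Grouping on the calendar month instead gives one total per machine/item/month
month = df['Date'].dt.to_period('M')
df['total'] = df.groupby(['machine_id', 'item_id', month])['Sales_qty'].transform('sum')
df['sales_avg'] = df['Sales_qty'] / df['total']
print(df)
```

With the posted dict, row 15 has Sales_qty 5 rather than the 2 shown in the table, so that March total comes out as 16, not 13.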

In the output, I want a column with the last month's sales avg.

    machine_id  item_id        Date  Sales_qty  total  current_sales_avg  last_month_avg
0         1000      100  2022-03-01          5     13           0.384615             NaN
1         1000      100  2022-03-01          6     13           0.461538             NaN
2         3000      100  2022-03-01          5      5           1.000000             NaN
3         2000      200  2022-03-01          7     10           0.700000             NaN
4         3000      300  2022-03-01          8      8           1.000000             NaN
5         1000      100  2022-04-01          1      4           0.250000        0.384615
6         1000      100  2022-04-01          3      4           0.750000        0.461538
7         3000      100  2022-04-01          7      7           1.000000        1.000000
8         2000      200  2022-04-01          9      9           1.000000        0.700000
9         3000      300  2022-04-01          6      6           1.000000        1.000000
10        1000      100  2022-05-01          4     11           0.363636        0.250000
11        1000      100  2022-05-01          7     11           0.636364        0.750000
12        3000      100  2022-05-01          8      8           1.000000        1.000000
13        2000      200  2022-05-01          9      9           1.000000        1.000000
14        3000      300  2022-05-01          4      4           1.000000        1.000000
15        1000      100  2022-03-06          2     13           0.153846             NaN
16        2000      200  2022-03-10          3     10           0.300000             NaN

I need code that produces this output. For March, last_month_avg will be NaN because there is no February data. For April, the last month is March, and for May it is April.
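One way to get last_month_avg, sketched under an assumption inferred from the expected table: within each (machine_id, item_id) pair, the k-th row of a month takes the average of the k-th row of the previous calendar month (row 5 gets row 0's value, row 6 gets row 1's). The helper columns month and pos are hypothetical names chosen for this sketch. Each month's averages are relabelled as the following month and merged back on (machine_id, item_id, month, pos):

```python
import pandas as pd

# The question's data dict, unchanged
data = {
    'machine_id': [1000, 1000, 3000, 2000, 3000, 1000, 1000, 3000, 2000, 3000,
                   1000, 1000, 3000, 2000, 3000, 1000, 2000],
    'item_id': [100, 100, 100, 200, 300, 100, 100, 100, 200, 300,
                100, 100, 100, 200, 300, 100, 200],
    'Date': ['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04', '2022-03-05',
             '2022-04-01', '2022-04-02', '2022-04-03', '2022-04-04', '2022-04-05',
             '2022-05-01', '2022-05-02', '2022-05-03', '2022-05-04', '2022-05-05',
             '2022-03-06', '2022-03-10'],
    'Sales_qty': [5, 6, 5, 7, 8, 1, 3, 7, 9, 6, 4, 7, 8, 9, 4, 5, 8],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df['month'] = df['Date'].dt.to_period('M')   # hypothetical helper column

keys = ['machine_id', 'item_id']
grp = df.groupby(keys + ['month'])
df['total'] = grp['Sales_qty'].transform('sum')
df['current_sales_avg'] = df['Sales_qty'] / df['total']
df['pos'] = grp.cumcount()   # 0, 1, 2, ... within each machine/item/month, in row order

# Relabel each month's averages as belonging to the following month
prev = df[keys + ['month', 'pos', 'current_sales_avg']].copy()
prev['month'] = prev['month'] + 1
prev = prev.rename(columns={'current_sales_avg': 'last_month_avg'})

# Left merge keeps df's row order; rows without a previous-month partner get NaN
out = df.merge(prev, on=keys + ['month', 'pos'], how='left')
out = out.drop(columns=['month', 'pos'])
print(out)
```

Rows with no partner in the previous month (all of March here, including rows 15 and 16) get NaN. Because the posted dict has Sales_qty 5 and 8 in rows 15 and 16 (the table shows 2 and 3), the March averages come out as, for example, 5/16 instead of 5/13.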

Asked By: sachin Dhulshette


Answers:

Values are aligned across month groups by day of the month: each row gets the value from the previous month's row with the same day number, regardless of how many rows each month has or in what order they appear.

Requirement: days must be unique within each month.
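A small illustration of both points using Series.reindex on a toy Series indexed by day of month (the values are made up): reindexing matches by label, so order and count may differ and missing days become NaN, while duplicate days make the lookup ambiguous and pandas raises a ValueError.

```python
import pandas as pd

# Hypothetical previous-month values, indexed by day of month (days 1, 2, 5)
march = pd.Series([0.5, 0.2, 0.9], index=[1, 2, 5])

# Current month's days, in a different order and count
april_days = pd.Index([2, 1, 3])
print(march.reindex(april_days).tolist())  # [0.2, 0.5, nan]

# Duplicate days in the previous month make the label lookup ambiguous
dup = pd.Series([0.5, 0.7], index=[1, 1])
try:
    dup.reindex(april_days)
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```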

I am using my approach from this question: "Can .apply use information from other groups?"

import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
temp = df[['Date', 'sales_avg']].copy()  # temporary df

# Set index as: day of the month
# (it will be used to align the values of different month groups by day)
temp.index = temp['Date'].dt.day

grouped = temp.groupby(temp['Date'].dt.to_period('M'))['sales_avg']

group_keys = list(grouped.groups.keys())  # group keys are: year_&_month
first_group_key = group_keys[0]
prev_keys = iter(group_keys)  # each next() returns the key of the month before the current one

def get_prev_avg(ser):
    # The first month has no previous month
    if ser.name == first_group_key:
        return np.nan
    # Previous month's values, looked up by this month's days
    return grouped.get_group(next(prev_keys)).reindex(ser.index)

prev_avg = grouped.transform(get_prev_avg)  # new Series (with index: day)
prev_avg.index = df.index                   # now with the original index
df['prev_avg'] = prev_avg

With your current data:

print(df)

    machine_id  item_id       Date  Sales_qty  total  sales_avg  prev_avg
0         1000      100 2022-03-01          5      5        1.0       NaN
1         1000      100 2022-03-02          6      6        1.0       NaN
2         3000      100 2022-03-03          5      5        1.0       NaN
3         2000      200 2022-03-04          7      7        1.0       NaN
4         3000      300 2022-03-05          8      8        1.0       NaN
5         1000      100 2022-04-01          1      1        1.0       1.0
6         1000      100 2022-04-02          3      3        1.0       1.0
7         3000      100 2022-04-03          7      7        1.0       1.0
8         2000      200 2022-04-04          9      9        1.0       1.0
9         3000      300 2022-04-05          6      6        1.0       1.0
10        1000      100 2022-05-01          4      4        1.0       1.0
11        1000      100 2022-05-02          7      7        1.0       1.0
12        3000      100 2022-05-03          8      8        1.0       1.0
13        2000      200 2022-05-04          9      9        1.0       1.0
14        3000      300 2022-05-05          4      4        1.0       1.0
15        1000      100 2022-03-06          5      5        1.0       NaN
16        2000      200 2022-03-10          8      8        1.0       NaN
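The next(prev_keys) trick relies on transform calling get_prev_avg once per month, in key order. If you would rather not depend on that, here is a sketch of the same day-by-day alignment using a merge on (month, day), with the month shifted forward by one; the toy data and the helper frame key are hypothetical, and the same unique-days requirement applies. One difference: this looks up the previous calendar month, whereas next(prev_keys) takes the previous month present in the data.

```python
import pandas as pd

# Hypothetical toy frame: two months, unique days within each month
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-03-01', '2022-03-02', '2022-04-02',
                            '2022-04-01', '2022-04-03']),
    'sales_avg': [0.4, 0.6, 0.25, 0.75, 1.0],
})

# (month, day) for every row
key = pd.DataFrame({'month': df['Date'].dt.to_period('M'),
                    'day': df['Date'].dt.day})

# Each row's value, relabelled as if it belonged to the following month
prev = key.assign(month=key['month'] + 1, prev_avg=df['sales_avg'])

# Left merge on (month, day) keeps df's row order; unmatched days get NaN
df['prev_avg'] = key.merge(prev, on=['month', 'day'], how='left')['prev_avg'].to_numpy()
print(df)
```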

Answered By: Vladimir Fokow

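You can first aggregate sales_avg into a list per Date with groupby, shift the lists down by one Date, and then explode them. Before that, capture the index of the first group (its shifted value is NaN) so that you can set it back to np.nan later; bfill temporarily fills it so that explode produces the right number of rows.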

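import numpy as np

# assumes df is indexed by Date, with every row dated to the first of its month
df2 = df.groupby('Date')['sales_avg'].agg(list).shift().to_frame()
na_index = df2.index[0]  # first Date group; its shifted value is NaN
df2 = df2.bfill().explode('sales_avg')
df2.loc[na_index] = np.nan
df['sales_avg'] = df2['sales_avg']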

print(df)

            machine_id  item_id  Sales_qty  total sales_avg
Date                                                       
2022-03-01        1000      100          5     31       NaN
2022-03-01        1000      100          6     31       NaN
2022-03-01        3000      100          5     31       NaN
2022-03-01        2000      200          7     31       NaN
2022-03-01        3000      300          8     31       NaN
2022-04-01        1000      100          1     26   0.16129
2022-04-01        1000      100          3     26  0.193548
2022-04-01        3000      100          7     26   0.16129
2022-04-01        2000      200          9     26  0.225806
2022-04-01        3000      300          6     26  0.258065
2022-05-01        1000      100          4     32  0.038462
2022-05-01        1000      100          7     32  0.115385
2022-05-01        3000      100          8     32  0.269231
2022-05-01        2000      200          9     32  0.346154
2022-05-01        3000      300          4     32  0.230769
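A self-contained toy run of the same list/shift/explode steps (the data is hypothetical, and the result goes into a separate prev_avg column here rather than overwriting sales_avg). It assumes every Date has the same number of rows: explode gives each Date as many rows as the previous Date's list, and the final assignment only lines up when df2's index matches df's exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: Date index, two rows per month (counts must match)
idx = pd.DatetimeIndex(['2022-03-01', '2022-03-01', '2022-04-01',
                        '2022-04-01', '2022-05-01', '2022-05-01'], name='Date')
df = pd.DataFrame({'sales_avg': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}, index=idx)

# One list per Date, shifted so each Date holds the previous Date's list
df2 = df.groupby('Date')['sales_avg'].agg(list).shift().to_frame()
first_date = df2.index[0]   # its shifted value is NaN

# bfill gives the first Date a list of the right length so explode yields
# the right number of rows; those rows are then blanked out again
df2 = df2.bfill().explode('sales_avg')
df2.loc[first_date] = np.nan

df['prev_avg'] = df2['sales_avg'].astype(float)
print(df)
```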
Answered By: SomeDude