How can I add/distribute the values of one DataFrame into another DataFrame according to the date in pandas?

Question:

I have 2 dataframes:

df1: [image: df1]

df2: [image: df2]

What I need to code is:

  1. If a Variant in df2 matches a Variant in df1,
  2. the qty of that variant in df2 should be added to the qty of the same variant in df1.
  3. However, the qty should be added on the last date of the month available in df1.

For example:
In df1 we only have variant A, and the last date for variant A is 01/31/2022 with qty 2.
In df2 we have multiple variants; variant A has qty 5.

So the new df1 should be:
Variant A on the 31st = 5 + 2 = 7.


Asked By: kanika


Answers:

I've used a smaller example rather than rebuilding your full dataset.
One caveat of this solution: if df1 has no rows for a variant (B and C here), that variant ends up with NaN values, as seen below. If df1 does contain rows for variants B and C, it should work.
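
If the unmatched variants are not wanted at all (the question only asks to add quantities for variants that match), one possible way to avoid those NaN rows is to filter df2 before combining the frames. This is a minimal sketch on the same sample data; the name `matching` is illustrative and not part of the answer's code.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Variant": ["A", "A", "A", "A", "A"],
    "Date": ["1/27/2022", "1/28/2022", "1/29/2022", "1/30/2022", "1/31/2022"],
    "Qty": [0, 0, 1, 2, 2],
})
df2 = pd.DataFrame({"Variant": ["A", "B", "C"], "Qty": [5, 6, 7]})

# Keep only the df2 rows whose variant also appears somewhere in df1.
# B and C are dropped here because df1 has no rows for them.
matching = df2[df2["Variant"].isin(df1["Variant"])]
print(matching)
```

Using `matching` in place of df2 in the steps that follow would leave only variant A in the result.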

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"Variant":["A","A","A","A","A"], 
      "Date":["1/27/2022","1/28/2022","1/29/2022","1/30/2022","1/31/2022"],
      "Qty":[0,0,1,2,2]})
df2 = pd.DataFrame({"Variant":["A","B","C"], "Qty":[5,6,7]})
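# Stack df2's rows under df1's; their Date stays NaN until the forward fill below.
# (An outer merge on ["Variant","Qty"] would pair up rows whose Qty happens to be equal.)
df = pd.concat([df1, df2], ignore_index=True)
df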
  Variant       Date  Qty
0       A  1/27/2022    0
1       A  1/28/2022    0
2       A  1/29/2022    1
3       A  1/30/2022    2
4       A  1/31/2022    2
5       A        NaN    5
6       B        NaN    6
7       C        NaN    7
df["Date"] = df.groupby("Variant")["Date"].transform('ffill')
df
  Variant       Date  Qty
0       A  1/27/2022    0
1       A  1/28/2022    0
2       A  1/29/2022    1
3       A  1/30/2022    2
4       A  1/31/2022    2
5       A  1/31/2022    5
6       B        NaN    6
7       C        NaN    7

We can now group by variant and date and sum the quantities, then drop the duplicate rows.

df["Qty"] = df.groupby(["Variant","Date"])["Qty"].transform('sum')
df = df.drop_duplicates()
df
  Variant       Date  Qty
0       A  1/27/2022  0.0
1       A  1/28/2022  0.0
2       A  1/29/2022  1.0
3       A  1/30/2022  2.0
4       A  1/31/2022  7.0
6       B        NaN  NaN
7       C        NaN  NaN
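
As an alternative to combining the frames and forward-filling, the quantity can be added directly to the row that holds each variant's latest date. The following is only a sketch under a few assumptions: df1 covers a single month, the dates are in month/day/year format, and the names `result`, `rows`, and `extra` are illustrative. Variants in df2 with no rows in df1 are ignored, and Qty stays an integer column.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Variant": ["A", "A", "A", "A", "A"],
    "Date": ["1/27/2022", "1/28/2022", "1/29/2022", "1/30/2022", "1/31/2022"],
    "Qty": [0, 0, 1, 2, 2],
})
df2 = pd.DataFrame({"Variant": ["A", "B", "C"], "Qty": [5, 6, 7]})

result = df1.copy()

# Parse the date strings so "last date" means latest in time,
# not whichever row happens to come last.
parsed = pd.to_datetime(result["Date"], format="%m/%d/%Y")

# Index label of the latest-dated row for each variant in df1.
rows = parsed.groupby(result["Variant"]).idxmax().to_numpy()

# Total quantity to add per variant (sums repeats of a variant in df2).
extra = df2.groupby("Variant")["Qty"].sum()

# Look up the extra quantity for each selected row; variants missing from df2 add 0.
to_add = result.loc[rows, "Variant"].map(extra).fillna(0).astype(int)
result.loc[rows, "Qty"] = result.loc[rows, "Qty"] + to_add

print(result)
```

For the sample data this leaves the five rows of df1 in place and changes only the 1/31/2022 row for variant A, whose Qty becomes 2 + 5 = 7.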
Answered By: Vikram Raghu