How can I add/distribute the values of one dataframe in other dataframe according to the date in Pandas
Question:
I have 2 dataframes:
What I need to code is:
- If the the Variants in df2 matches with the variants in df1
- The qty from df2 of that particular variant should be added to qty of variant in df1
- But, the qty should be added to the last date of the month available in df1.
for eg:
In df1 we only have variant A and last date of variant A is 01/31/2022 with qty 2.
In df 2, we have multiple variants. Variant A has the qty 5.
So the new df1 should be:
Variant A on 31st = 5+2 = 7.
Answers:
I’ve made a smaller example because I didn’t want to build a full database for you.
The caveats of the solution here is that if you have the main database having no rows for B and C then as seen below they will end up with Nan values. However, if you have rows for variants B and C, it should work.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"Variant":["A","A","A","A","A"],
"Date":["1/27/2022","1/28/2022","1/29/2022","1/30/2022","1/31/2022"],
"Qty":[0,0,1,2,2]})
df2 = pd.DataFrame({"Variant":["A","B","C"], "Qty":[5,6,7]})
df = pd.merge(df1,df2,how='outer',on=["Variant","Qty"])
Variant Date Qty
0 A 1/27/2022 0
1 A 1/28/2022 0
2 A 1/29/2022 1
3 A 1/30/2022 2
4 A 1/31/2022 2
5 A NaN 5
6 B NaN 6
7 C NaN 7
df["Date"] = df.groupby("Variant")["Date"].transform('ffill')
df
Variant Date Qty
0 A 1/27/2022 0
1 A 1/28/2022 0
2 A 1/29/2022 1
3 A 1/30/2022 2
4 A 1/31/2022 2
5 A 1/31/2022 5
6 B NaN 6
7 C NaN 7
We ca now group by variant and date and just sum the values. Then delete duplicate rows.
df["Qty"] = df.groupby(["Variant","Date"])["Qty"].transform('sum')
df = df.drop_duplicates()
Out[25]:
Variant Date Qty
0 A 1/27/2022 0.0
1 A 1/28/2022 0.0
2 A 1/29/2022 1.0
3 A 1/30/2022 2.0
4 A 1/31/2022 7.0
6 B NaN NaN
7 C NaN NaN
I have 2 dataframes:
What I need to code is:
- If the the Variants in df2 matches with the variants in df1
- The qty from df2 of that particular variant should be added to qty of variant in df1
- But, the qty should be added to the last date of the month available in df1.
for eg:
In df1 we only have variant A and last date of variant A is 01/31/2022 with qty 2.
In df 2, we have multiple variants. Variant A has the qty 5.
So the new df1 should be:
Variant A on 31st = 5+2 = 7.
I’ve made a smaller example because I didn’t want to build a full database for you.
The caveats of the solution here is that if you have the main database having no rows for B and C then as seen below they will end up with Nan values. However, if you have rows for variants B and C, it should work.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"Variant":["A","A","A","A","A"],
"Date":["1/27/2022","1/28/2022","1/29/2022","1/30/2022","1/31/2022"],
"Qty":[0,0,1,2,2]})
df2 = pd.DataFrame({"Variant":["A","B","C"], "Qty":[5,6,7]})
df = pd.merge(df1,df2,how='outer',on=["Variant","Qty"])
Variant Date Qty
0 A 1/27/2022 0
1 A 1/28/2022 0
2 A 1/29/2022 1
3 A 1/30/2022 2
4 A 1/31/2022 2
5 A NaN 5
6 B NaN 6
7 C NaN 7
df["Date"] = df.groupby("Variant")["Date"].transform('ffill')
df
Variant Date Qty
0 A 1/27/2022 0
1 A 1/28/2022 0
2 A 1/29/2022 1
3 A 1/30/2022 2
4 A 1/31/2022 2
5 A 1/31/2022 5
6 B NaN 6
7 C NaN 7
We ca now group by variant and date and just sum the values. Then delete duplicate rows.
df["Qty"] = df.groupby(["Variant","Date"])["Qty"].transform('sum')
df = df.drop_duplicates()
Out[25]:
Variant Date Qty
0 A 1/27/2022 0.0
1 A 1/28/2022 0.0
2 A 1/29/2022 1.0
3 A 1/30/2022 2.0
4 A 1/31/2022 7.0
6 B NaN NaN
7 C NaN NaN