SUMMARIZE (DAX) equivalent in Python (pandas)
Question:
I am new to pandas in Python and I am facing an issue that I am not able to solve alone.
I connect via ODBC (SQL) and load the following data into df:
             JDFEC JDCPY JDTMP  PALLETS_STOCK
0       2021-06-30   164     N         1256.0
1       2022-01-27   704     N            1.0
2       2021-03-14   799     N          376.0
3       2022-01-14   723     N         1402.0
4       2022-05-19   776     N         1902.0
...            ...   ...   ...            ...
101417  2022-10-12   714     N          220.0
101418  2020-09-14   153     N          315.0
101419  2021-05-08   109     I           66.0
101420  2022-10-14   057     N           48.0
101421  2022-04-27   776     I         1820.0
I would like to manipulate it to get an output similar to the image:
[Image: New Table example]
(That is, grouping by date and creating groups based on the JDCPY and JDTMP values, then summing PALLETS_STOCK for each group.)
I already have Power BI doing it with a SUMMARIZE-CALCULATE-SUM as below:
NewTable =
SUMMARIZE(
    Query,
    Query[JDFEC],
    "GROUP-A",
    CALCULATE(SUM(Query[PALLETS_STOCKS]), QueryKeynes[JDCPY] = "539" || QueryKeynes[JDCPY] = "109"),
    "GROUP-B",
    CALCULATE(SUM(Query[PALLETS_STOCKS]), QueryKeynes[JDCPY] = "455", QueryKeynes[JDTMP] = "N"),
    etc...
)
However, I have no idea how to do this in Python.
Could someone guide me, please?
EDIT: Final code
import pandas as pd
import numpy as np

# JDCPY is compared as text here, so leading zeros matter
conditions = [
    df["JDCPY"].isin(["003", "006"]),
    (df["JDCPY"].eq("022")) & (df["JDTMP"].eq("N")),
]
groups = ["GROUP-A", "GROUP-B"]

out = (
    df
    .assign(JDFEC=pd.to_datetime(df["JDFEC"]),
            GROUPS=np.select(conditions, groups, default="GROUP-X"))
    .groupby(["JDFEC", "GROUPS"], as_index=False)["PALLETS_STOCK"].sum()
    .pivot_table(index="JDFEC", columns="GROUPS", values="PALLETS_STOCK")
    .reset_index()
    .rename_axis(None, axis=1)
)

# sort_values returns a new DataFrame, so assign the result back
out = out.sort_values(by=["JDFEC"])
# Format the dates only after sorting, so the order stays chronological
out["JDFEC"] = pd.to_datetime(out["JDFEC"]).dt.strftime("%d/%m/%Y")
print(out)
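The final code compares JDCPY against strings such as "003", which only matches if the column holds text with its leading zeros intact. A minimal sketch, assuming the codes could arrive from the query as integers (the small frame below is hypothetical sample data, not the real table), of normalizing them to three-character strings before building the conditions:

```python
import pandas as pd

# Hypothetical sample: codes arrived as integers, so leading zeros were lost.
df = pd.DataFrame({"JDCPY": [3, 57, 164]})

# Convert to text and left-pad with zeros to a fixed width of 3.
df["JDCPY"] = df["JDCPY"].astype(str).str.zfill(3)

print(df["JDCPY"].tolist())  # ['003', '057', '164']
```

If the column is already text, this step is unnecessary and the comparisons can be used as written.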
Answers:
IIUC, you can use np.select to form the groups and pandas.pivot_table to reshape.
Try this:
import pandas as pd
import numpy as np

# JDCPY is compared as integers here; use strings (e.g. "539") if the column holds text
conditions = [
    df["JDCPY"].isin([539, 109]),
    (df["JDCPY"].eq(455)) & (df["JDTMP"].eq("N")),
]
groups = ["GROUP-A", "GROUP-B"]

out = (
    df
    .assign(JDFEC=pd.to_datetime(df["JDFEC"]).dt.strftime("%d/%m/%Y"),
            # rows matching no condition fall into GROUP-X
            GROUPS=np.select(conditions, groups, default="GROUP-X"))
    .groupby(["JDFEC", "GROUPS"], as_index=False)["PALLETS_STOCK"].sum()
    .pivot_table(index="JDFEC", columns="GROUPS", values="PALLETS_STOCK")
    .reset_index()
    .rename_axis(None, axis=1)
)
# Output :
print(out)
        JDFEC  GROUP-A  GROUP-B  GROUP-X
0  08/05/2021     66.0      NaN      NaN
1  12/10/2022    220.0      NaN      NaN
2  14/01/2022      NaN      NaN   1402.0
3  14/09/2020      NaN    315.0      NaN
4  14/10/2022      NaN      NaN     48.0
5  19/05/2022      NaN      NaN   1902.0
6  27/01/2022    377.0      NaN      NaN
7  27/04/2022      NaN      NaN   1820.0
8  30/06/2021      NaN      NaN   1256.0
# Input used :
print(df)
        JDFEC  JDCPY JDTMP  PALLETS_STOCK
0  2021-06-30    164     N         1256.0
1  2022-01-27    109     N            1.0
2  2022-01-27    109     N          376.0
3  2022-01-14    723     N         1402.0
4  2022-05-19    776     N         1902.0
5  2022-10-12    539     N          220.0
6  2020-09-14    455     N          315.0
7  2021-05-08    109     I           66.0
8  2022-10-14     57     N           48.0
9  2022-04-27    776     I         1820.0
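One caveat: np.select assigns each row to the first condition it matches, so the groups are mutually exclusive, whereas each CALCULATE in the DAX version is filtered independently and the groups could overlap. If that matters, a possible variation (a sketch only, rebuilding the sample input from above; the names masks and summary are just illustrative) is to keep one boolean mask per output column and sum each one per date:

```python
import pandas as pd

# Sample rows copied from the input shown above.
df = pd.DataFrame({
    "JDFEC": ["2021-06-30", "2022-01-27", "2022-01-27", "2022-01-14", "2022-05-19",
              "2022-10-12", "2020-09-14", "2021-05-08", "2022-10-14", "2022-04-27"],
    "JDCPY": [164, 109, 109, 723, 776, 539, 455, 109, 57, 776],
    "JDTMP": ["N", "N", "N", "N", "N", "N", "N", "I", "N", "I"],
    "PALLETS_STOCK": [1256.0, 1.0, 376.0, 1402.0, 1902.0,
                      220.0, 315.0, 66.0, 48.0, 1820.0],
})

# One boolean mask per output column, mirroring one CALCULATE filter each.
masks = {
    "GROUP-A": df["JDCPY"].isin([539, 109]),
    "GROUP-B": df["JDCPY"].eq(455) & df["JDTMP"].eq("N"),
}

# Real datetimes as the grouping key, so the result is in chronological order.
dates = pd.to_datetime(df["JDFEC"])

# Keep PALLETS_STOCK where the mask holds (NaN elsewhere), then sum per date.
# min_count=1 leaves NaN for dates with no matching row instead of 0.
summary = (
    pd.DataFrame({
        name: df["PALLETS_STOCK"].where(mask).groupby(dates).sum(min_count=1)
        for name, mask in masks.items()
    })
    .rename_axis("JDFEC")
    .reset_index()
)

print(summary)
```

Here every date appears once, a row may count toward several groups if the masks overlap, and there is no catch-all GROUP-X column unless a mask for it is added explicitly.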