SUMMARIZE (DAX) equivalent in Python (pandas)

Question:

I am new to pandas in Python and I am facing an issue that I am not able to solve alone.

I connect via ODBC (SQL) and load the following data into df:

             JDFEC JDCPY JDTMP  PALLETS_STOCK
0       2021-06-30   164     N         1256.0
1       2022-01-27   704     N            1.0
2       2021-03-14   799     N          376.0
3       2022-01-14   723     N         1402.0
4       2022-05-19   776     N         1902.0
...            ...   ...   ...            ...
101417  2022-10-12   714     N          220.0
101418  2020-09-14   153     N          315.0
101419  2021-05-08   109     I           66.0
101420  2022-10-14   057     N           48.0
101421  2022-04-27   776     I         1820.0

I would like to manipulate it to get an output similar to the image:
[image: New Table example]
(So grouping by date and creating groups based on the JDCPY and JDTMP values to sum PALLETS_STOCK)

I already have Power BI doing it with a SUMMARIZE-CALCULATE-SUM as below:

NewTable = 
    SUMMARIZE(
    Query,
    Query[JDFEC],

    "GROUP-A",
        CALCULATE(SUM(Query[PALLETS_STOCKS]), QueryKeynes[JDCPY] = "539" || QueryKeynes[JDCPY] = "109"),

    "GROUP-B",
        CALCULATE(SUM(Query[PALLETS_STOCKS]), QueryKeynes[JDCPY] = "455", QueryKeynes[JDTMP] = "N"),

etc...

)

However, I have no idea how to do this in Python.

Could someone guide me, please?

EDIT: Final code

    import numpy as np
    import pandas as pd

    # JDCPY is compared as strings here (e.g. "057" in the data above)
    conditions = [
        df["JDCPY"].isin(["003", "006"]),
        df["JDCPY"].eq("022") & df["JDTMP"].eq("N"),
    ]

    groups = ["GROUP-A", "GROUP-B"]

    out = (
        df
        .assign(JDFEC=pd.to_datetime(df["JDFEC"]),
                GROUPS=np.select(conditions, groups, default="GROUP-X"))
        .groupby(["JDFEC", "GROUPS"], as_index=False)["PALLETS_STOCK"].sum()
        .pivot_table(index="JDFEC", columns="GROUPS", values="PALLETS_STOCK")
        .reset_index()
        .rename_axis(None, axis=1)
    )

    # sort_values returns a new DataFrame, so assign the result back
    out = out.sort_values(by=["JDFEC"])
    # Format the dates only after sorting, while JDFEC is still a datetime
    out["JDFEC"] = out["JDFEC"].dt.strftime("%d/%m/%Y")

    print(out)
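
A note on why the final code keeps JDFEC as a datetime through the grouping and formats it only at the very end: once the dates are dd/mm/yyyy text, sorting compares them character by character rather than chronologically. A minimal sketch with three made-up dates (illustrative values, not the real data):

```python
import pandas as pd

# Three illustrative dates (hypothetical values, deliberately out of order)
dates = pd.Series(pd.to_datetime(["2022-01-14", "2020-09-14", "2021-05-08"]))

# Sort while still datetime, then format: chronological order
chronological = dates.sort_values().dt.strftime("%d/%m/%Y").tolist()

# Format first, then sort: plain string comparison, day of month first
lexicographic = dates.dt.strftime("%d/%m/%Y").sort_values().tolist()

print(chronological)  # ['14/09/2020', '08/05/2021', '14/01/2022']
print(lexicographic)  # ['08/05/2021', '14/01/2022', '14/09/2020']
```

The second ordering is what you get if strftime is applied before the groupby/pivot step.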


Asked By: Foguete


Answers:

IIUC, you can use np.select to form the groups and pandas.pivot_table to reshape.

Try this:

import pandas as pd
import numpy as np

# JDCPY is an integer column in the sample input below; if yours holds
# strings (e.g. "057"), compare against strings such as "539" instead
conditions = [
    df["JDCPY"].isin([539, 109]),
    df["JDCPY"].eq(455) & df["JDTMP"].eq("N"),
]

groups = ["GROUP-A", "GROUP-B"]

out = (
    df
    .assign(JDFEC=pd.to_datetime(df["JDFEC"]).dt.strftime("%d/%m/%Y"),
            # rows matching no condition fall into "GROUP-X"
            GROUPS=np.select(conditions, groups, default="GROUP-X"))
    .groupby(["JDFEC", "GROUPS"], as_index=False)["PALLETS_STOCK"].sum()
    .pivot_table(index="JDFEC", columns="GROUPS", values="PALLETS_STOCK")
    .reset_index()
    .rename_axis(None, axis=1)
)

# Output :

print(out)
    
        JDFEC  GROUP-A  GROUP-B  GROUP-X
0  08/05/2021     66.0      NaN      NaN
1  12/10/2022    220.0      NaN      NaN
2  14/01/2022      NaN      NaN   1402.0
3  14/09/2020      NaN    315.0      NaN
4  14/10/2022      NaN      NaN     48.0
5  19/05/2022      NaN      NaN   1902.0
6  27/01/2022    377.0      NaN      NaN
7  27/04/2022      NaN      NaN   1820.0
8  30/06/2021      NaN      NaN   1256.0

# Input used :

print(df)

        JDFEC  JDCPY JDTMP  PALLETS_STOCK
0  2021-06-30    164     N         1256.0
1  2022-01-27    109     N            1.0
2  2022-01-27    109     N          376.0
3  2022-01-14    723     N         1402.0
4  2022-05-19    776     N         1902.0
5  2022-10-12    539     N          220.0
6  2020-09-14    455     N          315.0
7  2021-05-08    109     I           66.0
8  2022-10-14     57     N           48.0
9  2022-04-27    776     I         1820.0
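
One difference from the DAX version may matter: np.select puts each row into exactly one group (the first matching condition wins), whereas each CALCULATE inside SUMMARIZE is filtered independently, so a row could count toward several groups. If your groups can overlap, a sketch closer to the DAX semantics is to build one masked column per group and sum by date. It uses a few rows from the input above with JDCPY kept as strings, as in the question's data; the `masks` and `summary` names are only illustrative choices.

```python
import pandas as pd

# A few rows from the sample input; JDCPY kept as strings, as in the question
df = pd.DataFrame({
    "JDFEC": ["2021-06-30", "2022-01-27", "2022-01-27", "2022-10-12",
              "2020-09-14", "2021-05-08"],
    "JDCPY": ["164", "109", "109", "539", "455", "109"],
    "JDTMP": ["N", "N", "N", "N", "N", "I"],
    "PALLETS_STOCK": [1256.0, 1.0, 376.0, 220.0, 315.0, 66.0],
})
df["JDFEC"] = pd.to_datetime(df["JDFEC"])

# One boolean mask per group, each evaluated independently
# (roughly one CALCULATE filter each)
masks = {
    "GROUP-A": df["JDCPY"].isin(["539", "109"]),
    "GROUP-B": df["JDCPY"].eq("455") & df["JDTMP"].eq("N"),
}

# Keep PALLETS_STOCK where the mask holds, NaN elsewhere, then sum per date
summary = (
    pd.DataFrame({name: df["PALLETS_STOCK"].where(mask)
                  for name, mask in masks.items()})
    .groupby(df["JDFEC"])
    .sum(min_count=1)  # min_count=1 leaves NaN for dates with no matching rows
    .reset_index()
)

print(summary)
```

Unlike the np.select version, there is no catch-all GROUP-X here; dates whose rows match none of the masks still appear, with NaN in every group column.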
Answered By: abokey