Iterating on a group of columns in a dataframe from a custom list – pandas
Question:
I have a dataframe df like this
TxnId TxnDate TxnCount
100 2023-02-01 2
500 2023-02-01 1
400 2023-02-01 4
100 2023-02-02 3
500 2023-02-02 5
100 2023-02-03 3
500 2023-02-03 5
400 2023-02-03 2
I have the following custom lists
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400,500]
I want to iterate over the df with the logic below:
for every txn in txnlist:
    sum = 0
    for every date in datelist:
        sum += df[txn][date].TxnCount
I would also be interested to understand how to find average of TxnCount for filtered TxnIds.
After the sum step, based on the above input and filters:
TxnId TxnCount
400 2
500 10
Average corresponding to TxnId 400 = (2+0)/2 = 1
Average corresponding to TxnId 500 = (5+5)/2 = 5
If the average > 3, add that row to breachList
breachList =[[500,10]]
Please help me do this in pandas.
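For reference, the nested-loop logic above can be written out as plain Python over the sample rows (a pandas-free sketch of the intended algorithm; the row tuples below just transcribe the sample dataframe):

```python
import datetime

# Sample rows from the question: (TxnId, TxnDate, TxnCount)
rows = [
    (100, datetime.date(2023, 2, 1), 2),
    (500, datetime.date(2023, 2, 1), 1),
    (400, datetime.date(2023, 2, 1), 4),
    (100, datetime.date(2023, 2, 2), 3),
    (500, datetime.date(2023, 2, 2), 5),
    (100, datetime.date(2023, 2, 3), 3),
    (500, datetime.date(2023, 2, 3), 5),
    (400, datetime.date(2023, 2, 3), 2),
]

datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

breachList = []
for txn in txnlist:
    total = 0
    for date in datelist:
        # A missing (txn, date) combination contributes 0
        total += sum(c for i, d, c in rows if i == txn and d == date)
    average = total / len(datelist)
    if average > 3:
        breachList.append([txn, total])

print(breachList)  # [[500, 10]]
```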
Answers:
First filter the DataFrame by both lists with boolean indexing and Series.isin:
df1 = df[df['TxnId'].isin(txnlist) & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]
print (df1)
TxnId TxnDate TxnCount
4 500 2023-02-02 5
6 500 2023-02-03 5
7 400 2023-02-03 2
Then aggregate the sum of column TxnCount per group:
out = df1.groupby('TxnId', as_index=False)['TxnCount'].sum()
print (out)
TxnId TxnCount
0 400 2
1 500 10
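Put together, the filter-then-sum step can be sketched as a self-contained snippet (the DataFrame literal below just reproduces the sample data from the question):

```python
import datetime
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'TxnId':    [100, 500, 400, 100, 500, 100, 500, 400],
    'TxnDate':  ['2023-02-01', '2023-02-01', '2023-02-01', '2023-02-02',
                 '2023-02-02', '2023-02-03', '2023-02-03', '2023-02-03'],
    'TxnCount': [2, 1, 4, 3, 5, 3, 5, 2],
})
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

# Keep only rows whose TxnId and TxnDate appear in the custom lists
mask = (df['TxnId'].isin(txnlist)
        & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist))
out = df[mask].groupby('TxnId', as_index=False)['TxnCount'].sum()
print(out)
```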
EDIT: If you need to filter TxnId by average, here greater than 4, use:
df1 = df[df['TxnId'].isin(txnlist) & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]
print (df1)
TxnId TxnDate TxnCount
4 500 2023-02-02 5
6 500 2023-02-03 5
7 400 2023-02-03 2
# compute averages per TxnId
out = df1.groupby('TxnId')['TxnCount'].mean()
print (out)
TxnId
400 2
500 5
Name: TxnCount, dtype: int64
# get TxnIds with average greater than 4
TxnId = out[out > 4].index
print (TxnId)
Int64Index([500], dtype='int64', name='TxnId')
Filter rows in df or df1:
df2 = df[df['TxnId'].isin(TxnId)]
print(df2)
TxnId TxnDate TxnCount
1 500 2023-02-01 1
4 500 2023-02-02 5
6 500 2023-02-03 5
df3 = df1[df1['TxnId'].isin(TxnId)]
print(df3)
TxnId TxnDate TxnCount
4 500 2023-02-02 5
6 500 2023-02-03 5
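End to end, the average-based filter might look like this (the sample data is rebuilt below; `keep` is a hypothetical name for the filtered index):

```python
import datetime
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'TxnId':    [100, 500, 400, 100, 500, 100, 500, 400],
    'TxnDate':  ['2023-02-01', '2023-02-01', '2023-02-01', '2023-02-02',
                 '2023-02-02', '2023-02-03', '2023-02-03', '2023-02-03'],
    'TxnCount': [2, 1, 4, 3, 5, 3, 5, 2],
})
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

df1 = df[df['TxnId'].isin(txnlist)
         & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]

# Average TxnCount per TxnId, then keep the ids above the threshold
means = df1.groupby('TxnId')['TxnCount'].mean()
keep = means[means > 4].index
df3 = df1[df1['TxnId'].isin(keep)]
print(df3)
```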
EDIT1: For the expected output, first filter by the lists (to avoid processing all rows):
df1 = df[df['TxnId'].isin(txnlist) & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]
print (df1)
TxnId TxnDate TxnCount
4 500 2023-02-02 5
6 500 2023-02-03 5
7 400 2023-02-03 2
Pivot for all TxnDate/TxnId combinations:
out = df1.pivot_table(index='TxnId',
columns='TxnDate',
values='TxnCount',
aggfunc='sum',
fill_value=0)
print (out)
TxnDate 2023-02-02 2023-02-03
TxnId
400 0 2
500 5 5
Finally, filter the row sums by the row means and convert to a list of lists:
breachList = out.sum(axis=1)[out.mean(axis=1).gt(3)].reset_index().to_numpy().tolist()
print (breachList)
[[500, 10]]
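The whole pivot-and-filter pipeline can be sketched as one runnable snippet (sample data reproduced from the question):

```python
import datetime
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'TxnId':    [100, 500, 400, 100, 500, 100, 500, 400],
    'TxnDate':  ['2023-02-01', '2023-02-01', '2023-02-01', '2023-02-02',
                 '2023-02-02', '2023-02-03', '2023-02-03', '2023-02-03'],
    'TxnCount': [2, 1, 4, 3, 5, 3, 5, 2],
})
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

df1 = df[df['TxnId'].isin(txnlist)
         & pd.to_datetime(df['TxnDate']).dt.date.isin(datelist)]

# Pivot so every TxnId/TxnDate combination gets a cell (missing ones -> 0)
table = df1.pivot_table(index='TxnId', columns='TxnDate',
                        values='TxnCount', aggfunc='sum', fill_value=0)

# Row sums where the row mean exceeds 3, as a list of [TxnId, sum] pairs
breachList = (table.sum(axis=1)[table.mean(axis=1).gt(3)]
              .reset_index().to_numpy().tolist())
print(breachList)  # [[500, 10]]
```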
The fact that you are using a nested loop is reminiscent of a 2D pivot_table (or crosstab):
df['TxnDate'] = pd.to_datetime(df['TxnDate'])
out = (df.pivot_table(index='TxnId', columns='TxnDate',
                      values='TxnCount', aggfunc='sum',
                      fill_value=0)
         .reindex(index=txnlist, columns=pd.to_datetime(datelist))
      )
Output:
TxnDate 2023-02-03 2023-02-02
TxnId
400 2 0
500 5 5
And if you want to further aggregate on Ids (or Date):
out.sum(axis=1)
TxnId
400 2
500 10
dtype: int64
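A self-contained sketch of this pivot approach, reindexed to the custom lists (note that `reindex` needs keyword arguments here, and `datelist` is converted with `pd.to_datetime` so its labels match the Timestamp columns produced by the pivot):

```python
import datetime
import pandas as pd

# Sample data transcribed from the question
df = pd.DataFrame({
    'TxnId':    [100, 500, 400, 100, 500, 100, 500, 400],
    'TxnDate':  ['2023-02-01', '2023-02-01', '2023-02-01', '2023-02-02',
                 '2023-02-02', '2023-02-03', '2023-02-03', '2023-02-03'],
    'TxnCount': [2, 1, 4, 3, 5, 3, 5, 2],
})
datelist = [datetime.date(2023, 2, 3), datetime.date(2023, 2, 2)]
txnlist = [400, 500]

df['TxnDate'] = pd.to_datetime(df['TxnDate'])
out = (df.pivot_table(index='TxnId', columns='TxnDate',
                      values='TxnCount', aggfunc='sum', fill_value=0)
         # keyword args so the row/column labels go where intended;
         # convert datelist to Timestamps to match the pivoted columns
         .reindex(index=txnlist, columns=pd.to_datetime(datelist)))

# Further aggregate per TxnId
totals = out.sum(axis=1)
print(totals)
```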