Tricky calculation to create a column that pulls in retro values using Pandas

Question:

I have a dataset where I would like to create a new column called 'aa_cumul'. For each city and ID, its first value (on the row where the first numerical value occurs) is the sum of the value in 'new_r_aa', which is 2, and the value in 'cml_aa_bx', which is 1, so 2 + 1 = 3.
From there, each subsequent value is the previous 'aa_cumul' value plus the current 'new_r_aa' value
(3 + 8 = 11, 11 + 9 = 20, etc.).
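Written as a recurrence over the rows k = 0, 1, 2, ... of one (city, ID) group, the rule unrolls into a closed form. The symbols are shorthand introduced here: a for 'aa_cumul', c for 'cml_aa_bx' and n for 'new_r_aa'.

```latex
a_0 = c_0 + n_0, \qquad a_k = a_{k-1} + n_k \quad (k \ge 1)
\quad\Longrightarrow\quad
a_k = c_0 + \sum_{j=0}^{k} n_j
```

So each value is the group's first 'cml_aa_bx' plus a running sum of 'new_r_aa', which is why a per-group cumulative sum plus a per-group "first" value is enough.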

Data

import pandas as pd
data = {
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'new_r_aa': [2, 8, 9, 8, 6, 5]
}

df = pd.DataFrame(data)

Desired

data = {
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'new_r_aa': [2, 8, 9, 8, 6, 5],
    'aa_cumul': [3, 11, 20, 28, 34, 6]
}

Doing

# Build the values for the new 'aa_cumul' column as a list

new_cuml_aa = []

# Seed the first value with the sum of the first 'new_r_aa' and 'cml_aa_bx' values
new_cuml_aa.append(df['new_r_aa'].iloc[0] + df['cml_aa_bx'].iloc[0])

# Loop through the remaining rows, adding each 'new_r_aa' to the previous total
for i in range(1, len(df)):
    new_cuml_aa_value = new_cuml_aa[i - 1] + df['new_r_aa'].iloc[i]
    new_cuml_aa.append(new_cuml_aa_value)

However, this is giving me the wrong values. Any suggestions are appreciated.
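The loop above carries the running total straight across rows that belong to different cities, so the CA row continues NY's total instead of starting over. A minimal sketch of the same loop that restarts whenever the (city, ID) pair changes, assuming each group's rows are contiguous and already in the intended order. It uses 8 as the second 'new_r_aa' value, as the "3 + 8 = 11" example implies.

```python
import pandas as pd

data = {
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'new_r_aa': [2, 8, 9, 8, 6, 5],  # 8 in row 1, per the "3 + 8 = 11" example
}
df = pd.DataFrame(data)

aa_cumul = []
prev_key = None
running = 0
for city, id_, cml, new in zip(df['city'], df['ID'], df['cml_aa_bx'], df['new_r_aa']):
    key = (city, id_)
    if key != prev_key:
        # First row of a new (city, ID) run: seed with cml_aa_bx + new_r_aa
        running = cml + new
    else:
        # Later rows of the same run: add this row's new_r_aa to the previous total
        running += new
    aa_cumul.append(running)
    prev_key = key

df['aa_cumul'] = aa_cumul
print(df['aa_cumul'].tolist())  # [3, 11, 20, 28, 34, 7]
```

Applying the stated rule to the CA row gives 2 + 5 = 7, which differs from the 6 listed under Desired.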

Asked By: Lynn


Answers:

One option is pd.Series.mask: build a condition that is True on the first row, replace that row's value with the sum of 'cml_aa_bx' and 'new_r_aa', and then run the cumulative sum:

(df
 .assign(aa_cumul=df['new_r_aa']
                  .mask(df.index == 0, df['cml_aa_bx'] + df['new_r_aa'])
                  .cumsum())
)
  city  ID quarter  cml_bb_bx  r_aa_bx  cml_aa_bx  BB_AA_Bx_Ratio  expected_aa_bx_delta  total aa  total round aa  new r aa  aa_cumul
0   NY  AA  2024Q1          6        0          1        6.000000                 1.810       1.8               2         2         3
1   NY  AA  2024Q2         13        2          3        4.333333                 2.857       4.9               6         8        11
2   NY  AA  2024Q3         18        3          6        3.000000                 2.395       5.4               6         9        20
3   NY  AA  2024Q4         20        4         10        2.000000                 0.000       4.0               4         8        28
Answered By: sammywemmy
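The mask approach treats only row 0 as a starting point and sums over the whole frame, so on the question's six-row data the CA row would continue NY's running total. A sketch of a grouped variant, assuming the question's column names: mark each group's first row with groupby(...).cumcount() == 0, then take the cumulative sum within each group. As in the question's example, 8 is used as the second 'new_r_aa' value.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'new_r_aa': [2, 8, 9, 8, 6, 5],
})

# True on the first row of each (city, ID) group, False elsewhere
is_first = df.groupby(['city', 'ID']).cumcount() == 0

out = df.assign(
    aa_cumul=df['new_r_aa']
    # On each group's first row, fold in that row's cml_aa_bx
    .mask(is_first, df['cml_aa_bx'] + df['new_r_aa'])
    # Running total that restarts for every (city, ID) group
    .groupby([df['city'], df['ID']])
    .cumsum()
)
print(out['aa_cumul'].tolist())  # [3, 11, 20, 28, 34, 7]
```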

Update: if you want the running total to restart for each city and ID, you can use:

g = df.groupby(['city', 'ID'])
df['aa_cumul'] = g['new_r_aa'].cumsum() + g['cml_aa_bx'].transform('first')
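Because groupby(...).cumsum() and transform('first') work on group membership rather than row position, this grouped version also handles rows where cities are interleaved, as long as each group's own rows appear in the intended order. A small check on hypothetical interleaved data (the values below are made up for illustration):

```python
import pandas as pd

# Hypothetical data: NY and CA rows interleaved
df = pd.DataFrame({
    'city': ['NY', 'CA', 'NY', 'CA', 'NY'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA'],
    'cml_aa_bx': [1, 2, 3, 4, 6],
    'new_r_aa': [2, 5, 8, 1, 9],
})

g = df.groupby(['city', 'ID'])
# Per-group running total of new_r_aa, offset by the group's first cml_aa_bx
df['aa_cumul'] = g['new_r_aa'].cumsum() + g['cml_aa_bx'].transform('first')
print(df['aa_cumul'].tolist())  # [3, 7, 11, 8, 20]
```

NY rows get 1 + (2, 10, 19) and CA rows get 2 + (5, 6); each result lands back on its original row.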

Original:

It's unclear whether you want a DataFrame answer or a dictionary answer. Here's a DataFrame answer:

import pandas as pd

df = pd.DataFrame(data)
# Running sum of new_r_aa, offset by the first cml_aa_bx value
df['aa_cumul'] = df['new_r_aa'].cumsum() + df['cml_aa_bx'].iloc[0]

Output:

  city  ID quarter  cml_bb_bx  ...  total aa  total round aa  new_r_aa  aa_cumul
0   NY  AA  2024Q1          6  ...       1.8               2         2           3
1   NY  AA  2024Q2         13  ...       4.9               6         8          11
2   NY  AA  2024Q3         18  ...       5.4               6         9          20
3   NY  AA  2024Q4         20  ...       4.0               4         8          28

[4 rows x 12 columns]

…and here’s a dictionary answer (using NumPy):

import numpy as np

data['aa_cumul'] = np.cumsum(data['new_r_aa']) + data['cml_aa_bx'][0]

Output:

{'city': ['NY', 'NY', 'NY', 'NY'], 'ID': ['AA', 'AA', 'AA', 'AA'], 'quarter': ['2024Q1', '2024Q2', '2024Q3', '2024Q4'], 'cml_bb_bx': [6, 13, 18, 20], 'r_aa_bx': [0, 2, 3, 4], 'cml_aa_bx': [1, 3, 6, 10], 'BB_AA_Bx_Ratio': [6, 4.333333333, 3, 2], 'expected_aa_bx_delta': [1.81, 2.857, 2.395, 0], 'total aa': [1.8, 4.9, 5.4, 4.0], 'total round aa': [2, 6, 6, 4], 'new_r_aa': [2, 8, 9, 8], 'aa_cumul': array([ 3, 11, 20, 28])}
Answered By: Mark
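If the data stays a plain dictionary, the same ungrouped calculation can be written with the standard library instead of NumPy, which keeps 'aa_cumul' a list rather than an array. A sketch using itertools.accumulate on the NY rows from the question, with 8 as the second 'new_r_aa' value per the "3 + 8 = 11" example:

```python
from itertools import accumulate

data = {
    'cml_aa_bx': [1, 3, 6, 10],
    'new_r_aa': [2, 8, 9, 8],
}

first_cml = data['cml_aa_bx'][0]
# accumulate() yields running sums of new_r_aa: 2, 10, 19, 27
data['aa_cumul'] = [first_cml + total for total in accumulate(data['new_r_aa'])]
print(data['aa_cumul'])  # [3, 11, 20, 28]
```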