Python – Help me decode what this code does
Question:
I don’t have the dataset with these variables in it.
I tried making up a fake dataset to run this code and understand, but had no luck.
What does this code do?
import numpy as np

dep_df = (
    dep_df1
    .assign(TENURE_GRP = lambda df: np.where(df.TENURE == 0, "NP", "TNR"))
    .groupby(['TENURE_GRP', 'TIME_DIM_NB'])
    .apply(
        lambda g:
            g
            .assign(
                PRIOR_TOT_BAL_WT = lambda df: df.PRIOR_TOT_BAL.divide(df.PRIOR_TOT_BAL.sum()),
                NETFLOW_PER_PRIOR_TOT_BAL = (
                    lambda df: np.where(df.PRIOR_TOT_BAL == 0,
                        df.NETFLOW.divide(df.PRIOR_TOT_BAL + 0.0001),
                        df.NETFLOW.divide(df.PRIOR_TOT_BAL)
                    )
                ),
                MKTS = lambda df: df.PRIOR_TOT_BAL - df.NETFLOW,
                MKTS_PER_PRIOR_TOT_BAL = (
                    lambda df: np.where(df.PRIOR_TOT_BAL == 0,
                        df.MKTS.divide(df.PRIOR_TOT_BAL + 0.001),
                        df.MKTS.divide(df.PRIOR_TOT_BAL)
                    )
                )
            )
    )
    .drop(['MKTS', 'TENURE_GRP'], axis = 1)
    .dropna()
    .reset_index(drop = True)
)
Answers:
To summarize briefly (not tested):
1. create a new column (TENURE_GRP) with 2 values: "NP" where TENURE == 0, "TNR" otherwise
2. group by TENURE_GRP and TIME_DIM_NB
3. apply a lambda function on each group
4. in the lambda function, create 4 new columns:
- PRIOR_TOT_BAL_WT = PRIOR_TOT_BAL / PRIOR_TOT_BAL.sum(), i.e. each row's share of its group's total balance
- NETFLOW_PER_PRIOR_TOT_BAL = NETFLOW / (PRIOR_TOT_BAL + 0.0001) if PRIOR_TOT_BAL == 0 else NETFLOW / PRIOR_TOT_BAL
- MKTS = PRIOR_TOT_BAL - NETFLOW
- MKTS_PER_PRIOR_TOT_BAL = same as NETFLOW_PER_PRIOR_TOT_BAL but for MKTS (and with 0.001 instead of 0.0001)
5. drop the columns MKTS and TENURE_GRP
6. drop the rows with NaN values
7. reset the index
8. assign the result to dep_df
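Since a fake dataset did not work for you, here is a small sketch that walks through the same steps on made-up numbers. Only the column names come from the question; the values, the helper `ratio`, and the variable `group_total` are my own assumptions for illustration. I also swapped the `groupby(...).apply(...)` for `groupby(...).transform("sum")`, because the only per-group quantity in the lambda is the group total of PRIOR_TOT_BAL, and `apply` treats the grouping columns differently across pandas versions. The row order of the result may differ from what the original code gives.

```python
import numpy as np
import pandas as pd

# Made-up data: column names are from the question, values are invented.
dep_df1 = pd.DataFrame({
    "TENURE": [0, 0, 3, 5],
    "TIME_DIM_NB": [1, 1, 1, 1],
    "PRIOR_TOT_BAL": [100.0, 300.0, 0.0, 200.0],
    "NETFLOW": [10.0, -30.0, 5.0, 20.0],
})

# Step 1: label each row "NP" (TENURE == 0) or "TNR" (anything else).
df = dep_df1.assign(TENURE_GRP=lambda d: np.where(d.TENURE == 0, "NP", "TNR"))

# Steps 2-3: total PRIOR_TOT_BAL per (TENURE_GRP, TIME_DIM_NB) group,
# broadcast back to every row of that group.
group_total = (
    df.groupby(["TENURE_GRP", "TIME_DIM_NB"])["PRIOR_TOT_BAL"].transform("sum")
)


def ratio(num, den, eps):
    """Divide num by den, replacing zero denominators with den + eps."""
    return num / den.where(den != 0, den + eps)


dep_df = (
    df
    # Step 4: the four new columns.
    .assign(
        PRIOR_TOT_BAL_WT=lambda d: d.PRIOR_TOT_BAL / group_total,
        NETFLOW_PER_PRIOR_TOT_BAL=lambda d: ratio(d.NETFLOW, d.PRIOR_TOT_BAL, 0.0001),
        MKTS=lambda d: d.PRIOR_TOT_BAL - d.NETFLOW,
        MKTS_PER_PRIOR_TOT_BAL=lambda d: ratio(d.MKTS, d.PRIOR_TOT_BAL, 0.001),
    )
    # Steps 5-7: drop the helper columns, drop NaN rows, renumber the index.
    .drop(["MKTS", "TENURE_GRP"], axis=1)
    .dropna()
    .reset_index(drop=True)
)

print(dep_df)
```

The third row has PRIOR_TOT_BAL == 0, so its two ratio columns come out very large (the numerator divided by 0.0001 or 0.001) instead of infinite. A NaN would appear in PRIOR_TOT_BAL_WT only if a whole group's balance summed to zero, and `dropna()` would then remove those rows.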