Insert a row if a certain condition is met
Question:
I am trying to insert a blank row above certain rows whenever a condition is met. Below is an example dataframe, much smaller than my real one, along with the desired result.
The goal is to insert a row of np.nan every time the combination of 'Product' and 'Location' changes. I have considered .insert(), .iloc[], etc., but I am not sure what the call would look like when it depends on a condition. I have also written out the conditions to help show what I expect:
import numpy as np
import pandas as pd

df_no = pd.DataFrame({'Product': ['toy', 'toy', 'toy', 'toy', 'bear', 'bear'],
                      'Location': ['Dallas', 'Dallas', 'Houston', 'Houston', 'Miami', 'Miami'],
                      'Value': [8, 7, 3, 5, 4, 7],
                      'Cumulative Value': [8, 15, 18, 23, 27, 34]})
df_no
Here are the conditions I set up, along with a rough attempt that does not work:
cond_Product = df_no['Product'] == df_no['Product'].shift(-1)
cond_Location = df_no['Location'] == df_no['Location'].shift(-1)
# df_no = np.where(cond_Product & cond_Location, np.nan, df_no['Value'])
Expected result:
df_yes = pd.DataFrame({'Product': ['toy', 'toy', np.nan, 'toy', 'toy', np.nan, 'bear', 'bear'],
                       'Location': ['Dallas', 'Dallas', np.nan, 'Houston', 'Houston', np.nan, 'Miami', 'Miami'],
                       'Value': [8, 7, np.nan, 3, 5, np.nan, 4, 7],
                       'Cumulative Value': [8, 15, np.nan, 18, 23, np.nan, 27, 34]})
df_yes
Answers:
Here is one way to do it:
# work on a copy of the question's dataframe
df = df_no.copy()
# temporary per-group counter (0, 1, ...) within each Product/Location pair
df['seq'] = df.groupby(['Product', 'Location']).cumcount()
# append one extra row per group holding only Product, Location and the group's max seq;
# every other column in those appended rows is NaN
df2 = pd.concat([df,
                 df.groupby(['Product', 'Location'], as_index=False)['seq'].max()],
                axis=0
                ).sort_values(['Product', 'Location']).reset_index()
# blank out the appended rows entirely (they are the rows where 'Value' is NaN)
df2 = df2.mask(df2['Value'].isna())
# drop the helper columns
df2 = df2.drop(columns=['index', 'seq'])
df2
  Product Location  Value  Cumulative Value
0    bear    Miami    4.0              27.0
1    bear    Miami    7.0              34.0
2     NaN      NaN    NaN               NaN
3     toy   Dallas    8.0               8.0
4     toy   Dallas    7.0              15.0
5     NaN      NaN    NaN               NaN
6     toy  Houston    3.0              18.0
7     toy  Houston    5.0              23.0
8     NaN      NaN    NaN               NaN
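Note that the output above is sorted by Product and Location, so the groups come out in a different order than in df_yes, and a blank row also trails the last group. If the original row order matters, a different technique may fit better. The following is only a sketch, not the answer's method: it flags the rows where the (Product, Location) pair differs from the row above, shifts every row down by the number of boundaries seen so far, and lets reindex fill the gaps with NaN. The names keys, changed, new_pos and df_out are made up for illustration, and it assumes the frame is non-empty and the key columns contain no NaN.

```python
import numpy as np
import pandas as pd

df_no = pd.DataFrame({'Product': ['toy', 'toy', 'toy', 'toy', 'bear', 'bear'],
                      'Location': ['Dallas', 'Dallas', 'Houston', 'Houston', 'Miami', 'Miami'],
                      'Value': [8, 7, 3, 5, 4, 7],
                      'Cumulative Value': [8, 15, 18, 23, 27, 34]})

keys = ['Product', 'Location']  # columns whose combination defines a group

# True on every row whose key pair differs from the row directly above it.
changed = df_no[keys].ne(df_no[keys].shift()).any(axis=1)
changed.iloc[0] = False  # the first row has nothing above it to be separated from

# Target position of each original row: its own position plus the number of
# group boundaries at or before it (one blank row is inserted per boundary).
new_pos = np.arange(len(df_no)) + changed.cumsum().to_numpy()

# Relabel the rows with their target positions, then reindex over the full
# range; positions that no row occupies come back as all-NaN rows.
df_out = df_no.copy()
df_out.index = new_pos
df_out = df_out.reindex(np.arange(len(df_no) + int(changed.sum())))
df_out
```

On the sample data this should leave the rows in their original order, with all-NaN rows at positions 2 and 5 and none after the last group. The numeric columns become float because NaN cannot be stored in an integer column.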