Insert a row if a certain condition is met

Question:

I am trying to insert a row above specified rows every time a certain condition is met. Here is an example dataframe I created (much smaller than my real one), along with the desired result.

So the goal is to insert a row of np.NaN every time the combination of 'Product' and 'Location' changes. I have thought of .insert(), .iloc[], etc., but I'm not sure what that input would even look like when certain conditions need to be met. I have laid out the conditions below to help visualize the expectations as well:

df_no = pd.DataFrame({'Product': ['toy', 'toy', 'toy', 'toy', 'bear', 'bear'], 
                      'Location': ['Dallas', 'Dallas', 'Houston', 'Houston', 'Miami', 'Miami'], 
                       'Value' : [8, 7, 3, 5, 4, 7],
                       'Cumulative Value':[8, 15, 18, 23, 27, 34]})
df_no

Example of the conditions, and a rough first attempt at code:

cond_Product = df_no['Product'] == df_no['Product'].shift(-1)
cond_Location = df_no['Location'] == df_no['Location'].shift(-1)
##df_no = np.where((cond_Product) & (cond_Location), np.NaN, df_no.value)
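For reference, a sketch (using the question's own frame and column names) of what the group-boundary condition evaluates to; inverting the asker's equality checks marks the last row of each (Product, Location) group, which is where a separator row would go:

```python
import pandas as pd

df_no = pd.DataFrame({'Product': ['toy', 'toy', 'toy', 'toy', 'bear', 'bear'],
                      'Location': ['Dallas', 'Dallas', 'Houston', 'Houston', 'Miami', 'Miami'],
                      'Value': [8, 7, 3, 5, 4, 7],
                      'Cumulative Value': [8, 15, 18, 23, 27, 34]})

# True on the last row of each (Product, Location) group: the next row differs
boundary = ((df_no['Product'] != df_no['Product'].shift(-1)) |
            (df_no['Location'] != df_no['Location'].shift(-1)))
boundary.iloc[-1] = False  # no separator needed after the final row
print(boundary.tolist())   # → [False, True, False, True, False, False]
```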

Expected result:

df_yes = pd.DataFrame({'Product': ['toy', 'toy', np.NaN, 'toy', 'toy', np.NaN, 'bear', 'bear'],
                       'Location': ['Dallas', 'Dallas', np.NaN, 'Houston', 'Houston', np.NaN, 'Miami', 'Miami'],
                       'Value' : [8, 7, np.NaN, 3, 5, np.NaN, 4, 7],
                       'Cumulative Value':[8, 15, np.NaN, 18, 23, np.NaN, 27, 34]})
df_yes

Asked By: Deke Marquardt


Answers:

Here is one way to do it:

# create a temp sequence to help identify the last row of each (Product, Location) group
df['seq'] = df.groupby(['Product', 'Location'])['Product'].cumcount()


# concat the DF with the rows where the seq count is largest for each group;
# this results in null values for the non-selected columns
df2 = pd.concat([df,
                 df.groupby(['Product', 'Location'], as_index=False)['seq'].max()],
                axis=0).sort_values(['Product', 'Location']).reset_index()


# mask entire rows with NaN where 'Value' is null (i.e. the appended separator rows)
df2 = df2.mask(df2['Value'].isna())

# drop unwanted columns
df2.drop(columns=['index', 'seq'], inplace=True)

df2
  Product Location  Value  Cumulative Value
0    bear    Miami    4.0              27.0
1    bear    Miami    7.0              34.0
2     NaN      NaN    NaN               NaN
3     toy   Dallas    8.0               8.0
4     toy   Dallas    7.0              15.0
5     NaN      NaN    NaN               NaN
6     toy  Houston    3.0              18.0
7     toy  Houston    5.0              23.0
8     NaN      NaN    NaN               NaN
Answered By: Naveed
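An alternative sketch, not part of the original answer: the sort_values above reorders the groups (bear before toy) and leaves a trailing separator, so a variant that preserves the question's original row order is to give each blank row a fractional index just past its group's last row, then sort on the index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['toy', 'toy', 'toy', 'toy', 'bear', 'bear'],
                   'Location': ['Dallas', 'Dallas', 'Houston', 'Houston', 'Miami', 'Miami'],
                   'Value': [8, 7, 3, 5, 4, 7],
                   'Cumulative Value': [8, 15, 18, 23, 27, 34]})

# rows after which the (Product, Location) pair changes
change = (df[['Product', 'Location']] != df[['Product', 'Location']].shift(-1)).any(axis=1)
change.iloc[-1] = False  # no separator after the last group

# blank rows at fractional positions (1.5, 3.5, ...) slot between the groups
blank = pd.DataFrame(np.nan, index=df.index[change] + 0.5, columns=df.columns)
out = pd.concat([df, blank]).sort_index().reset_index(drop=True)
```

This yields the df_yes layout from the question: two toy/Dallas rows, a NaN row, two toy/Houston rows, a NaN row, then the two bear/Miami rows.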