Pandas different mathematical operation, conditional on column value

Question:

import pandas as pd

data = {'start_value': [10, 20, 30, 40, 50, 60, 70],
        'identifier': ['+', '+', '-', '-', '+', '-', '-']}
df = pd.DataFrame(data)

   start_value identifier
0           10          +
1           20          +
2           30          -
3           40          -
4           50          +
5           60          -
6           70          -

I am attempting to create a new column "end_value" that adds 5 to or subtracts 5 from the "start_value" column, based on the "+" or "-" value in the "identifier" column, resulting in the df below.

   start_value identifier  end_value
0           10          +       15.0
1           20          +       25.0
2           30          -       25.0
3           40          -       35.0
4           50          +       55.0
5           60          -       55.0
6           70          -       65.0

I realize that running the code below overwrites the "end_value" column with the second assignment, resulting in this df:

df['end_value'] = 5 + df.loc[df['identifier']=="+"]['start_value']
df['end_value'] = -5 + df.loc[df['identifier']=="-"]['start_value']

   start_value identifier  end_value
0           10          +        NaN
1           20          +        NaN
2           30          -       25.0
3           40          -       35.0
4           50          +        NaN
5           60          -       55.0
6           70          -       65.0

How would I apply an if statement to combine the results, so that 5 is added if the identifier column == "+" and 5 is subtracted if the identifier column == "-"?

I’ve done something similar with strings using the post below, but I am unsure how to apply it to a mathematical operation that results in an ‘end_value’ column of dtype float.

Pandas: if row in column A contains "x", write "y" to row in column B

Asked By: user8421

Answers:

You could use .apply() with a lambda expression.

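import pandas as pd

data = {'start_value': [10, 20, 30, 40, 50, 60, 70],
        'identifier': ['+', '+', '-', '-', '+', '-', '-']}
df = pd.DataFrame(data)
# axis=1 passes each row to the lambda, which picks +5 or -5 from the identifier
df["end_value"] = df.apply(lambda row: row.start_value + 5 if row.identifier == "+" else row.start_value - 5, axis=1)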

This assumes that the values of the identifier column are either + or -; any other value would fall into the else branch and have 5 subtracted.
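If the identifier column might contain something other than + or -, one possible variation is to look the offset up in a dict with Series.map, so that unexpected identifiers produce NaN instead of being silently treated as "-". This is only a sketch: the "?" row and the offsets name are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Small illustrative frame; the "?" identifier is a hypothetical bad value
df = pd.DataFrame({
    "start_value": [10, 20, 30],
    "identifier": ["+", "-", "?"],
})

# Look up the offset for each identifier; identifiers missing from the dict map to NaN
offsets = df["identifier"].map({"+": 5, "-": -5})

# NaN offsets propagate, so rows with an unexpected identifier end up as NaN
df["end_value"] = df["start_value"] + offsets
print(df)
```

Because of the NaN, end_value comes out as float, and rows needing attention can be found with df["end_value"].isna().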

Answered By: TimbowSix

You can use a vectorized operation:

import numpy as np

df['end_value'] = df['start_value'] + np.where(df['identifier'] == '+', 5, -5)

# OR

df['end_value'] = df['start_value'] + df['identifier'].replace({'+': 5, '-': -5})  # '-' maps to -5
print(df)

# Output
   start_value identifier  end_value
0           10          +         15
1           20          +         25
2           30          -         25
3           40          -         35
4           50          +         55
5           60          -         55
6           70          -         65
Answered By: Corralien
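
As a side note that is not part of either answer above: the attempt in the question produces NaN because each line assigns the whole "end_value" column, and the filtered Series on the right only contains the matching rows, so every other row is filled with NaN when pandas aligns on the index. A minimal sketch of the same two-step idea, assuming boolean masks (the names plus and minus are chosen here for illustration) and assignment through .loc so that only the matching rows are written:

```python
import pandas as pd

df = pd.DataFrame({
    "start_value": [10, 20, 30, 40, 50, 60, 70],
    "identifier": ["+", "+", "-", "-", "+", "-", "-"],
})

# Boolean masks selecting the rows for each identifier
plus = df["identifier"] == "+"
minus = df["identifier"] == "-"

# Assign only into the selected rows; the first assignment creates the column
# with NaN elsewhere, and the second fills in the remaining rows
df.loc[plus, "end_value"] = df.loc[plus, "start_value"] + 5
df.loc[minus, "end_value"] = df.loc[minus, "start_value"] - 5
print(df)
```

Since the column briefly holds NaN, it ends up with a float dtype, which matches the float "end_value" shown in the question.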