Set value of one Pandas column based on value in another column
Question:
I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:
if df['c1'] == 'Value':
    df['c2'] = 10
else:
    df['c2'] = df['c3']
I am unable to get this to do what I want, which is to simply create a column with new values (or change the value of an existing column: either one works for me).
If I try to run the code above or if I write it as a function and use the apply method, I get the following:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answers:
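Try:
df['c2'] = df['c1'].apply(lambda x: 10 if x == 'Value' else x)
Note that this keeps the value from c1 where the condition is not met. Taking the fallback from c3, as in the question, needs access to both columns (see the other answers).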
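For context on the error itself: df['c1'] == 'Value' produces a whole Series of booleans, one per row, so a plain if has no single True/False to branch on. A minimal sketch, using a made-up three-row frame (the data and the names is_value and error_message are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'c1': ['a', 'Value', 'b'], 'c3': [1, 2, 3]})

# The comparison is element-wise: it yields one boolean per row.
is_value = df['c1'] == 'Value'

# Using that Series directly in an `if` raises the ValueError from the question.
try:
    if is_value:
        pass
except ValueError as err:
    error_message = str(err)
```

The vectorized answers all work by using that boolean Series as a row selector instead of as an if condition.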
One way to do this would be to use indexing with .loc.
Example
In the absence of an example dataframe, I’ll make one up here:
import numpy as np
import pandas as pd
df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'
>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5  Value
6      g
Assuming you wanted to create a new column c2, equivalent to c1 except where c1 is Value, in which case you would like to assign it 10:
First, you could create a new column c2 and set it equal to c1, using one of the following two lines (they essentially do the same thing):
df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']
Then, find all the indices where c1 is equal to 'Value' using .loc, and assign your desired value in c2 at those indices:
df.loc[df['c1'] == 'Value', 'c2'] = 10
And you end up with this:
>>> df
      c1  c2
0      a   a
1      b   b
2      c   c
3      d   d
4      e   e
5  Value  10
6      g   g
If, as you suggested in your question, you would sometimes rather replace the values in the column you already have than create a new column, skip the column creation and do the following:
df.loc[df['c1'] == 'Value', 'c1'] = 10
Avoid the chained form df['c1'].loc[df['c1'] == 'Value'] = 10. It assigns through an intermediate object, so depending on the pandas version and its copy-on-write setting it may emit a warning or leave df unchanged.
Giving you:
>>> df
   c1
0   a
1   b
2   c
3   d
4   e
5  10
6   g
You can use np.where() to set values based on a specified condition:
#df
   c1  c2  c3
0   4   2   1
1   8   7   9
2   1   5   8
3   3   3   5
4   3   6   8
Now change (or set) the values in column 'c2' based on your condition.
df['c2'] = np.where(df.c1 == 8,'X', df.c3)
   c1 c2  c3
0   4  1   1
1   8  X   9
2   1  8   8
3   3  5   5
4   3  8   8
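Applied to the logic in the question, the same call might look like the sketch below. The three-row frame is made up for illustration, and both branches are kept numeric so the resulting column stays numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'c1': ['a', 'Value', 'b'], 'c3': [1, 2, 3]})

# Where c1 equals 'Value' take 10, otherwise take the value from c3.
df['c2'] = np.where(df['c1'] == 'Value', 10, df['c3'])
```

np.where evaluates the condition for every row at once, so no Python-level loop or if statement is involved.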
I suggest doing it in two steps:
# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10
# copy value from 'c3' to 'c2' where the condition is NOT met
df.loc[df['c1'] != 'Value', 'c2'] = df.loc[df['c1'] != 'Value', 'c3']
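The two steps can also be collapsed into a single expression. This is a sketch of one alternative, not the method of the answer above: it starts from c3 and overwrites the matching rows with Series.mask. The sample frame and the name is_value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'c1': ['a', 'Value', 'b'], 'c3': [1, 2, 3]})

# Compute the condition once so it can be reused.
is_value = df['c1'] == 'Value'

# Start from c3 and replace the rows where the condition holds with 10.
df['c2'] = df['c3'].mask(is_value, 10)
```

Storing the condition in a variable also avoids evaluating the same comparison twice, which the two-step version does.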
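You can use pandas.Series.mask (the column-level counterpart of pandas.DataFrame.mask) to add virtually as many conditions as you need:
import numpy as np
import pandas as pd
data = {'a': [1,2,3,4,5], 'b': [6,8,9,10,11]}
d = pd.DataFrame.from_dict(data, orient='columns')
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}
# start from an object column so it can hold both strings and numbers
d['new'] = pd.Series(np.nan, index=d.index, dtype=object)
for value in c.values():
    # assign the result back instead of relying on inplace=True on a selected column
    d['new'] = d['new'].mask(d['a'] == value[0], value[1])
d['new'] = d['new'].fillna('Else')
d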
Output:
   a   b     new
0  1   6    Else
1  2   8  Value1
2  3   9  Value2
3  4  10    Else
4  5  11      11
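For several conditions, np.select is another option worth knowing. This is a swapped-in technique, not what the answer above uses. The sketch keeps every choice numeric (20, 30, and -1 are made-up placeholder values) so that NumPy does not have to reconcile strings with numbers:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 8, 9, 10, 11]})

# Conditions are checked in order; the first one that matches wins.
conditions = [d['a'] == 2, d['a'] == 3, d['a'] == 5]
# One choice per condition: a constant or a column of the same length.
choices = [20, 30, d['b']]

# Rows matching no condition get the default (-1 is a placeholder).
d['new'] = np.select(conditions, choices, default=-1)
```

Here rows with a equal to 2 or 3 get the constants, the row with a equal to 5 gets its value from b, and the remaining rows get the default.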
Try df.apply() if you have a small or medium-sized dataframe:
df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c1'], axis = 1)
Otherwise, for a big dataframe, use the indexing techniques from the answers above.
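Because axis=1 passes each row to the function, the fallback can come from a different column, which is what the question asks for. A minimal sketch with made-up data; the helper name pick_c2 is hypothetical:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'c1': ['a', 'Value', 'b'], 'c3': [1, 2, 3]})

def pick_c2(row):
    # `row` is one row of df, so both c1 and c3 are available here.
    return 10 if row['c1'] == 'Value' else row['c3']

# axis=1 calls pick_c2 once per row.
df['c2'] = df.apply(pick_c2, axis=1)
```

This runs a Python function call per row, which is why it suits smaller frames better than large ones.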
Note the tilde (~), which inverts the selection. This uses pandas indexing rather than a Python-level if/else, so it is faster.
df.loc[(df['c1'] == 'Value'), 'c2'] = 10
df.loc[~(df['c1'] == 'Value'), 'c2'] = df['c3']
I had a big dataset and .loc[] was taking too long, so I found a vectorized way to do it. Recall that you can assign the result of a comparison directly to a column, so this works:
file['Flag'] = (file['Claim_Amount'] > 0)
This gives a Boolean column, which is what I wanted, but you can multiply it by, say, 1 to turn it into integers.
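As a small illustration of the Boolean-to-integer step, here is a sketch with a made-up frame. The name claims and the Flag_int column are assumptions, and astype(int) is used as an explicit alternative to multiplying by 1:

```python
import pandas as pd

# Hypothetical data, only for illustration.
claims = pd.DataFrame({'Claim_Amount': [0, 250, 0, 80]})

# The comparison yields a Boolean column directly.
claims['Flag'] = claims['Claim_Amount'] > 0

# Convert True/False to 1/0 explicitly.
claims['Flag_int'] = claims['Flag'].astype(int)
```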
I believe Series.map() to be very readable and efficient, e.g.:
df["c2"] = df["c1"].map(lambda x: 10 if x == 'Value' else x)
I like it because if the conditional logic gets more complex you can move it to a function and just pass in that function instead of the lambda.
If you need to base your conditional logic on more than one column you can use DataFrame.apply() as others suggest.
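Moving the logic into a named function, as described above, could look like this sketch. The function name recode and the sample data are hypothetical:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'c1': ['a', 'Value', 'b']})

def recode(value):
    # Receives one element of c1 at a time.
    if value == 'Value':
        return 10
    return value

# Pass the function itself instead of a lambda.
df['c2'] = df['c1'].map(recode)
```

The function only ever sees a single value from c1, which is why logic involving other columns needs DataFrame.apply or one of the vectorized approaches instead.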