Set value of one Pandas column based on value in another column

Question:

I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:

if df['c1'] == 'Value':
    df['c2'] = 10
else:
    df['c2'] = df['c3']

I am unable to get this to do what I want, which is to simply create a column with new values (or change the value of an existing column: either one works for me).

If I try to run the code above or if I write it as a function and use the apply method, I get the following:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Asked By: NLR

||

Answers:

try:

df['c2'] = df['c1'].apply(lambda x: 10 if x == 'Value' else x)
Answered By: aggis

one way to do this would be to use indexing with .loc.

Example

In the absence of an example dataframe, I’ll make one up here:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5  Value
6      g

Assuming you wanted to create a new column c2, equivalent to c1 except where c1 is Value, in which case, you would like to assign it to 10:

First, you could create a new column c2, and set it to equivalent as c1, using one of the following two lines (they essentially do the same thing):

df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']

Then, find all the indices where c1 is equal to 'Value' using .loc, and assign your desired value in c2 at those indices:

df.loc[df['c1'] == 'Value', 'c2'] = 10

And you end up with this:

>>> df
      c1  c2
0      a   a
1      b   b
2      c   c
3      d   d
4      e   e
5  Value  10
6      g   g

If, as you suggested in your question, you would perhaps sometimes just want to replace the values in the column you already have, rather than create a new column, then just skip the column creation, and do the following:

df['c1'].loc[df['c1'] == 'Value'] = 10
# or:
df.loc[df['c1'] == 'Value', 'c1'] = 10

Giving you:

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5     10
6      g
Answered By: sacuL

You can use np.where() to set values based on a specified condition:

#df
   c1  c2  c3
0   4   2   1
1   8   7   9
2   1   5   8
3   3   3   5
4   3   6   8

Now change values (or set) in column ['c2'] based on your condition.

df['c2'] = np.where(df.c1 == 8,'X', df.c3)

   c1  c2  c3
0   4   1   1
1   8   X   9
2   1   8   8
3   3   5   5
4   3   8   8
Answered By: DJK

I suggest doing it in two steps:

# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10

# copy value from 'c3' to 'c2' where the condition is NOT met
df.loc[df['c1'] != 'Value', 'c2'] = df[df['c1'] != 'Value', 'c3']
Answered By: Ralf

You can use pandas.DataFrame.mask to add virtually as many conditions as you need:

data = {'a': [1,2,3,4,5], 'b': [6,8,9,10,11]}

d = pd.DataFrame.from_dict(data, orient='columns')
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}

d['new'] = np.nan
for value in c.values():
    d['new'].mask(d['a'] == value[0], value[1], inplace=True)

d['new'] = d['new'].fillna('Else')
d

Output:

    a   b   new
0   1   6   Else
1   2   8   Value1
2   3   9   Value2
3   4   10  Else
4   5   11  11
Answered By: nimbous

Try out df.apply() if you’ve a small/medium dataframe,

df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c1'], axis = 1)

Else, follow the slicing techniques mentioned in the above comments if you’ve got a big dataframe.

Answered By: Rahul Bordoloi

Note the tilda that reverses the selection. It uses pandas methods (i.e. is faster than if/else).

df.loc[(df['c1'] == 'Value'), 'c2'] = 10
df.loc[~(df['c1'] == 'Value'), 'c2'] = df['c3']
Answered By: vkerov

I had a big dataset and .loc[] was taking too long so I found a vectorized way to do it. Recall that you can set a column to a logical operator, so this works:

file['Flag'] = (file['Claim_Amount'] > 0)

This gives a Boolean, which I wanted, but you can multiply it by, say, 1 to make an Integer.

Answered By: Mark

I believe Series.map() to be very readable and efficient, e.g.:

df["c2"] = df["c1"].map(lambda x: 10 if x == 'Value' else x)

I like it because if the conditional logic gets more complex you can move it to a function and just pass in that function instead of the lambda.

If you need to base your conditional logic on more than one column you can use DataFrame.apply() as others suggest.

Answered By: Pierre Monico

test of test (arasrfas)ASr asfasfasfasfalsknfasf

Answered By: Emancipated Forever
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.