Pandas dataframe "ValueError: The truth value of a Series is ambiguous" when using .apply

Question:

I have a dataframe with 3884 rows × 4458 columns, filled with numbers. I’m trying to set numbers greater than 1 to 1. I have tried

df.apply(lambda x: 1 if x >= 1 else 0)

I also tried writing a separate function, but either way I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I checked previous questions and found many about this error, but I still don’t understand it.

Asked By: Berk Ak


Answers:

This happens because x inside your lambda is a Series representing a whole column, so x >= 1 returns a Series of True/False values (one per row) rather than a single bool, and if cannot turn that Series into one True or False.

You could reduce it to a single bool with (x >= 1).all() or (x >= 1).any(), but that gives only one answer per column, which won’t suit your needs.
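To see the failure in isolation, here is a minimal sketch on a small made-up frame (the data and variable names are illustrative assumptions, not the asker’s). It checks what apply passes to the lambda, then shows that using a boolean Series in an if raises the same ValueError, while .any()/.all() collapse it to a single bool:

```python
import pandas as pd

# Small illustrative frame (not the asker's real data).
df = pd.DataFrame({"a": [-1, 0, 1, 2], "b": [0, 0, 5, -2]})

# With the default axis=0, apply() hands each whole column to the function.
seen_types = df.apply(lambda x: type(x).__name__)
print(seen_types.tolist())  # ['Series', 'Series']

# Comparing a Series is elementwise, so the result is another Series of bools.
mask = df["a"] >= 1
print(mask.tolist())  # [False, False, True, True]

# `if` needs one True/False, and pandas refuses to guess for a whole Series.
try:
    if mask:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# .any()/.all() do reduce to a single bool, but only one per column,
# which is not a per-value transformation.
print(mask.any(), mask.all())  # True False
```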

Instead, you can use a list comprehension so that each individual value in each column is tested:

df.apply(lambda x : [1 if e >= 1 else 0 for e in x])
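Run on a small illustrative frame (the data is an assumption, not from the question), this version returns a new DataFrame of 0s and 1s with the same index and columns, because each if now sees a single number:

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [-1, 0, 1, 2], "b": [0, 0, 5, -2]})

# x is still a whole column, but the comprehension tests each element e
# separately, and the returned list becomes that column of the result.
result = df.apply(lambda x: [1 if e >= 1 else 0 for e in x])
print(result)
```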
Answered By: SomeDude

pandas.DataFrame.apply applies a function along an axis of the DataFrame, so x in your lambda is a Series (a whole column), and the boolean Series you get from comparing it can’t be fed into an if.

I’m trying to equate numbers greater than 1 to 1.

You can use pandas.DataFrame.applymap, which applies a function to a DataFrame elementwise:

df.applymap(lambda x: 1 if x >= 1 else 0)
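Since the question mentions trying a separate function, here is a sketch of the same elementwise approach with a named function; to_binary is a hypothetical name and the data is illustrative. It also allows for either side of pandas 2.1, where DataFrame.applymap was deprecated in favour of DataFrame.map, by using whichever one exists:

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [-1, 0, 1, 2], "b": [0, 0, 5, -2]})

def to_binary(value):
    """Hypothetical helper: called once per cell, so `value` is a single number."""
    return 1 if value >= 1 else 0

# DataFrame.map exists in pandas >= 2.1; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    result = df.map(to_binary)
else:
    result = df.applymap(to_binary)

print(result)
```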
Answered By: Ynjxsjmh

Ynjxsjmh’s answer is good, but in this situation you don’t actually need .apply() in the first place. pandas and NumPy have vectorized tools that operate on the whole DataFrame at once, which is generally much faster than calling a Python function for every value. Here are some examples.

Firstly, some example data:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [-1, 0, 1, 2], 'b': [0, 0, 5, -2]})
>>> df
   a  b
0 -1  0
1  0  0
2  1  5
3  2 -2

If, as you say, you want to cap numbers at 1, you could use:

  • .clip():

    >>> df.clip(upper=1)
       a  b
    0 -1  0
    1  0  0
    2  1  1
    3  1 -2
    
  • .mask():

    >>> df.mask(df>=1, 1)
       a  b
    0 -1  0
    1  0  0
    2  1  1
    3  1 -2
    
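A quick check that the two capping approaches agree, sketched on float data with one missing value (the NaN is an assumption added to show how missing values are treated): both replace values of 1 or more with 1, leave everything below 1 (including negatives) alone, and keep NaN as NaN.

```python
import numpy as np
import pandas as pd

# Illustrative float data with a missing value.
df = pd.DataFrame({"a": [-1.0, 0.0, 1.0, 2.0], "b": [0.0, np.nan, 5.0, -2.0]})

clipped = df.clip(upper=1)      # values above 1 become 1
masked = df.mask(df >= 1, 1)    # where the condition is True, use 1 instead

# NaN >= 1 is False, so mask() keeps NaN; clip() also leaves NaN untouched.
print(clipped.equals(masked))   # True
print(clipped["a"].tolist())    # [-1.0, 0.0, 1.0, 1.0]
```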

Or if, as your code suggests, you also want to set numbers less than 1 to 0, you could use:

  • A comparison on the whole DataFrame, then converting the bools to int:

    >>> df.ge(1).astype(int)
       a  b
    0  0  0
    1  0  0
    2  1  1
    3  1  0
    
  • numpy.where(), assigning the result back into df in place:

    >>> import numpy as np
    >>> df[:] = np.where(df>=1, 1, 0)
    >>> df
       a  b
    0  0  0
    1  0  0
    2  1  1
    3  1  0
    
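If you would rather not overwrite df, a minimal sketch is to wrap the array returned by np.where in a new DataFrame with the original labels; the name binary and the data are illustrative. The result matches the .ge(1).astype(int) version:

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [-1, 0, 1, 2], "b": [0, 0, 5, -2]})

# np.where returns a plain NumPy array, so reattach the original
# row and column labels to get a DataFrame back.
binary = pd.DataFrame(np.where(df >= 1, 1, 0), index=df.index, columns=df.columns)
print(binary)

# df itself is untouched, and the result matches the pure-pandas version.
print(df["b"].tolist())                      # [0, 0, 5, -2]
print(binary.equals(df.ge(1).astype(int)))   # True
```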
Answered By: wjandrea