Using .loc and OR operator returning ValueError

Question:

I am trying to search for specific values in either of two columns and when a target value is found, change the number in a third column from positive to negative or negative to positive.

te1 = df.loc[df['Transaction Event'] == 'Exercise']
te2 = df.loc[df['Transaction Event'] == 'Assignment']
te3 = df.loc[df['Transaction Event'] == 'Expiration']
an1 = df.loc[df['Action'] == 'Delete']
nq = df['Net Quantity']
var1 = df[(df['Transaction Event'] == 'Exercise') | (df['Transaction Event'] == 'Assignment') | (df['Transaction Event'] == 'Expiration') | (df['Action'] == 'Delete')]

df.loc[df[var1], nq] = df.loc[df[var1], nq] * -1

Running this code returns the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-282-01dbb8066276> in <module>()
      6 var1 = df[(df['Transaction Event'] == 'Exercise') | (df['Transaction Event'] == 'Assignment') | (df['Transaction Event'] == 'Expiration') | (df['Action'] == 'Delete')]
      7 
----> 8 df.loc[df[var1], nq] = df.loc[df[var1], nq] * -1
      9 print(df)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   1958             return self._getitem_array(key)
   1959         elif isinstance(key, DataFrame):
-> 1960             return self._getitem_frame(key)
   1961         elif is_mi_columns:
   1962             return self._getitem_multilevel(key)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_frame(self, key)
   2034         if key.values.size and not is_bool_dtype(key.values):
   2035             raise ValueError('Must pass DataFrame with boolean values only')
-> 2036         return self.where(key)
   2037 
   2038     def query(self, expr, inplace=False, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in where(self, cond, other, inplace, axis, level, try_cast, raise_on_error)
   5338         other = com._apply_if_callable(other, self)
   5339         return self._where(cond, other, inplace, axis, level, try_cast,
-> 5340                            raise_on_error)
   5341 
   5342     @Appender(_shared_docs['where'] % dict(_shared_doc_kwargs, cond="False",

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _where(self, cond, other, inplace, axis, level, try_cast, raise_on_error)
   5096             for dt in cond.dtypes:
   5097                 if not is_bool_dtype(dt):
-> 5098                     raise ValueError(msg.format(dtype=dt))
   5099 
   5100         cond = cond.astype(bool, copy=False)

ValueError: Boolean array expected for the condition, not float64

Does anyone know what is causing this error?

Asked By: tacpdt


Answers:

You’re not creating a mask here; you’re selecting a subset of your df:

var1 = df[(df['Transaction Event'] == 'Exercise') | (df['Transaction Event'] == 'Assignment') | (df['Transaction Event'] == 'Expiration') | (df['Action'] == 'Delete')]

Instead you need just this:

var1 = (df['Transaction Event'] == 'Exercise') | (df['Transaction Event'] == 'Assignment') | (df['Transaction Event'] == 'Expiration') | (df['Action'] == 'Delete')

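In your current code you do create the boolean array you want, but you then also index into your original df with it, so var1 ends up as a DataFrame of the matching rows rather than a mask. You can confirm this by inspecting what var1 actually contains. Passing that DataFrame to df[...] is what raises the error, because it holds non-boolean (float64) values. Once var1 is a mask, use it directly in .loc together with the column label, i.e. df.loc[var1, 'Net Quantity'], rather than df[var1] and the nq Series.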
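
A minimal sketch of the difference, assuming a small made-up DataFrame: only the column names come from the question, while the rows and the names mask and subset are illustrative. It uses isin as a shorthand for the three chained equality checks on 'Transaction Event', which is a convenience swapped in here and not part of the answer itself.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Transaction Event": ["Exercise", "Trade", "Expiration", "Trade"],
    "Action": ["Add", "Delete", "Add", "Add"],
    "Net Quantity": [10.0, -5.0, 3.0, 7.0],
})

# A boolean mask: one True/False per row, without indexing back into df.
mask = (
    df["Transaction Event"].isin(["Exercise", "Assignment", "Expiration"])
    | (df["Action"] == "Delete")
)

# Wrapping the same condition in df[...] (as in the question) gives a
# DataFrame of the matching rows instead of a mask.
subset = df[mask]

print(type(mask).__name__, mask.dtype)      # a boolean Series
print(type(subset).__name__, subset.shape)  # a DataFrame of matching rows

# Flip the sign in place: the mask selects the rows and the column *label*
# (not the nq Series) selects the column.
df.loc[mask, "Net Quantity"] *= -1
print(df)
```

Printing the two objects makes the mix-up visible: the mask is a bool Series aligned with df, whereas the subset carries the original float and string columns, which .loc and df[...] cannot use as a row condition.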

Answered By: dan_g