pandas comparison raises TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
Question:
I have the following structure to my DataFrame:
Index: 1008 entries, Trial1.0 to Trial3.84
Data columns (total 5 columns):
CHUNK_NAME 1008 non-null values
LAMBDA 1008 non-null values
BETA 1008 non-null values
HIT_RATE 1008 non-null values
AVERAGE_RECIPROCAL_HITRATE 1008 non-null values
chunks=['300_321','322_343','344_365','366_387','388_408','366_408','344_408','322_408','300_408']
lam_beta=[(lambda1,beta1),(lambda1,beta2),(lambda1,beta3),...(lambda1,beta_n),(lambda2,beta1),(lambda2,beta2)...(lambda2,beta_n),........]
my_df.ix[my_df.CHUNK_NAME==chunks[0]&my_df.LAMBDA==lam_beta[0][0]]
I want to get the rows of the DataFrame for a particular chunk, let's say chunks[0], and a particular lambda value. So in this case, the output should be all rows in the DataFrame having CHUNK_NAME='300_321' and LAMBDA=lambda1. There would be n rows returned, one for each beta value. But instead I get the following error. Any help in solving this problem would be appreciated.
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
Answers:
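& has higher precedence than ==, so the original expression is parsed as my_df.CHUNK_NAME == (chunks[0] & my_df.LAMBDA) == lam_beta[0][0]. Wrap each comparison in parentheses:
my_df.loc[(my_df.CHUNK_NAME==chunks[0])&(my_df.LAMBDA==lam_beta[0][0])]
Note the parentheses around each of the two comparisons. (.ix has since been removed from pandas; .loc accepts the same boolean mask.)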
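A minimal sketch of the precedence rule, assuming a recent pandas. The first two lines use plain integers to show how Python groups == and &; the rest applies the parenthesized mask to a small hypothetical DataFrame whose values (including the numbers standing in for lambda1, beta1, ...) are invented for illustration.

```python
import pandas as pd

# Plain Python: & binds tighter than ==, so the unparenthesized form is the
# chained comparison 1 == (1 & 0) == 0, i.e. (1 == 0) and (0 == 0).
unparenthesized = 1 == 1 & 0 == 0      # -> False
parenthesized = (1 == 1) & (0 == 0)    # True & True -> True

# Hypothetical stand-in for the question's DataFrame (values are made up).
my_df = pd.DataFrame(
    {
        "CHUNK_NAME": ["300_321", "300_321", "322_343", "300_321"],
        "LAMBDA": [0.1, 0.1, 0.1, 0.2],
        "BETA": [0.5, 0.9, 0.5, 0.5],
    },
    index=["Trial1.0", "Trial1.1", "Trial1.2", "Trial1.3"],
)
chunks = ["300_321", "322_343"]
lam_beta = [(0.1, 0.5), (0.1, 0.9), (0.2, 0.5)]

# Each comparison is parenthesized, so & combines two boolean Series.
mask = (my_df.CHUNK_NAME == chunks[0]) & (my_df.LAMBDA == lam_beta[0][0])
selected = my_df.loc[mask]
print(selected)
```

With this made-up data the selection returns one row per beta value for the chosen chunk and lambda, which is the shape of result the question asks for.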
One way to make sure you don’t get into trouble with operator precedence is to use the wrapper methods of the comparison operators. For example, use the eq method instead of the == operator.
Other wrappers are:
ne: !=
le: <=
lt: <
ge: >=
gt: >
So the expression in the question would be:
my_df.loc[my_df.CHUNK_NAME.eq(chunks[0]) & my_df.LAMBDA.eq(lam_beta[0][0])]
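Because a method call binds more tightly than &, the wrapper form needs no extra parentheses. The sketch below, again on invented stand-in data (hypothetical values for CHUNK_NAME, LAMBDA and BETA), checks that it selects the same rows as the parenthesized operator form.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame (values are made up).
my_df = pd.DataFrame(
    {
        "CHUNK_NAME": ["300_321", "300_321", "322_343", "300_321"],
        "LAMBDA": [0.1, 0.1, 0.1, 0.2],
        "BETA": [0.5, 0.9, 0.5, 0.5],
    }
)
chunks = ["300_321", "322_343"]
lam_beta = [(0.1, 0.5), (0.1, 0.9), (0.2, 0.5)]

# Method calls are evaluated before &, so each side is already a boolean Series.
with_methods = my_df.loc[my_df.CHUNK_NAME.eq(chunks[0]) & my_df.LAMBDA.eq(lam_beta[0][0])]

# The same selection written with operators and explicit parentheses.
with_operators = my_df.loc[(my_df.CHUNK_NAME == chunks[0]) & (my_df.LAMBDA == lam_beta[0][0])]

print(with_methods)
```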
The wrappers can do more than the comparison operators: with a DataFrame you can choose the axis along which to compare, and if you’re dealing with a MultiIndex object, you can choose the level.
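A short sketch of the axis and level parameters, using made-up data (df, col_targets, row_targets, values and targets are hypothetical names). With a DataFrame, axis decides whether a Series is matched against the column labels (the default, "columns") or the row labels ("index"); with a MultiIndex, level broadcasts a Series indexed by a single level across that level.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default axis="columns": Series labels are matched to column names,
# so column "a" is compared with 2 and column "b" with 3.
col_targets = pd.Series({"a": 2, "b": 3})
by_col = df.eq(col_targets)

# axis="index": Series labels are matched to row labels instead,
# so row 0 is compared with 1 and row 1 with 4.
row_targets = pd.Series([1, 4], index=[0, 1])
by_row = df.eq(row_targets, axis="index")

# level=: compare a MultiIndex Series with a Series indexed by one level;
# each group's target is broadcast over that group's rows.
idx = pd.MultiIndex.from_tuples([("x", 1), ("x", 2), ("y", 1)], names=["grp", "n"])
values = pd.Series([10, 20, 30], index=idx)
targets = pd.Series({"x": 20, "y": 30})
by_level = values.eq(targets, level="grp")

print(by_col, by_row, by_level, sep="\n\n")
```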
Example:
For df:
   a  b    c
0  1  3  5.0
1  2  4  6.0
the following line:
out = df.loc[df['a']<3 & df['c']==5]
results in the following error, because it is parsed as df['a'] < (3 & df['c']) == 5:
TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]
However, the equivalent wrappers work:
out = df.loc[df['a'].lt(3) & df['c'].eq(5)]
Output:
   a  b    c
0  1  3  5.0
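The example can be reproduced end to end with the sketch below, assuming a recent pandas. The exact TypeError message has varied between pandas versions, so it is printed rather than checked.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5.0, 6.0]})

# Without parentheses Python evaluates df["a"] < (3 & df["c"]) == 5,
# and 3 & <float64 Series> raises a TypeError before any comparison runs.
failed = False
try:
    df.loc[df["a"] < 3 & df["c"] == 5]
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# The wrapper calls are evaluated before &, so & sees two boolean Series.
out = df.loc[df["a"].lt(3) & df["c"].eq(5)]
print(out)
```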