Searching Dataframe Using Two Criteria

Question:

I have a dataframe called test_df (below) and I am trying to search for the ID based on two criteria: the Factor and the name.

Here’s my code:

test_id = test_df.loc[test_df['Name'].str.contains(test_name, case=False) & test_df['Factor'].str.contains(test_factor, case=False), 'ID'].item()

But I get the following error:

Traceback (most recent call last):
  File "C:/Users/v3.py", line 508, in <module>
    test_id = test_df.loc[test_df['Name'].str.contains(test_name, case=False) & test_df['Factor'].str.contains(test_factor, case=False), 'ID'].item()
  File "C:UsersAppDataLocalProgramsPythonPython37libsite-packagespandascorebase.py", line 331, in item
    raise ValueError("can only convert an array of size 1 to a Python scalar")
ValueError: can only convert an array of size 1 to a Python scalar

I thought the issue was the Factor column being a float format but I have converted it to a string with the same result.

Can anyone see what the problem is?

FIELD1 ID Factor Status Speed Name
0 49513622 -4 ACTIVE II
1 11193741 4 ACTIVE AP
2 49513622 -3.75 ACTIVE II
3 11193741 3.75 ACTIVE AP
4 49513622 -3.5 ACTIVE II
5 11193741 3.5 ACTIVE AP
6 49513622 -3.25 ACTIVE II
7 11193741 3.25 ACTIVE AP
8 49513622 -3 ACTIVE II
9 11193741 3 ACTIVE AP
10 49513622 -2.75 ACTIVE II
11 11193741 2.75 ACTIVE AP
12 49513622 -2.5 ACTIVE II
13 11193741 2.5 ACTIVE AP
14 49513622 -2.25 ACTIVE II
15 11193741 2.25 ACTIVE AP
16 49513622 -2 ACTIVE II
17 11193741 2 ACTIVE AP
18 49513622 -1.75 ACTIVE II
19 11193741 1.75 ACTIVE AP
20 49513622 -1.5 ACTIVE II
21 11193741 1.5 ACTIVE AP
22 49513622 -1.25 ACTIVE II
23 11193741 1.25 ACTIVE AP
24 49513622 -1 ACTIVE II
25 11193741 1 ACTIVE AP
26 49513622 -0.75 ACTIVE II
27 11193741 0.75 ACTIVE 1.02 AP
28 49513622 -0.5 ACTIVE II
29 11193741 0.5 ACTIVE AP
30 49513622 -0.25 ACTIVE II
31 11193741 0.25 ACTIVE AP
32 49513622 0 ACTIVE 2.68 II
33 11193741 0 ACTIVE 1.03 AP
34 49513622 0.25 ACTIVE II
35 11193741 -0.25 ACTIVE 1.99 AP
36 49513622 0.5 ACTIVE II
37 11193741 -0.5 ACTIVE 2.3 AP
38 49513622 0.75 ACTIVE II
39 11193741 -0.75 ACTIVE AP
40 49513622 1 ACTIVE II
41 11193741 -1 ACTIVE AP
42 49513622 1.25 ACTIVE II
43 11193741 -1.25 ACTIVE AP
44 49513622 1.5 ACTIVE II
45 11193741 -1.5 ACTIVE AP
46 49513622 1.75 ACTIVE II
47 11193741 -1.75 ACTIVE AP
48 49513622 2 ACTIVE II
49 11193741 -2 ACTIVE AP
50 49513622 2.25 ACTIVE II
51 11193741 -2.25 ACTIVE AP
52 49513622 2.5 ACTIVE II
53 11193741 -2.5 ACTIVE AP
54 49513622 2.75 ACTIVE II
55 11193741 -2.75 ACTIVE AP
56 49513622 3 ACTIVE II
57 11193741 -3 ACTIVE AP
58 49513622 3.25 ACTIVE II
59 11193741 -3.25 ACTIVE AP
60 49513622 3.5 ACTIVE II
61 11193741 -3.5 ACTIVE AP
62 49513622 3.75 ACTIVE II
63 11193741 -3.75 ACTIVE AP
64 49513622 4 ACTIVE II
65 11193741 -4 ACTIVE AP
Asked By: Andy

||

Answers:

Problem is no match any value, so DataFrame.loc return empty Series.

Possible solution is use next + iter for assign default value if no match:

mask = test_df['Name'].str.contains(test_name, case=False) & 
       test_df['Factor'].str.contains(test_factor, case=False)

#if no match assign 'no match'
test_id = next(iter(test_df.loc[mask, 'ID']), 'no match')
#if no match assign None
test_id = next(iter(test_df.loc[mask, 'ID']),  None)

Or use if-else with test if at least one value match:

test_id = test_df.loc[mask, 'ID'].item() if m.any() else 'no match'

Or:

if m.any():
    test_id = test_df.loc[mask, 'ID'].item()

EDIT: For testing si possible create helper columns:

m1 = test_df['Name'].str.contains(test_name, case=False)
m2 = test_df['Factor'].str.contains(test_factor, case=False)

test_df = test_df.assign(name_mask = m1, factor_mask = m1, both = m1 & m2)
Answered By: jezrael
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.