Pandas: Check values between columns in different dataframes and return list of multiple possible values into a new column

Question:

I am trying to compare two columns from two different dataframes, and return all possible matches using python: (Kinda of an xlookup in excel but with multiple possible matches)

Please see the details below for sample dataframes and work I attempted.

An explanation of the datasets below: Mark does not own any cars, however, there are several listed under his name, which we know that none belong to him. I am attempting to look at dataframe 1 (Marks) and compare it against the larger dataset that has all other owners and their cars: dataframe 2 (claimed) and return possible owners for Mark’s cars as shown below.

Dataframe 1 : Marks
Marks = pd.DataFrame({'Car Brand': ['Jeep','Jeep','BMW','Volvo'],'Owner Name': ['Mark', 
'Mark', 'Mark', 'Mark']})

    Car Brand   Owner Name
    0   Jeep    Mark
    1   Jeep    Mark
    2   BMW     Mark
    3   Volvo   Mark

Dataframe 2: claimed

Dataframe 2: claimed
claimed = pd.DataFrame({'Car Brand': ['Dodge', 'Jeep', 'BMW', 'Merc', 'Volvo', 'Jeep', 
'Volvo'], 'Owner Name': ['Chris', 'Frank','Rob','Kelly','John','Chris','Kelly']})

    Car Brand   Owner Name
   0    Dodge   Chris
   1    Jeep    Frank
   2    BMW     Rob
   3    Merc    Kelly
   4    Volvo   John
   5    Jeep    Chris
   6    Volvo   Kelly

The data does have duplicate car brand names HOWEVER, the Owner Names are unique – meaning that Kelly even though she is mentioned twice IS THE SAME PERSON. same for Chris..etc

I want my Mark’s dataframe to have a new column that looks like this:

Car Brand   Owner Name  Possible Owners
0   Jeep    Mark        [Frank, Chris]
1   Jeep    Mark        [Frank, Chris]
2   BMW     Mark        Rob
3   Volvo   Mark        [John, Kelly]

I have tried the below codes:

possible_owners = list()
for cars in Marks['Car Brand']:
  for car_brands in claimed['Car Brand']:
     if Marks.loc[Marks['Car Brand'].isin(claimed['Car Brand'])]:
        sub = list()
        sub.append()
        possible_owners.append(sub)
    else:
        not_found = 'No possible Owners Identified'
        possible_owners.append(not_found)
   #Then I will add possible_owners as a new column to Marks

   error code:ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), 
   a.item(), a.any() or a.all().

I have also tried to do a merge, excel xlookup but (that has many limitations), and I am stuck trying to understand how to return possible matches even if there are multiple and line them up in one row.

Question: how can I compare the two frames, return possible values from the Owner Name column and put these values in a new column in Marks’ table?

Excuse my code, I am fairly new to Python.

Asked By: CatDad

||

Answers:

You could pre-process the claimed dataframe then merge:

lookup = claimed.groupby('Car Brand').apply(lambda x: x['Owner Name'].to_list()).to_frame()
df_m = Marks.merge(lookup, on='Car Brand', how='left').rename(columns={0:'Possible Owners'})
print(df_m)

Result

  Car Brand Owner Name Possible Owners
0      Jeep       Mark  [Frank, Chris]
1      Jeep       Mark  [Frank, Chris]
2       BMW       Mark           [Rob]
3     Volvo       Mark   [John, Kelly]
Answered By: jch

You can always use a list comprehension with the df.Series.isin to do the work.

result = [claimed[claimed['Car Brand'].isin([i])]['Owner Name'].to_numpy() for i in Marks['Car Brand']]

Marks['Possible Owners'] = result
 
  Car Brand Owner Name Possible Owners
0      Jeep       Mark  [Frank, Chris]
1      Jeep       Mark  [Frank, Chris]
2       BMW       Mark           [Rob]
3     Volvo       Mark   [John, Kelly]
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.