Filter pandas dataframe column and replace values using a list condition

Question:

I have the following dataframe:

ID  Type                Job
1   Employee            Doctor
2   Contingent Worker   Doctor
3   Employee            Employee
4   Employee            Employee
5   Contingent Worker   Employee
6   Contingent Worker   Consultant
7   Contingent Worker   Trainee
8   Contingent Worker   SSS
9   Contingent Worker   Agency Worker
10  Contingent Worker   

And I have this list of possible acceptable values for everyone that has a type of Contingent Workers:

list = ['Agency Worker', 'Consultant']

I need to find a way to confirm if everyone under the type "Contingent Worker" have an accetpable value in "Job" and, if not (or blank value), replace that value for "Consultant" resulting in this dataframe:

ID  Type                Job
1   Employee            Doctor
2   Contingent Worker   Consultant
3   Employee            Employee
4   Employee            Employee
5   Contingent Worker   Consultant
6   Contingent Worker   Consultant
7   Contingent Worker   Consultant
8   Contingent Worker   Consultant
9   Contingent Worker   Agency Worker
10  Contingent Worker   Consultant

What would be the best way to achieve this result?

Asked By: Paulo Cortez

||

Answers:

You could np.where for this.

  • Select only the slice of relevant entries for column Job (so: df['Type']=='Contingent Worker' inside df.loc) and use Series.isin to check for each string whether it is found inside your list (here: jobs). If so, we return the associated value, else, we return "Consultant".
jobs = ['Agency Worker', 'Consultant']

df.loc[df['Type']=='Contingent Worker','Job'] = np.where(
    df.loc[df['Type']=='Contingent Worker','Job'].isin(jobs),
    df.loc[df['Type']=='Contingent Worker','Job'], 
    'Consultant')

print(df)

   ID               Type            Job
0   1           Employee         Doctor
1   2  Contingent Worker     Consultant
2   3           Employee       Employee
3   4           Employee       Employee
4   5  Contingent Worker     Consultant
5   6  Contingent Worker     Consultant
6   7  Contingent Worker     Consultant
7   8  Contingent Worker     Consultant
8   9  Contingent Worker  Agency Worker
9  10  Contingent Worker     Consultant

N.B. Please don’t use "list" as a variable name. Since list is a built-in data type in Python, doing so will overwrite its functionality. E.g.:

print(type(list))
<class 'type'>

list = ['Agency Worker', 'Consultant']
<class 'list'>

lst = list()
# will now throw an error:
    TypeError: 'list' object is not callable
Answered By: ouroboros1

I would do it following way

df.loc[(df.Type=='Contingent Worker') & ~df.Job.isin(['Agency Worker', 'Consultant']),'Job'] = 'Consultant'
print(df)

gives output

                 Type            Job
ID
1            Employee         Doctor
2   Contingent Worker     Consultant
3            Employee       Employee
4            Employee       Employee
5   Contingent Worker     Consultant
6   Contingent Worker     Consultant
7   Contingent Worker     Consultant
8   Contingent Worker     Consultant
9   Contingent Worker  Agency Worker
10  Contingent Worker     Consultant

Explanation: select such rows where Type is Contingent Worker and (&) Job is not (~) one of values (isin) from your list, select Job column, set value to Consultant.

Answered By: Daweo