Create "Yes" column according to another column value pandas dataframe

Question:

Imagine I have a dataframe with employee IDs, their Contract Number, and the Company they work for. Each employee can have as many contracts as they want for the same company or even for different companies:

ID  Contract Number Company
10000   1           Abc
10000   2           Zxc
10000   3           Abc
10001   1           Zxc
10002   2           Abc
10002   1           Cde
10002   3           Zxc

I need to identify, for each ID, the company of contract number "1", and then create a column "Primary Company" that is set to "Yes" if the contract is with the same company as contract number 1 and "No" otherwise, resulting in this dataframe:

ID  Contract Number Company Primary Company
10000   1            Abc           Yes
10000   2            Zxc           No
10000   3            Abc           Yes
10001   1            Zxc           Yes
10002   2            Abc           No
10002   1            Cde           Yes
10002   3            Zxc           No

What would be the best way to achieve it?

Asked By: Paulo Cortez

||

Answers:

You can use groupby.apply with isin and numpy.where:

import numpy as np

# For each ID, flag rows whose Company matches the company of contract number 1.
df['Primary Company'] = np.where(
    df.groupby('ID', group_keys=False)
      .apply(lambda g: g['Company'].isin(g.loc[g['Contract Number'].eq(1), 'Company'])),
    'Yes', 'No'
)

Output:

      ID  Contract Number Company Primary Company
0  10000                1     Abc             Yes
1  10000                2     Zxc              No
2  10000                3     Abc             Yes
3  10001                1     Zxc             Yes
4  10002                2     Abc              No
5  10002                1     Cde             Yes
6  10002                3     Zxc              No

If a boolean (True/False) is acceptable instead of 'Yes'/'No':

df['Primary Company'] = (
 df.groupby('ID', group_keys=False)
   .apply(lambda g: g['Company'].isin(g.loc[g['Contract Number'].eq(1), 'Company']))
)
Answered By: mozway

Filter the rows where Contract Number is 1, left-join them back onto the original with DataFrame.merge, and check the _merge column generated by the indicator=True parameter:

mask = (df.merge(df[df['Contract Number'].eq(1)],
                how='left', on=['ID','Company'], indicator=True)['_merge'].eq('both'))
df['Primary Company'] = np.where(mask, 'Yes','No')
print(df)
      ID  Contract Number Company Primary Company
0  10000                1     Abc             Yes
1  10000                2     Zxc              No
2  10000                3     Abc             Yes
3  10001                1     Zxc             Yes
4  10002                2     Abc              No
5  10002                1     Cde             Yes
6  10002                3     Zxc              No
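
To see what the indicator column looks like before it is turned into a mask, here is a small sketch on the question's sample data. The names `keys` and `merged` are illustrative. Keeping only the ID and Company columns on the right side and calling `drop_duplicates()` is an added precaution, not part of the answer above: it assumes you want exactly one output row per input row even if an ID has several "1" contracts with the same company.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ID': [10000, 10000, 10000, 10001, 10002, 10002, 10002],
    'Contract Number': [1, 2, 3, 1, 2, 1, 3],
    'Company': ['Abc', 'Zxc', 'Abc', 'Zxc', 'Abc', 'Cde', 'Zxc'],
})

# (ID, Company) pairs that hold contract number 1.
# drop_duplicates() keeps the right side unique so the join cannot add rows.
keys = df.loc[df['Contract Number'].eq(1), ['ID', 'Company']].drop_duplicates()

# indicator=True adds a '_merge' column: 'both' when the pair was found
# in keys, 'left_only' when it was not.
merged = df.merge(keys, how='left', on=['ID', 'Company'], indicator=True)
print(merged[['ID', 'Company', '_merge']])

# With unique keys on the right, merged has one row per row of df,
# in the same order, so a positional mask lines up with df.
mask = merged['_merge'].eq('both').to_numpy()
df['Primary Company'] = np.where(mask, 'Yes', 'No')
```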

Another idea is to compare MultiIndex values with Index.isin:

idx = df[df['Contract Number'].eq(1)].set_index(['ID','Company']).index
df['Primary Company'] = np.where(df.set_index(['ID','Company']).index.isin(idx),
                                 'Yes','No')
print(df)
      ID  Contract Number Company Primary Company
0  10000                1     Abc             Yes
1  10000                2     Zxc              No
2  10000                3     Abc             Yes
3  10001                1     Zxc             Yes
4  10002                2     Abc              No
5  10002                1     Cde             Yes
6  10002                3     Zxc              No
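
The same comparison can be sketched step by step on the question's sample data. This variant builds the two indexes with `pd.MultiIndex.from_frame` instead of `set_index(...).index`, which is a swapped-in but equivalent way to get (ID, Company) pairs. The variable names `primary_keys`, `row_keys` and `is_primary` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ID': [10000, 10000, 10000, 10001, 10002, 10002, 10002],
    'Contract Number': [1, 2, 3, 1, 2, 1, 3],
    'Company': ['Abc', 'Zxc', 'Abc', 'Zxc', 'Abc', 'Cde', 'Zxc'],
})

# (ID, Company) pairs of the rows with contract number 1.
primary_keys = pd.MultiIndex.from_frame(
    df.loc[df['Contract Number'].eq(1), ['ID', 'Company']]
)

# (ID, Company) pair of every row, in the original row order.
row_keys = pd.MultiIndex.from_frame(df[['ID', 'Company']])

# Boolean NumPy array: True where the row's pair is one of the primary pairs.
is_primary = row_keys.isin(primary_keys)

df['Primary Company'] = np.where(is_primary, 'Yes', 'No')
print(df)
```

Because `row_keys` is built from `df` itself, the boolean array has one entry per row in the same order, so no index alignment is needed.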
Answered By: jezrael