Target encoding multiple columns in pandas python

Question

I have the following table.

id col1 col2 col3 col4  target
1    A    B  A    101   1
2    B    B  A    191   1
3    A    B  A     81   0 
4    C    B  C     67   1
5    B    C  C      3   0

I want to target encode every column except col4.

Expected Output:

e1    e2     e3     target
0.5   0.75   0.667    1
0.5   0.75   0.667    1
0.5   0.75   0.667    0
1.0   0.75   0.5      1
0.5   0.00   0.5      0

EDIT:
For each column of col1, col2, col3 I want to get the target encodings.

For example, in col3, A appears 3 times, and in 2 of those 3 rows the target is 1, so the encoding for A is 0.667. Similarly, the encoding for C in col3 is 0.5.

I've tried something like this for one column:

encodings = df.groupby('col1')['target'].mean().reset_index()
df = df.merge(encodings, how = 'left', on = 'col1')
df.drop('col1', axis = 1, inplace = True)

Asked By: Eisen

Answer 1

update after clarification:

You can use the same approach as in your original attempt, but with map instead of merge:

df.update(df[['col1', 'col2', 'col3']]
          .apply(lambda s: s.map(df['target'].groupby(s).mean()))
          )

output:

   id col1  col2      col3  col4  target
0   1  0.5  0.75  0.666667   101       1
1   2  0.5  0.75  0.666667   191       1
2   3  0.5  0.75  0.666667    81       0
3   4  1.0  0.75       0.5    67       1
4   5  0.5   0.0       0.5     3       0
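If you would rather keep the original columns than overwrite them with df.update, the same groupby + map idea can write to new columns. This is only a sketch: the DataFrame is rebuilt from the table in the question, and the names e1, e2, e3 are borrowed from the question's expected output, not from this answer.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "col1": ["A", "B", "A", "C", "B"],
    "col2": ["B", "B", "B", "B", "C"],
    "col3": ["A", "A", "A", "C", "C"],
    "col4": [101, 191, 81, 67, 3],
    "target": [1, 1, 0, 1, 0],
})

# Assumed mapping from source column to new column name (e1..e3 as in the question).
new_names = {"col1": "e1", "col2": "e2", "col3": "e3"}

for col, new_col in new_names.items():
    # Mean target per category of this column.
    means = df["target"].groupby(df[col]).mean()
    # Look up each row's category in that table of means.
    df[new_col] = df[col].map(means)

print(df[["e1", "e2", "e3", "target"]])
```

The original col1, col2, col3 stay untouched, so you can drop them afterwards or keep them for inspection.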

older answer prior to OP clarification

IIUC, you want to map the normalized value_counts:

df[['col1', 'col2', 'col3']].apply(lambda s: s.map(s.value_counts(normalize=True)))

output:

   col1  col2  col3
0   0.4   0.8   0.6
1   0.4   0.8   0.6
2   0.4   0.8   0.6
3   0.2   0.8   0.4
4   0.4   0.2   0.4

updating the data in place:

df.update(df[['col1', 'col2', 'col3']]
          .apply(lambda s: s.map(s.value_counts(normalize=True)))
          )

updated DataFrame:

   id col1 col2 col3  col4  target
0   1  0.4  0.8  0.6   101       1
1   2  0.4  0.8  0.6   191       1
2   3  0.4  0.8  0.6    81       0
3   4  0.2  0.8  0.4    67       1
4   5  0.4  0.2  0.4     3       0

Answered By: mozway

Answer 2

You can try transform inside a list comprehension:

l = [df.groupby(col)['target'].transform('mean') for col in ['col1','col2','col3']]
out = pd.concat(l + [df.target],keys = ['e1','e2','e3','target'],axis=1)
out
Out[247]: 
    e1    e2        e3  target
0  0.5  0.75  0.666667       1
1  0.5  0.75  0.666667       1
2  0.5  0.75  0.666667       0
3  1.0  0.75  0.500000       1
4  0.5  0.00  0.500000       0
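The transform approach above can be wrapped in a small helper if you need it for several tables. This is a sketch, not part of the answer: the function name target_encode and its parameters are made up for illustration, and the DataFrame is rebuilt from the question.

```python
import pandas as pd

def target_encode(df, cols, target):
    """Hypothetical helper: per-category mean of `target` for each column in `cols`."""
    # transform("mean") broadcasts each group's mean back onto the original rows,
    # so every resulting Series has the same index and length as df (no merge needed).
    return pd.DataFrame(
        {col: df.groupby(col)[target].transform("mean") for col in cols}
    )

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "col1": ["A", "B", "A", "C", "B"],
    "col2": ["B", "B", "B", "B", "C"],
    "col3": ["A", "A", "A", "C", "C"],
    "col4": [101, 191, 81, 67, 3],
    "target": [1, 1, 0, 1, 0],
})

encoded = target_encode(df, ["col1", "col2", "col3"], "target")
print(encoded.assign(target=df["target"]))
```

Because transform keeps the row order, the result lines up with df directly, which is what makes the merge from the question unnecessary.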

Answered By: BENY

Answer 3

Use .apply. For each column, calculate the mean of target grouped by that column and map it back onto the rows:

df[['col1', 'col2', 'col3']].apply(lambda s: s.map(df['target'].groupby(s).mean()))

   col1  col2      col3
0   0.5  0.75  0.666667
1   0.5  0.75  0.666667
2   0.5  0.75  0.666667
3   1.0  0.75  0.500000
4   0.5  0.00  0.500000

If you also want to have a target column, you can just use .assign() at the end:

df[['col1', 'col2', 'col3']].apply(lambda s: s.map(df['target'].groupby(s).mean())).assign(target=df['target'])

   col1  col2      col3  target
0   0.5  0.75  0.666667       1
1   0.5  0.75  0.666667       1
2   0.5  0.75  0.666667       0
3   1.0  0.75  0.500000       1
4   0.5  0.00  0.500000       0

Note: .apply() and .transform() give identical results here. You can replace one with the other.
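A quick way to check the note above on the sample data (a sketch; the DataFrame is rebuilt from the question, and the variable names are mine):

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "col1": ["A", "B", "A", "C", "B"],
    "col2": ["B", "B", "B", "B", "C"],
    "col3": ["A", "A", "A", "C", "C"],
    "target": [1, 1, 0, 1, 0],
})

def encode(s):
    # Map each category in column s to the mean target of that category.
    return s.map(df["target"].groupby(s).mean())

cols = ["col1", "col2", "col3"]
via_apply = df[cols].apply(encode)
via_transform = df[cols].transform(encode)

# Both calls run encode once per column, so the frames should match.
print(via_apply.equals(via_transform))
```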

Answered By: Vladimir Fokow

Answer 4

Using pd.crosstab (df is the DataFrame from the question):

pd.concat([df[col].map(pd.crosstab(df[col], df.target, normalize='index')[1])
           for col in ['col1', 'col2', 'col3']], axis=1).join(df.target)

   col1  col2      col3  target
0   0.5  0.75  0.666667       1
1   0.5  0.75  0.666667       1
2   0.5  0.75  0.666667       0
3   1.0  0.75  0.500000       1
4   0.5  0.00  0.500000       0
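Why the crosstab version gives the same numbers: with normalize='index' each row of the crosstab sums to 1, so the column labelled 1 is the share of rows in that category whose target is 1. For a 0/1 target that share is exactly the mean. The sketch below checks this for col3 only; note that it relies on the target being binary with a label 1, which holds in the question but is an assumption in general.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (only the columns needed here).
df = pd.DataFrame({
    "col3": ["A", "A", "A", "C", "C"],
    "target": [1, 1, 0, 1, 0],
})

# Rows: categories of col3. Columns: target values 0 and 1. Each row sums to 1.
ct = pd.crosstab(df["col3"], df["target"], normalize="index")

# Share of target == 1 per category.
share_of_ones = ct[1]

# Plain per-category mean, for comparison.
means = df.groupby("col3")["target"].mean()

print(ct)
print((share_of_ones - means).abs().max())
```

For a non-binary target the crosstab column no longer equals the mean, so the groupby-based answers are the safer choice there.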

Answered By: G.G
