How to create dummies for certain columns with pandas.get_dummies()

Question:

import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

I only want columns A and D converted to dummies, not column B. If I use pd.get_dummies(df), all columns are turned into dummies.

I want the final result to contain all of the columns, meaning columns B and C still exist, like 'A_x','A_y','B','C','D_j','D_l'.

Asked By: Jack


Answers:

Select just the two columns you want to pass to pd.get_dummies() (the resulting column names combine the source column with the value that each binary variable represents), then pd.concat() the result with the original columns you want left unchanged:

pd.concat([pd.get_dummies(df[['A', 'D']]), df[['B', 'C']]], axis=1)

   A_x  A_y  D_j  D_l  B  C
0  1.0  0.0  1.0  0.0  z  1
1  0.0  1.0  0.0  1.0  u  2
2  1.0  0.0  1.0  0.0  z  3
Answered By: Stefan

It can be done without concatenation by passing the columns parameter to get_dummies():

In [294]: pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D'])
Out[294]: 
   B  C  A_x  A_y  D_j  D_l
0  z  1  1.0  0.0  1.0  0.0
1  u  2  0.0  1.0  0.0  1.0
2  z  3  1.0  0.0  1.0  0.0
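
As a minimal sketch of the same idea, the snippet below also reorders the result into the layout the question asked for. The dtype=int argument and the variable names encoded, wanted, and result are choices made for this illustration; recent pandas versions return boolean dummy columns by default, so the 1.0/0.0 values shown above may differ on your installation.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Encode only A and D; B and C pass through unchanged.
# dtype=int asks for 0/1 integers instead of the version-dependent default.
encoded = pd.get_dummies(df, columns=['A', 'D'], dtype=int)

# Reorder to the column layout requested in the question.
wanted = ['A_x', 'A_y', 'B', 'C', 'D_j', 'D_l']
result = encoded[wanted]
print(result)
```

Selecting with a list of column names is only one way to reorder; it assumes you know the generated dummy names in advance.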
Answered By: knagaev

Adding to the answers above: if you have a big dataset with many attributes and don't want to list every dummy column by hand, you can use a set difference:

# Suppose len(df.columns) == 50
non_dummy_cols = ['A', 'B', 'C']
# Takes all 47 other columns (note: a set does not preserve column order)
dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
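
Because a Python set is unordered, the set difference can return the dummy columns in an arbitrary order. If the order matters, one possible alternative is Index.drop, which removes the given labels and keeps the rest in their original order. The small DataFrame below is made up for illustration and is not from the original answer.

```python
import pandas as pd

# Hypothetical data: A, B, C should stay as they are; D and E get encoded.
df = pd.DataFrame({'A': [1, 2, 1],
                   'B': ['p', 'q', 'p'],
                   'C': [0.1, 0.2, 0.3],
                   'D': ['j', 'l', 'j'],
                   'E': ['m', 'm', 'n']})

non_dummy_cols = ['A', 'B', 'C']

# Index.drop keeps the remaining labels in their original order.
dummy_cols = df.columns.drop(non_dummy_cols).tolist()

encoded = pd.get_dummies(df, columns=dummy_cols, dtype=int)
print(encoded.columns.tolist())
```

Index.drop raises a KeyError if a label in non_dummy_cols is missing from the DataFrame, which can double as a cheap sanity check on the list.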
Answered By: Patric Fulop
  • The other answers are great for the specific example in the OP
  • This answer is for cases where there may be many columns, and it’s too cumbersome to type out all the column names
  • This is a non-exhaustive solution to specifying many different columns to get_dummies while excluding some columns.
  • Using the built-in filter() function on df.columns is also an option.
  • When columns=None, pd.get_dummies only converts columns with an object, string, or category dtype.
    • Another potential option is to give the object dtype only to the columns to be transformed, and make sure the columns that shouldn't be transformed are not object dtype.
  • Using set(), as shown in this answer, is yet another option.
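
The filter() and dtype alternatives mentioned in the list above could look roughly like the sketch below. It is self-contained and builds the same kind of random integer test data used in this answer; the names not_cols, cols, via_filter, as_obj, and via_dtype are chosen for this illustration only.

```python
import string

import numpy as np
import pandas as pd

# Random integer test data with columns A..J.
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)),
                  columns=list(string.ascii_uppercase[:10]))

not_cols = ['C', 'G']  # columns to leave untouched

# Built-in filter(): keep every column name that is not excluded.
cols = list(filter(lambda c: c not in not_cols, df.columns))
via_filter = pd.get_dummies(df, columns=cols)

# dtype route: cast only the columns to encode to object and leave
# columns=None, so the integer columns C and G are skipped.
as_obj = df.astype({c: object for c in cols})
via_dtype = pd.get_dummies(as_obj)

print(via_filter.columns.tolist() == via_dtype.columns.tolist())
```

Both routes should produce the same column names here; which one reads better depends on whether you would rather maintain a list of names or manage dtypes.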
import pandas as pd
import string  # for data
import numpy as np

# create test data
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)), columns=list(string.ascii_uppercase[:10]))

# display(df)
   A  B  C  D  E  F  G  H  I  J
0  1  2  1  2  1  1  2  3  2  2
1  2  1  3  3  1  2  2  1  2  1
2  2  3  1  3  2  2  1  2  3  3
3  3  2  1  2  3  2  3  1  3  1
4  1  1  1  3  3  1  2  1  2  1

Option 1

  • If the excluded columns are fewer than the included columns, specify the columns to remove, and then use a list comprehension to remove them from the list being passed to the columns= parameter.
# columns not to transform
not_cols = ['C', 'G']

# get dummies
df_dummies = pd.get_dummies(data=df, columns=[col for col in df.columns if col not in not_cols])

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0

Option 2

  • If the columns to remove are at the beginning or end, slice df.columns
df_dummies = pd.get_dummies(data=df, columns=df.columns[2:])

   A  B  C_1  C_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  G_1  G_2  G_3  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    1    0    1    0    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  2  1    0    1    0    1    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0
2  2  3    1    0    0    1    0    1    0    0    1    1    0    0    0    1    0    0    1    0    0    1
3  3  2    1    0    1    0    0    0    1    0    1    0    0    1    1    0    0    0    1    1    0    0
4  1  1    1    0    0    1    0    0    1    1    0    0    1    0    1    0    0    1    0    1    0    0

Option 3

  • Specify slices and then concat the excluded columns to the dummies
    • Uses pd.concat, similar to this answer, but with more columns.
  • np.r_ translates the slice objects into a single concatenated array of integer positions
slices = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]
excluded = [2, 6]

df_dummies = pd.concat([df.iloc[:, excluded], pd.get_dummies(data=df.iloc[:, slices].astype(object))], axis=1)

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0
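
To see what np.r_ produces for the slices used in Option 3, the short sketch below prints the resulting positions. Deriving the excluded positions with np.setdiff1d is an addition for illustration; the answer itself writes them out by hand.

```python
import numpy as np

# np.r_ expands each slice and concatenates the results into one array.
slices = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]
print(slices)  # [0 1 3 4 5 7 8 9]

# The more common colon syntax inside the brackets gives the same array.
same = np.r_[0:2, 3:6, 7:10]
print(np.array_equal(slices, same))  # True

# Positions not covered by the slices, out of 10 columns in total.
excluded = np.setdiff1d(np.arange(10), slices)
print(excluded)  # [2 6]
```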
Answered By: Trenton McKinney