Convert Categorical into Numerical Python Panda Columns must be same length as key

Question:

I’m a novice at Python and start to learn Data Science using Python. Please help with my problem.

I’m following this youtube video tutorial: https://www.youtube.com/watch?v=DPeTCC9Qm80 to convert categorical data into numerical data (one hot encoding).

The problem is when I want to replace the column (e.g Job) with the encoding it was an error:

ValueError: Columns must be same length as key

I using the data from kaggle: https://www.kaggle.com/datasets/krantiswalke/bankfullcsv?resource=download

here is the code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
path="bank-full.csv"
df = pd.read_csv(path)

df_cat=df.select_dtypes(object)
df_num=df.select_dtypes(np.number)

df_cat['job']=pd.get_dummies(df_cat['job'], drop_first=True)

I know what cause the error because ‘job’ column will be replace with 11 columns result from one hot encoding (blue-collar, entrepreneur, housmaid, etc..)

how to solve this error?

I also try to encoding all the categorical column with this code:

cols=list(df_cat.columns) def cat_2_num(cols): for col in cols: df_cat[col]=pd.get_dummies(df_cat[col], dro_first=True) cat_2_num(cols)

but the dtypes still object not converted to uint8 (number)

I try manually code like this for ‘job’ column:

df_job = pd.get_dummies(df_cat['job'], drop_first=True)
df_new = pd.concat([df,df_job], axis=1)
df_new = df_new.drop('job', axis=1)

I drop ‘job’ column and append ‘job’ hot encoding result. How to do it efficiently especially for all categorical columns?

Asked By: Jacky Brian

||

Answers:

You do not need to specify one column at a time to do this. pd.get_dummies will automatically drop your string columns and replace them with one-hot encoded columns if you give it your entire dataframe.

df     = pd.read_csv("/path/bank-full.csv")
df_new = pd.get_dummies(df, drop_first=True)

Output:

       age  balance  day  ...  poutcome_success  poutcome_unknown  Target_yes
0       58     2143    5  ...                 0                 1           0
1       44       29    5  ...                 0                 1           0
2       33        2    5  ...                 0                 1           0
3       47     1506    5  ...                 0                 1           0
4       33        1    5  ...                 0                 1           0
   ...      ...  ...  ...               ...               ...         ...
45206   51      825   17  ...                 0                 1           1
45207   71     1729   17  ...                 0                 1           1
45208   72     5715   17  ...                 1                 0           1
45209   57      668   17  ...                 0                 1           0
45210   37     2971   17  ...                 0                 0           0
Answered By: Stu Sztukowski
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.