Pandas: Apply transformations to all character columns

Question:

I am working in Python to try and apply a few transformations to all character/string columns in a pandas dataframe. The transformations are:

  • Make everything uppercase
  • Trim the white space

I come from an R background and this can be achieved via something like


mydf <- mydf %>% 
  dplyr::mutate_if(is.character, toupper)
  dplyr::mutate_if(is.character, trimws)

For Python I am at a loss. I have tried the below where it first identifies all the character columns and then attempts to trim the whitespace and make all the character columns upper case (Species in this case)

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Create a sample dataset
iris = load_iris()

df= pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                 columns= iris['feature_names'] + ['target'])

df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Make character columns upper case and then trim the white space
string_dtypes = df.convert_dtypes().select_dtypes("string")
df[string_dtypes.columns] = string_dtypes.apply(lambda x: x.str.upper())
df[string_dtypes.columns] = string_dtypes.apply(lambda x: x.str.strip())

df

I appreciate this might be a very basic question and thank you in advance to anyone who takes the time to help

Asked By: John Smith

||

Answers:

You should be able to do this in one line with method chaining:

df.astype(str).apply(lambda x: x.str.upper().str.strip())

Output:

    sepal length (cm)   sepal width (cm)    petal length (cm)   petal width (cm)    target  species
0   5.1 3.5 1.4 0.2 0.0 SETOSA
1   4.9 3.0 1.4 0.2 0.0 SETOSA
2   4.7 3.2 1.3 0.2 0.0 SETOSA
3   4.6 3.1 1.5 0.2 0.0 SETOSA
4   5.0 3.6 1.4 0.2 0.0 SETOSA
... ... ... ... ... ... ...
145 6.7 3.0 5.2 2.3 2.0 VIRGINICA
146 6.3 2.5 5.0 1.9 2.0 VIRGINICA
147 6.5 3.0 5.2 2.0 2.0 VIRGINICA
148 6.2 3.4 5.4 2.3 2.0 VIRGINICA
149 5.9 3.0 5.1 1.8 2.0 VIRGINICA
Answered By: SamR
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.