select columns based on columns names

Question:

I need to get columns that match the list of names provided. The probles is, the csv file columns might contain extra spaces, and since the csv file is dynamically obtained from api, its not make sense to fix the csv manually.

 fields = ['Assembled Weights', 'Assembled Heights']

 dataframe = pd.read_excel(file)
 df = dataframe[fields]

The problem in csv is some of its column names contains extra spaces for example

     Assembled Weights -> single space
     Assembled  Heights -> contains two spaces

So I got error KeyError: "['Assembled Heights'] not in index"

How do you ignore the extra spaces from dataframe column names?

Asked By: rai

||

Answers:

Remove the extra spaces with str.replace:

out = df.loc[:, df.columns.str.replace(r's+', ' ', regex=True).isin(fields)]

NB. This leaves the original names unchanged.

If you also want to fix the names:

df.columns = df.columns.str.replace(r's+', ' ', regex=True)
out = df.loc[:, df.columns.isin(fields)]

# or
out = df[fields]
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.