remove prefix in all column names

Question:

I would like to remove the prefix from all column names in a dataframe.

I tried creating a UDF and calling it in a for loop:

def remove_prefix(str, prefix):
    if str.startswith(blabla):
        return str[len(prefix):]
    return str

for x in df.columns:
    x.remove_prefix()
Asked By: r_me


Answers:

You can use str.lstrip on the column names, which avoids looping and checking which names contain the prefix. Be aware that lstrip treats its argument as a set of characters to strip, not as a literal prefix:

import pandas as pd

# Example dataframe
df = pd.DataFrame(columns=['pre_A', 'pre_B', 'C'])
df.columns = df.columns.str.lstrip('pre_')

Resulting in:

print(df.columns)
# Index(['A', 'B', 'C'], dtype='object')

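Note: lstrip removes every leading character that appears in 'pre_' (that is, any of p, r, e and _), not one literal prefix. Repeated occurrences of pre_ on the left are all removed, but so are other leading characters from that set: 'pre_predmet' becomes 'dmet', not 'predmet'.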
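
To make the difference concrete, here is a minimal sketch in plain Python comparing lstrip with a literal prefix check. The helper name drop_prefix and the sample names are made up for illustration; the same list could be assigned back to df.columns.

```python
names = ['pre_A', 'pre_B', 'pre_predmet', 'C']

# str.lstrip removes any leading run of the characters 'p', 'r', 'e', '_'
lstripped = [n.lstrip('pre_') for n in names]

# Hypothetical helper: remove the literal prefix once, only if it is present
def drop_prefix(name, prefix):
    return name[len(prefix):] if name.startswith(prefix) else name

sliced = [drop_prefix(n, 'pre_') for n in names]

print(lstripped)  # ['A', 'B', 'dmet', 'C']
print(sliced)     # ['A', 'B', 'predmet', 'C']
```

With a dataframe, this would be used as df.columns = [drop_prefix(c, 'pre_') for c in df.columns].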

Answered By: yatu

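Use replace in a list comprehension (note that this removes every occurrence of prefix in a name, not only a leading one):

prefix = 'pre_'  # example value; set this to the prefix to remove
df.columns = [i.replace(prefix, "") for i in df.columns]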
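
As a quick illustration of that caveat, the sketch below (with made-up column names) shows that str.replace also touches occurrences in the middle of a name, and that limiting the count to 1 still does not anchor the match to the start.

```python
prefix = 'pre_'

# Every occurrence is removed, not just the leading one
all_removed = 'pre_a_pre_b'.replace(prefix, '')

# count=1 removes only the first occurrence, wherever it is
first_removed = 'x_pre_y'.replace(prefix, '', 1)

print(all_removed)    # a_b
print(first_removed)  # x_y
```

If interior occurrences can appear in the column names, a check with startswith or an anchored regex is probably the safer choice.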
Answered By: Sociopath

You can read the file without headers, using header=None:

pandas.read_csv(filepath_or_buffer=filename, header=None, sep=',')  
Answered By: Angelo Mendes

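Use the rename method, which accepts a function to apply to the column names:

import pandas as pd

def remove_prefix(prefix):
    # Strip the prefix only from names that actually start with it
    return lambda x: x[len(prefix):] if x.startswith(prefix) else x

frame = pd.DataFrame(dict(x_a=[1, 2, 3], x_b=[4, 5, 6]))
frame = frame.rename(remove_prefix('x_'), axis='columns')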
Answered By: blue_note

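Use Series.str.replace with the regex anchor ^ to match only at the start of the string. Pass regex=True explicitly, because from pandas 2.0 the pattern is treated as a literal string by default:

df = pd.DataFrame(columns=['pre_A', 'pre_B', 'pre_predmet'])
df.columns = df.columns.str.replace('^pre_', '', regex=True)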
print (df)
Empty DataFrame
Columns: [A, B, predmet]
Index: []

Another solution is to use a list comprehension with re.sub:

import re

df.columns = [re.sub('^pre_',"", x) for x in df.columns]
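
If the prefix is held in a variable and might contain regex metacharacters such as a dot, it is probably worth escaping it first. A minimal sketch, assuming a hypothetical prefix 'a.b_' and made-up column names:

```python
import re

prefix = 'a.b_'  # hypothetical prefix containing a regex metacharacter
names = ['a.b_x', 'aXb_y']

# Unescaped: '.' matches any character, so 'aXb_' is stripped as well
unescaped = [re.sub('^' + prefix, '', n) for n in names]

# Escaped: only the literal prefix 'a.b_' is stripped
escaped = [re.sub('^' + re.escape(prefix), '', n) for n in names]

print(unescaped)  # ['x', 'y']
print(escaped)    # ['x', 'aXb_y']
```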
Answered By: jezrael

Remove it using the standard pandas API (str.removeprefix is available from pandas 1.4 and strips only a literal leading prefix):

df.columns = df.columns.str.removeprefix("prefix_")
Answered By: Arn