Remove prefix from all column names
Question:
I would like to remove the prefix from all column names in a dataframe.
I tried creating a UDF and calling it in a for loop:
def remove_prefix(str, prefix):
    if str.startswith(blabla):
        return str[len(prefix):]
    return str

for x in df.columns:
    x.remove_prefix()
Answers:
You can use str.lstrip to strip the prefix from the column names; this way you avoid looping and checking which names contain the prefix:
# Example dataframe
import pandas as pd
df = pd.DataFrame(columns=['pre_A', 'pre_B', 'C'])
df.columns = df.columns.str.lstrip('pre_')
Resulting in:
print(df.columns)
# Index(['A', 'B', 'C'], dtype='object')
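Note: str.lstrip treats its argument as a set of characters, not as a literal prefix. It removes every leading character that is one of p, r, e or _, so repeated occurrences of pre_ are stripped as well, and a name such as pre_predmet becomes dmet.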
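To make the lstrip behaviour concrete, here is a small sketch comparing it with an explicit prefix check. The column names are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical column names chosen to expose the difference
cols = pd.Index(["pre_A", "pre_predmet", "pre_pre_B", "C"])
prefix = "pre_"

# lstrip removes any leading run of the characters 'p', 'r', 'e', '_'
stripped = cols.str.lstrip(prefix)

# An explicit startswith check removes exactly one leading "pre_"
checked = pd.Index(
    [c[len(prefix):] if c.startswith(prefix) else c for c in cols]
)

print(list(stripped))  # ['A', 'dmet', 'B', 'C']
print(list(checked))   # ['A', 'predmet', 'pre_B', 'C']
```

If the remaining part of a name can start with p, r, e or _, the explicit check is the safer choice.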
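Use replace in a list comprehension:
df.columns = [i.replace(prefix, "") for i in df.columns]
Note that str.replace removes the prefix text wherever it occurs in a name, not only at the start.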
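As a rough illustration of how plain replace differs from removing only a leading prefix, the sketch below uses a made-up column name in which the prefix text appears twice.

```python
# Hypothetical column name; "pre_" appears at the start and in the middle
name = "pre_score_pre_test"
prefix = "pre_"

# str.replace removes every occurrence of the prefix text
everywhere = name.replace(prefix, "")

# Slicing after a startswith check removes only the leading occurrence
leading_only = name[len(prefix):] if name.startswith(prefix) else name

print(everywhere)    # score_test
print(leading_only)  # score_pre_test
```

Whether this matters depends on the data: if the prefix text never occurs inside a column name, both approaches give the same result.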
You can read the file without headers, using header=None:
pandas.read_csv(filepath_or_buffer=filename, header=None, sep=',')
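Use the rename method, which accepts a function to apply to the column names:
import pandas as pd

def remove_prefix(prefix):
    # Strip the prefix only from names that actually start with it
    return lambda x: x[len(prefix):] if x.startswith(prefix) else x

frame = pd.DataFrame(dict(x_a=[1, 2, 3], x_b=[4, 5, 6]))
frame = frame.rename(remove_prefix('x_'), axis='columns')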
Use Series.str.replace with the regex ^ to match the start of the string (pass regex=True, since pandas 2.0 treats the pattern as a literal string by default):
df = pd.DataFrame(columns=['pre_A', 'pre_B', 'pre_predmet'])
df.columns = df.columns.str.replace('^pre_', '', regex=True)
print(df)
Empty DataFrame
Columns: [A, B, predmet]
Index: []
Another solution is to use a list comprehension with re.sub:
import re
df.columns = [re.sub(r'^pre_', '', x) for x in df.columns]
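With either regex approach, a prefix that contains regex metacharacters (such as a dot) may match more than intended. A minimal sketch, assuming a hypothetical prefix "t1." and made-up column names, shows how re.escape avoids that:

```python
import re
import pandas as pd

prefix = "t1."  # hypothetical prefix containing a regex metacharacter
cols = pd.Index(["t1.id", "t1.name", "t10id"])

# Unescaped: "." matches any character, so "t10" is stripped from "t10id"
unescaped = cols.str.replace("^" + prefix, "", regex=True)

# Escaped: only the literal text "t1." at the start is removed
escaped = cols.str.replace("^" + re.escape(prefix), "", regex=True)

print(list(unescaped))  # ['id', 'name', 'id']
print(list(escaped))    # ['id', 'name', 't10id']
```

The same re.escape(prefix) call can be used inside the re.sub list comprehension.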
Remove it using the standard pandas API (str.removeprefix is available from pandas 1.4):
df.columns = df.columns.str.removeprefix("prefix_")
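On an older pandas version, one possible fallback is Python's built-in str.removeprefix (Python 3.9+) applied through rename. This is only a sketch; the example frame and its column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame; "c" has no prefix and should be left unchanged
df = pd.DataFrame(columns=["prefix_a", "prefix_b", "c"])

# str.removeprefix here is the built-in Python string method,
# called once per column name by rename
df = df.rename(columns=lambda name: name.removeprefix("prefix_"))

print(list(df.columns))  # ['a', 'b', 'c']
```

Like the pandas accessor, the built-in method removes the prefix at most once and leaves names without the prefix untouched.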