Rename columns based on certain pattern
Question:
I have the following columns in a dataframe: Id, Category2, Brandyqdy1, Brandyqwdwdy2, Brandyqdw3
If a column's name starts with Brand and ends with 1, I need it renamed as Vans. Similarly, for the other Brand columns, use the following:
rename_brands = {'1': 'Vans', '2': 'Nike', 3:'Adidas'}
I will also be renaming columns other than the ones that start with Brand. Overall:
rename_columns = {'Id': 'record', 'Category2': 'Sku', '1': 'Vans', '2': 'Nike', 3:'Adidas'}
Answers:
You can chain two rename calls. For the regex-based rename, you can use re.sub:
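import re
# Note: the key 3 in rename_brands is an int, as in the question, so a captured '3' does not match it.
rename_brands = {'1': 'Vans', '2': 'Nike', 3:'Adidas'}
rename_columns = {'Id': 'record', 'Category2': 'Sku', '1': 'Vans', '2': 'Nike', '3':'Adidas'}
# First rename the exact column names, then the Brand...<digit> columns by their last digit.
out = (df.rename(columns=rename_columns)
         .rename(columns=lambda col: re.sub(r'^Brand.*(\d)$',
                                            lambda m: rename_brands.get(m.group(1), m.group(0)),
                                            col)))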
$ print(df)
    Id  Category2  Brandyqdy1  Brandyqwdwdy2  Brandyqdw3    1    2
0  NaN        NaN         NaN            NaN         NaN  NaN  NaN
$ print(out)
   record  Sku  Vans  Nike  Brandyqdw3  Vans  Nike
0     NaN  NaN   NaN   NaN         NaN   NaN   NaN
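For comparison, here is a minimal self-contained sketch of the same chained rename. It assumes every suffix key is a string (so '3' maps to Adidas, unlike the int key 3 above) and uses an empty dataframe with the question's columns; the helper name rename_brand is made up for illustration.

```python
import re
import pandas as pd

# Assumption: all suffix keys are strings, so a captured '3' is found.
rename_brands = {'1': 'Vans', '2': 'Nike', '3': 'Adidas'}
rename_columns = {'Id': 'record', 'Category2': 'Sku'}

# Empty frame with the column names from the question.
df = pd.DataFrame(columns=['Id', 'Category2', 'Brandyqdy1',
                           'Brandyqwdwdy2', 'Brandyqdw3'])

def rename_brand(col):
    # Replace a whole 'Brand...<digit>' name with its brand;
    # names that do not match the pattern are returned unchanged.
    return re.sub(r'^Brand.*(\d)$',
                  lambda m: rename_brands.get(m.group(1), m.group(0)),
                  col)

out = df.rename(columns=rename_columns).rename(columns=rename_brand)
print(list(out.columns))
# ['record', 'Sku', 'Vans', 'Nike', 'Adidas']
```

Falling back to m.group(0) in the replacement keeps a Brand column's original name when its trailing digit has no entry in the dictionary.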
Solution
- Select the columns that do not start with 'Brand' as df1, and the columns that do start with 'Brand' as df2.
- Use a for loop to rename the columns in df2 that end with a number to the matching entries in the brands dictionary.
- Join df1 and df2 together.
Sample Code
import pandas as pd

df = pd.DataFrame({
    'Id': ['001', '002'],
    'Category': ['A', 'S'],
    'Brandtxsu1': [1, 1],
    'Brandxyw2': [2, 2]
})
print(df)
print('------------------------------------')

brands = {'1': 'Vans', '2': 'Nike'}
# Non-Brand columns, with Category renamed to Record
df1 = df[['Id', 'Category']].rename(columns={'Category': 'Record'})
# Brand columns only
df2 = df.loc[:, df.columns.str.startswith('Brand')]
for i in range(1, 3):
    # First column whose name ends with the suffix i
    old_name = df2.columns[df2.columns.str.endswith(str(i))][0]
    df2 = df2.rename(columns={old_name: brands[str(i)]})
df_output = df1.join(df2)
print(df_output)
Output
    Id Category  Brandtxsu1  Brandxyw2
0  001        A           1          2
1  002        S           1          2
------------------------------------
    Id Record  Vans  Nike
0  001      A     1     2
1  002      S     1     2
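The same result can probably be reached without splitting and rejoining the dataframe. The sketch below builds a single rename mapping with a dict comprehension; it assumes each Brand column ends in a one-character suffix, and the name mapping is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['001', '002'],
    'Category': ['A', 'S'],
    'Brandtxsu1': [1, 1],
    'Brandxyw2': [2, 2],
})
brands = {'1': 'Vans', '2': 'Nike'}

# Map each Brand column to its brand using the last character of its name;
# Brand columns whose suffix is not in the dictionary are left out of the mapping.
mapping = {
    col: brands[col[-1]]
    for col in df.columns
    if col.startswith('Brand') and col[-1] in brands
}
mapping['Category'] = 'Record'

# A single rename call handles both the Brand columns and the other columns.
df_output = df.rename(columns=mapping)
print(df_output)
```

This keeps the original column order and does not depend on hard-coding range(1, 3), at the cost of only handling single-character suffixes.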