Rename columns based on a certain pattern

Question:

I have the following columns in a dataframe: Id, Category2, Brandyqdy1, Brandyqwdwdy2, Brandyqdw3

If the column’s name starts with Brand and ends with 1, I need it renamed as Vans. Similarly, for other Brand columns, use the following:
rename_brands = {'1': 'Vans', '2': 'Nike', '3': 'Adidas'}

Also, I will be renaming other columns apart from the ones that start with Brand, overall:
rename_columns = {'Id': 'record', 'Category2': 'Sku', '1': 'Vans', '2': 'Nike', '3': 'Adidas'}

Asked By: M J


Answers:

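You can chain two rename calls: one with the dictionary for exact column names, and one with a function for the pattern-based names. For the regex rename, you can use re.sub.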

import re


# Keys must be strings: the digit captured by the regex group is always a string.
rename_brands = {'1': 'Vans', '2': 'Nike', '3': 'Adidas'}
rename_columns = {'Id': 'record', 'Category2': 'Sku', '1': 'Vans', '2': 'Nike', '3': 'Adidas'}

out = (df.rename(columns=rename_columns)
       # Brand columns: replace the whole name with the brand mapped to the trailing digit.
       .rename(columns=lambda col: re.sub(r'^Brand.*(\d)$',
                                          lambda m: rename_brands.get(m.group(1), m.group(0)),
                                          col)))
$ print(df)

   Id  Category2  Brandyqdy1  Brandyqwdwdy2  Brandyqdw3   1   2
0 NaN        NaN         NaN            NaN         NaN NaN NaN


$ print(out)

   record  Sku  Vans  Nike  Adidas  Vans  Nike
0     NaN  NaN   NaN   NaN     NaN   NaN   NaN
Answered By: Ynjxsjmh
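
The two rename steps above can also be folded into a single function passed to rename. The following is only a minimal sketch under some assumptions: the sample frame and the names exact_names, brand_by_suffix, BRAND_PATTERN and new_name are hypothetical and chosen for illustration, and the columns are assumed to be strings.

```python
import re

import pandas as pd

# Hypothetical sample frame using the column names from the question.
df = pd.DataFrame(
    [[1, "A", 10, 20, 30]],
    columns=["Id", "Category2", "Brandyqdy1", "Brandyqwdwdy2", "Brandyqdw3"],
)

# Illustrative mappings; string keys because regex groups are strings.
exact_names = {"Id": "record", "Category2": "Sku"}
brand_by_suffix = {"1": "Vans", "2": "Nike", "3": "Adidas"}

# Matches names that start with "Brand" and end in a single digit.
BRAND_PATTERN = re.compile(r"^Brand.*(\d)$")


def new_name(col: str) -> str:
    """Return the renamed column, or the original name if no rule applies."""
    if col in exact_names:
        return exact_names[col]
    match = BRAND_PATTERN.match(col)
    if match:
        # Unknown digits fall back to the original column name.
        return brand_by_suffix.get(match.group(1), col)
    return col


out = df.rename(columns=new_name)
print(list(out.columns))
```

Keeping the exact-name rules and the pattern rule in one function means each column is looked at once, and a column that matches neither rule keeps its name.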

Solution

  1. Select the columns that do not start with ‘Brand’ as df1, and the columns that do start with ‘Brand’ as df2.
  2. Use a for loop to rename each column in df2 that ends with a digit to the matching value in the brands dictionary.
  3. Join df1 and df2 back together.

Sample Code

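import pandas as pd

df = pd.DataFrame({
    'Id': ['001', '002'],
    'Category': ['A', 'S'],
    'Brandtxsu1': [1, 1],
    'Brandxyw2': [2, 2]
})

print(df)

print('------------------------------------')

brands = {'1': 'Vans', '2': 'Nike'}

# Step 1: split off the non-Brand columns (renaming as needed) and the Brand columns.
df1 = df[['Id', 'Category']].rename(columns={'Category': 'Record'})

df2 = df.loc[:, df.columns.str.startswith('Brand')]

# Step 2: rename each Brand column according to its trailing digit.
# Iterating over the dictionary avoids hard-coding the range of suffixes.
for suffix, brand in brands.items():
    matches = df2.columns[df2.columns.str.endswith(suffix)]
    df2 = df2.rename(columns={col: brand for col in matches})

# Step 3: join the two parts back together on the index.
df_output = df1.join(df2)

print(df_output)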

Output

    Id Category  Brandtxsu1  Brandxyw2
0  001        A           1          2
1  002        S           1          2
------------------------------------
    Id Record  Vans  Nike
0  001      A     1     2
1  002      S     1     2
Answered By: Brian.Z