Pandas rename columns according to my dict

Question

I have many DataFrames stored as pickle files. They all hold the same information, but the column names are not identical across files.
For example:

>     file name:    Columns_['Weryfikacja UK - 2022-04-14.xlsx_20uk-woocommerce-last-5yrs-expo.pkl'].pkl
>     columns_nam:  Index(['domain', 'phones', 'phones_data_source', 'company_name', 'SourceFile'], dtype='object')
> 
>     file name:    Columns_['US _ Canada - rechurn - 01.2022-04.2022.xlsx_Other.pkl'].pkl
>     columns_nam:  Index(['Phone', 'Domain', 'SourceFile'], dtype='object')
> 
>     file name:    Columns_['2022-08 - US _ Canada.xlsx_29.08-02.09 WixStore USCA.pkl'].pkl
>     columns_nam:  Index(['Phone', 'Alternative phone 1', 'Alternative phone2', 'Alternative phone3', 'Alternative phone4', 'SourceFile'], dtype='object')

I have a dict like this to rename the columns in all files:

my_dict = {
"Domain": ['Domain','domain', 'WWW', 'www'],
"Phone": ['Phone','phone_number', 'phones', 'Tel'],
"AlternativePhone1": ['Alternative phone1','Alternative phone 2', 'phones2'],
"AlternativePhone2": ['Alternative phone2','Alternative phone 3', 'phones3'],
"AlternativePhone3": ['Alternative phone3','Alternative phone 4', 'phones4'],
"AlternativePhone4": ['Alternative phone4','Alternative phone 5', 'phones5'],
"SourceFile": ['SourceFile']
}

I need help with the code. How should I do it?

for file in glob.glob("*.pkl"):
    df = pd.read_pickle(file)

As output, I would like to have something like this:

>     file name:    Columns_['2022-08 - US _ Canada.xlsx_29.08-02.09 WixStore USCA.pkl'].pkl
>     columns_nam:  Index(['Phone', 'AlternativePhone1', 'AlternativePhone2', 'AlternativePhone3', 'AlternativePhone4', 'SourceFile'], dtype='object')

Asked By: Bartosz

Answer 1

The easiest way

I don't think pandas has a built-in way to rename columns from a list of possible names, but it's easy enough with a couple of for loops:

import pandas as pd

my_dict = {
    "Domain": ['Domain','domain', 'WWW', 'www'],
    "Phone": ['Phone','phone_number', 'phones', 'Tel'],
}

df = pd.DataFrame({
    "WWW": [1,2,3,4,5], 
    "phones": [1,2,3,4,5]
})

for (goal, names) in my_dict.items():
    for name in names:
        if name in df.columns:
            df.rename(columns={name:goal}, inplace=True)

A cleaner way

DataFrame.rename does not care whether the DataFrame contains every column you provide a new name for in the columns dictionary; keys that match no column are ignored under the default errors='ignore'. This means that if we convert my_dict from newname -> List[oldname] to the oldname -> newname format expected by columns, we can rename the columns without any for loops.

columns_dict = {old: new for (new, olds) in my_dict.items() for old in olds}

# columns_dict = {
#     'Domain': 'Domain', 'domain': 'Domain', 'WWW': 'Domain', 'www': 'Domain', 
#     'Phone': 'Phone', 'phone_number': 'Phone', 'phones': 'Phone', 'Tel': 'Phone'
# }

df.rename(columns=columns_dict, inplace=True)
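To tie this back to the glob loop in the question, here is a minimal sketch of how the inverted mapping might be applied to every pickle file. The helper names build_rename_map, normalize_columns and normalize_pickles are hypothetical (they are not part of pandas), and the shortened my_dict is only for illustration.

```python
import glob

import pandas as pd

# Shortened version of the question's dict, for illustration only.
my_dict = {
    "Domain": ["Domain", "domain", "WWW", "www"],
    "Phone": ["Phone", "phone_number", "phones", "Tel"],
    "SourceFile": ["SourceFile"],
}


def build_rename_map(aliases):
    """Invert {new_name: [old_names]} into {old_name: new_name}."""
    return {old: new for new, olds in aliases.items() for old in olds}


def normalize_columns(df, aliases):
    """Return a renamed copy of df; columns without a known alias are kept as-is."""
    return df.rename(columns=build_rename_map(aliases))


def normalize_pickles(pattern, aliases):
    """Hypothetical helper: load every pickle matching pattern and rename its columns."""
    frames = {}
    for path in glob.glob(pattern):
        frames[path] = normalize_columns(pd.read_pickle(path), aliases)
    return frames


# Assumed usage: all pickles sit in the current working directory.
# frames = normalize_pickles("*.pkl", my_dict)
```

Two things to watch for: if a single file contains two aliases of the same target (say both 'Phone' and 'phones'), the result will have duplicate column names, and any column missing from my_dict stays unchanged. For instance, 'Alternative phone 1' from the third example file is not listed in the question's dict, so it would not be renamed until it is added there.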

Answered By: Matt
