Rename pandas column iteratively

Question:

I have several columns with the same name in a DataFrame. How can I rename the duplicated normal and KIRC columns below to normal_1, normal_2, KIRC_1 and KIRC_2?

import pandas as pd

gene_exp.columns = gene_exp.iloc[-1]
gene_exp = gene_exp.iloc[:-1]
gene_exp

# Append "_[number]" 
c = pd.Series(gene_exp.columns)
for dup in gene_exp.columns[gene_exp.columns.duplicated(keep=False)]: 
    c[df.columns.get_loc(dup)] = ([dup + '_' + str(d_idx) 
                                     if d_idx != 0 
                                     else dup 
                                     for d_idx in range(gene_exp.columns.get_loc(dup).sum())]
                                    )
gene_exp

Traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'KIRC'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_27/3403075751.py in <module>
      5                                      if d_idx != 0
      6                                      else dup
----> 7                                      for d_idx in range(gene_exp.columns.get_loc(dup).sum())]
      8                                     )
      9 gene_exp

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'KIRC'

Sample data

   Gene  NAME  KIRC  normal  normal  KIRC
0   ABC   DEF   GHI     JKL     MNO   PQR
1   STU   VWX    YZ     ABC     DEF   GHI

Desired output:

   Gene  NAME  KIRC_1  normal_1  normal_2  KIRC_2
0   ABC   DEF     GHI       JKL       MNO     PQR
1   STU   VWX      YZ       ABC       DEF     GHI
Asked By: melolilili


Answers:

import pandas as pd

# set Gene and NAME as the index, as we don't want these renamed
df.set_index(['Gene', 'NAME'], inplace=True)

# create a DataFrame from the column names
df2 = pd.DataFrame(df.columns.values, columns=['col'])

# number repeated names with cumcount (0, 1, ...), add 1 to make the count 1-based,
# and assign the suffixed names back to the DataFrame's columns
df.columns = df2['col'] + '_' + (df2.groupby('col').cumcount() + 1).astype(str)

# reset the index so Gene and NAME become regular columns again
out = df.reset_index()
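The first approach can be tried end to end on the question's sample data. The sketch below rebuilds that sample as a small DataFrame (the cell values are just the placeholders from the question's table) and uses a Series of the column names in place of the helper DataFrame df2; the names `names` and `counts` are illustrative, not part of the answer.

```python
import pandas as pd

# Sample data rebuilt from the question's table (placeholder values).
df = pd.DataFrame(
    [["ABC", "DEF", "GHI", "JKL", "MNO", "PQR"],
     ["STU", "VWX", "YZ", "ABC", "DEF", "GHI"]],
    columns=["Gene", "NAME", "KIRC", "normal", "normal", "KIRC"],
)

# Move the columns that should keep their names into the index.
df = df.set_index(["Gene", "NAME"])

# cumcount() numbers each repeated label 0, 1, ... in order of appearance;
# adding 1 makes the suffix 1-based.
names = pd.Series(df.columns)
counts = names.groupby(names).cumcount() + 1
df.columns = names + "_" + counts.astype(str)

# Turn Gene and NAME back into regular columns.
out = df.reset_index()
print(out)
```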

OR

# only the column labels need to change, so the set_index/reset_index
# round trip only has to run on a small temporary DF instead of the full DF

# create a temp DF from the original df (which must still contain Gene and NAME);
# .copy() makes it independent of df, so the in-place edits below don't act on a slice of df
df_cols = df.head().copy()

# same as the solution above, except df is replaced with df_cols
df_cols.set_index(['Gene', 'NAME'], inplace=True)

# create a DataFrame from the column names
df2 = pd.DataFrame(df_cols.columns.values, columns=['col'])
# number repeated names with cumcount, add 1 to make the count 1-based,
# and assign the suffixed names to the temp DF's columns
df_cols.columns = df2['col'] + '_' + (df2.groupby('col').cumcount() + 1).astype(str)
df_cols.reset_index(inplace=True)

# copy the new column labels over to df
df.columns = df_cols.columns
df
   Gene  NAME  KIRC_1  normal_1  normal_2  KIRC_2
0   ABC   DEF     GHI       JKL       MNO     PQR
1   STU   VWX      YZ       ABC       DEF     GHI
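Because only the labels change, the suffixes can also be computed straight from df.columns, without moving any data into or out of the index. A minimal sketch, assuming (unlike the answers above) that labels which are already unique, such as Gene and NAME, should keep their names, so only repeated labels get a suffix. Series.duplicated(keep=False) flags every occurrence of a repeated label, groupby().cumcount() numbers the occurrences, and Series.mask swaps in the suffixed name only where the flag is set.

```python
import pandas as pd

# Sample data rebuilt from the question's table (placeholder values).
df = pd.DataFrame(
    [["ABC", "DEF", "GHI", "JKL", "MNO", "PQR"],
     ["STU", "VWX", "YZ", "ABC", "DEF", "GHI"]],
    columns=["Gene", "NAME", "KIRC", "normal", "normal", "KIRC"],
)

names = pd.Series(df.columns)
# True for every occurrence of a label that appears more than once.
is_dup = names.duplicated(keep=False)
# 1-based position of each label among its own repeats.
counts = names.groupby(names).cumcount() + 1

# Replace repeated labels with "<label>_<n>"; unique labels are left untouched.
df.columns = names.mask(is_dup, names + "_" + counts.astype(str))
print(df.columns.tolist())
```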
Answered By: Naveed

I can't see your starting dataset, but this should do what you want. Your code doesn't appear to assign the new names back to the DataFrame's columns, and it doesn't append the counter to dup when the counter is 0, so the first occurrence keeps its original name.

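import pandas as pd

# build a one-row sample frame, then duplicate the KIRC and normal columns
data = {"Gene": "ABC", "NAME": "DEF", "KIRC": "GHI", "normal": "MNO"}
df = pd.DataFrame.from_records([data])
df = pd.concat([df, df[["KIRC", "normal"]]], axis=1)

cols = pd.Series(df.columns)
# loop over each duplicated label once
for dup in df.columns[df.columns.duplicated(keep=False)].unique():
    # here get_loc returns a boolean mask (the labels repeat and are not sorted);
    # its sum is the number of occurrences of dup
    cols[df.columns.get_loc(dup)] = [dup + '_' + str(d_idx + 1)
                                     for d_idx in range(df.columns.get_loc(dup).sum())]
# assign the new names back to the DataFrame
df.columns = cols
print(df)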
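Both points can be checked in isolation. A small sketch with a hypothetical two-column frame: the question's comprehension leaves the first occurrence bare, the answer's `d_idx + 1` numbers every occurrence from 1, and building new labels has no effect on the DataFrame until they are assigned to df.columns. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=["KIRC", "KIRC"])
dup, n = "KIRC", 2

# Question's comprehension: the first occurrence keeps its bare name.
question_style = [dup + "_" + str(i) if i != 0 else dup for i in range(n)]
# Answer's comprehension: every occurrence gets a 1-based suffix.
answer_style = [dup + "_" + str(i + 1) for i in range(n)]
print(question_style)  # ['KIRC', 'KIRC_1']
print(answer_style)    # ['KIRC_1', 'KIRC_2']

# Building the new labels does not rename df by itself...
cols = pd.Series(answer_style)
print(df.columns.tolist())  # ['KIRC', 'KIRC']
# ...only assigning them back to df.columns does.
df.columns = cols
print(df.columns.tolist())  # ['KIRC_1', 'KIRC_2']
```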
Answered By: Allan Elder