Removing Suffix From Dataframe Column Names

Question:

I am trying to remove a suffix from all columns in a dataframe, however I am getting error messages. Any suggestions would be appreciated.

df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df.add_suffix('_x')

def strip_right(df.columns, _x):
    if not text.endswith("_x"):
        return text
    # else
    return text[:len(df.columns)-len("_x")]

Error:

def strip_right(tmp, "_x"):
                            ^
SyntaxError: invalid syntax

I’ve also tried removing the quotations.

def strip_right(df.columns, _x):
    if not text.endswith(_x):
        return text
    # else
    return text[:len(df.columns)-len(_x)]

Error:

def strip_right(df.columns, _x):
                      ^
SyntaxError: invalid syntax
Asked By: Starbucks

||

Answers:

Here is a more concrete example:.

import pandas as pd
import numpy as np
    
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list("ABCD"))
df = df.add_suffix("_x")
    
print("With Suffix")
print(df.head())
    
def strip_right(df, suffix="_x"):
    df.columns = df.columns.str.rstrip(suffix)
  
strip_right(df) 
    
print("nnWithout Suffix")
print(df.head())

Output:

With Suffix
   A_x  B_x  C_x  D_x
0    0    7    0    2
1    5    1    8    5
2    6    2    0    1
3    6    6    5    6
4    8    6    5    8
    
    
Without Suffix
   A  B  C  D
0  0  7  0  2
1  5  1  8  5
2  6  2  0  1
3  6  6  5  6
4  8  6  5  8
Answered By: faisal

I found a bug in the implementation of the accepted answer. The docs for pandas.Series.str.rstrip() reference str.rstrip(), which states:

"The chars argument is not a suffix; rather, all combinations of its values are stripped."

Instead I had to use pandas.Series.str.replace to remove the actual suffix from my column names. See the modified example below.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df = df.add_suffix('_x')
df['Ex_'] = np.random.randint(0,10,size=(10, 1))

df1 = pd.DataFrame(df, copy=True)
print ("With Suffix")
print(df1.head())

def strip_right(df, suffix='_x'):
    df.columns = df.columns.str.rstrip(suffix)

strip_right(df1) 

print ("nnAfter .rstrip()")
print(df1.head())

def replace_right(df, suffix='_x'):
    df.columns = df.columns.str.replace(suffix+'$', '', regex=True)

print ("nnWith Suffix")
print(df.head())

replace_right(df)

print ("nnAfter .replace()")
print(df.head())

Output:

With Suffix
   A_x  B_x  C_x  D_x  Ex_
0    4    9    2    3    4
1    1    6    5    8    6
2    2    5    2    3    6
3    1    4    7    6    4
4    3    9    3    5    8


After .rstrip()
   A  B  C  D  E
0  4  9  2  3  4
1  1  6  5  8  6
2  2  5  2  3  6
3  1  4  7  6  4
4  3  9  3  5  8


After .replace()
   A  B  C  D  Ex_
0  4  9  2  3    4
1  1  6  5  8    6
2  2  5  2  3    6
3  1  4  7  6    4
4  3  9  3  5    8
Answered By: rcodemonkey

As noted by @rcodemonkey, the accepted answer is not correct. In some cases, it will remove other characters at the end of a string because to_strip argument of str.rstrip method is in fact a set of characters, not a proper suffix.

Therefore, you should either use the other proposed solution (str.replace method) or I would suggest using str.removesuffix method (available from python 3.9 and pandas 1.4) that is the clearest and most understood from these solutions:

def remove_suffix(df, suffix='_x'):
    df.columns = df.columns.str.removesuffix(suffix)

Full example (taking the example from the accepted answer):

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df = df.add_suffix('_x')

# With Suffix
#    A_x  B_x  C_x  D_x
# 0    5    4    7    4
# 1    9    6    4    6
# 2    1    3    2    9
# 3    3    3    1    3
# 4    0    0    6    2

remove_suffix(df) 

# Without Suffix
#    A  B  C  D
# 0  5  4  7  4
# 1  9  6  4  6
# 2  1  3  2  9
# 3  3  3  1  3
# 4  0  0  6  2
Answered By: Nerxis
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.