Replace special characters in pandas dataframe from a string of special characters

Question:

I have created a pandas dataframe called df using this code:

import numpy as np
import pandas as pd

ds = {'col1': ['1', '3/', '4'], 'col2': ['A', '!B', '@C']}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)

  col1 col2
0    1    A
1   3/   !B
2    4   @C

The columns contain some special characters (here /, ! and @) that I need to replace with a blank space.

Now, I have a list of special characters:

listOfSpecialChars = '¬`!"£$£#/,.+*><@|"'

How can I replace any of the special characters listed in listOfSpecialChars with a blank space, wherever they occur in the dataframe and in any column?
At the moment I am dealing with a 100K-record dataframe with 560 columns, so I can’t write a piece of code for each variable.

Asked By: Giampaolo Levorato


Answers:

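You can use apply with str.replace. The code below removes the matched characters; pass ' ' instead of '' as the replacement if you want a blank space: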

import re
chars = ''.join(map(re.escape, listOfSpecialChars))

df2 = df.apply(lambda c: c.str.replace(f'[{chars}]', '', regex=True))

Alternatively, stack/unstack:

df2 = df.stack().str.replace(f'[{chars}]', '', regex=True).unstack()

output:

  col1 col2
0    1    A
1    3    B
2    4    C
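
As a possible variant of the same idea, DataFrame.replace with regex=True applies the pattern to every string cell without an explicit apply. This is only a minimal sketch on the question's sample data; the names pattern, df_removed and df_spaced are made up for illustration, and it has not been timed on a 100K x 560 frame.

```python
import re
import pandas as pd

# Sample data and character list taken from the question.
df = pd.DataFrame({'col1': ['1', '3/', '4'], 'col2': ['A', '!B', '@C']})
listOfSpecialChars = '¬`!"£$£#/,.+*><@|"'

# Build a regex character class; re.escape protects characters such as
# $ . + * | that would otherwise have a special meaning in a regex.
pattern = f'[{re.escape(listOfSpecialChars)}]'

# Remove the characters entirely (matches the output shown above).
df_removed = df.replace(pattern, '', regex=True)

# Replace each character with a blank space, as the question asks.
df_spaced = df.replace(pattern, ' ', regex=True)

print(df_removed)
print(df_spaced)
```

Cells that are not strings are left untouched by the regex replacement, so mixed-type frames should not need special handling here.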
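Answered By: mozway

# Removes everything except letters, digits, hyphens, and underscores
# from one column ('data' stands for whichever column you want to clean).
# regex=True is required in pandas 2.0+, where str.replace matches literally by default.
df['data'] = df['data'].str.replace(r'[^A-Za-z0-9_-]+', '', regex=True)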
Answered By: Yas
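
The whitelist approach above targets a single column. To cover every text column, as the question requires, one option is to select the string columns first and apply the same pattern to each. This is a hedged sketch rather than part of either answer: the extra numeric column n and the names text_cols and cleaned are assumptions added for illustration.

```python
import pandas as pd

# Question's sample data plus a hypothetical numeric column 'n'.
df = pd.DataFrame({
    'col1': ['1', '3/', '4'],
    'col2': ['A', '!B', '@C'],
    'n': [1, 2, 3],
})

# Only string columns support the .str accessor, so pick them out first.
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# Work on a copy so the original frame stays unchanged.
cleaned = df.copy()

# Keep letters, digits, underscores and hyphens; drop everything else.
cleaned[text_cols] = cleaned[text_cols].apply(
    lambda s: s.str.replace(r'[^A-Za-z0-9_-]+', '', regex=True)
)

print(cleaned)
```

Note that a whitelist also strips spaces and accented letters, so the explicit character list from the question is the safer choice when only specific symbols should go.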