How to remove symbols from a string with Python?
Question:
I’m a beginner with both Python and RegEx, and I would like to know how to make a string that takes symbols and replaces them with spaces. Any help is great.
For example:
how much for the maple syrup? $20.99? That's ricidulous!!!
into:
how much for the maple syrup 20 99 That s ridiculous
Answers:
One way, using regular expressions:
>>> s = "how much for the maple syrup? $20.99? That's ridiculous!!!"
>>> re.sub(r'[^w]', ' ', s)
'how much for the maple syrup 20 99 That s ridiculous '
-
w
will match alphanumeric characters and underscores
-
[^w]
will match anything that’s not alphanumeric or underscore
I often just open the console and look for the solution in the objects methods. Quite often it’s already there:
>>> a = "hello ' s"
>>> dir(a)
[ (....) 'partition', 'replace' (....)]
>>> a.replace("'", " ")
'hello s'
Short answer: Use string.replace()
.
Sometimes it takes longer to figure out the regex than to just write it out in python:
import string
s = "how much for the maple syrup? $20.99? That's ricidulous!!!"
for char in string.punctuation:
s = s.replace(char, ' ')
If you need other characters you can change it to use a white-list or extend your black-list.
Sample white-list:
whitelist = string.letters + string.digits + ' '
new_s = ''
for char in s:
if char in whitelist:
new_s += char
else:
new_s += ' '
Sample white-list using a generator-expression:
whitelist = string.letters + string.digits + ' '
new_s = ''.join(c for c in s if c in whitelist)
I’m a beginner with both Python and RegEx, and I would like to know how to make a string that takes symbols and replaces them with spaces. Any help is great.
For example:
how much for the maple syrup? $20.99? That's ricidulous!!!
into:
how much for the maple syrup 20 99 That s ridiculous
One way, using regular expressions:
>>> s = "how much for the maple syrup? $20.99? That's ridiculous!!!"
>>> re.sub(r'[^w]', ' ', s)
'how much for the maple syrup 20 99 That s ridiculous '
-
w
will match alphanumeric characters and underscores -
[^w]
will match anything that’s not alphanumeric or underscore
I often just open the console and look for the solution in the objects methods. Quite often it’s already there:
>>> a = "hello ' s"
>>> dir(a)
[ (....) 'partition', 'replace' (....)]
>>> a.replace("'", " ")
'hello s'
Short answer: Use string.replace()
.
Sometimes it takes longer to figure out the regex than to just write it out in python:
import string
s = "how much for the maple syrup? $20.99? That's ricidulous!!!"
for char in string.punctuation:
s = s.replace(char, ' ')
If you need other characters you can change it to use a white-list or extend your black-list.
Sample white-list:
whitelist = string.letters + string.digits + ' '
new_s = ''
for char in s:
if char in whitelist:
new_s += char
else:
new_s += ' '
Sample white-list using a generator-expression:
whitelist = string.letters + string.digits + ' '
new_s = ''.join(c for c in s if c in whitelist)