pandas dataframe replace multiple substring of column
Question:
I have below the code
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
pattern = ','.join(['$', ','])
df['A'] = df['A'].str.replace('$|,', '', regex=True)
print(df['A'])
What I am trying to remove every occurrence of ‘$’ or ‘,’… so I am trying to replace with blank..
But its replacing only ,
Output I am getting
0 $5756
1 3434
2 $45
3 1344$
it should be
0 5756
1 3434
2 45
3 1344
What I am doing wrong
Any help appreciated
Thanks
Answers:
Use:
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('[$,]', '', regex=True)
print(df)
Output
A
0 5756
1 3434
2 45
3 1344
The problem is that the character $
has a special meaning in regular expressions. From the documentation (emphasis mine):
$
Matches the end of the string or just before the newline at the end
of the string, and in MULTILINE mode also matches before a newline.
foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$
matches only ‘foo’. More interestingly, searching for foo.$ in
‘foo1nfoo2n’ matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode;
searching for a single $ in ‘foon’ will find two (empty) matches: one
just before the newline, and one at the end of the string.mode;
searching for a single $ in ‘foon’ will find two (empty) matches: one
just before the newline, and one at the end of the string.
So you need to escape the character or put it inside a character class.
As an alternative use:
df['A'].str.replace('$|,', '', regex=True) # note the escaping
If you only have integer-like numbers an easy option is to remove all but digits D
, then you don’t have to deal with other special regex characters like $
:
df['A'] = df['A'].str.replace(r'D', '', regex=True)
output:
A
0 5756
1 3434
2 45
3 1344
It might be useful for you:
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('$', '', regex=True)
print(df['A'])
I have below the code
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
pattern = ','.join(['$', ','])
df['A'] = df['A'].str.replace('$|,', '', regex=True)
print(df['A'])
What I am trying to remove every occurrence of ‘$’ or ‘,’… so I am trying to replace with blank..
But its replacing only ,
Output I am getting
0 $5756
1 3434
2 $45
3 1344$
it should be
0 5756
1 3434
2 45
3 1344
What I am doing wrong
Any help appreciated
Thanks
Use:
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('[$,]', '', regex=True)
print(df)
Output
A
0 5756
1 3434
2 45
3 1344
The problem is that the character $
has a special meaning in regular expressions. From the documentation (emphasis mine):
$
Matches the end of the string or just before the newline at the end
of the string, and in MULTILINE mode also matches before a newline.
foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$
matches only ‘foo’. More interestingly, searching for foo.$ in
‘foo1nfoo2n’ matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode;
searching for a single $ in ‘foon’ will find two (empty) matches: one
just before the newline, and one at the end of the string.mode;
searching for a single $ in ‘foon’ will find two (empty) matches: one
just before the newline, and one at the end of the string.
So you need to escape the character or put it inside a character class.
As an alternative use:
df['A'].str.replace('$|,', '', regex=True) # note the escaping
If you only have integer-like numbers an easy option is to remove all but digits D
, then you don’t have to deal with other special regex characters like $
:
df['A'] = df['A'].str.replace(r'D', '', regex=True)
output:
A
0 5756
1 3434
2 45
3 1344
It might be useful for you:
import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('$', '', regex=True)
print(df['A'])