Strange pandas behaviour: a character is found where it does not exist
Question:
I want to write a function that I can apply to an entire dataframe. It should check whether each column contains the currency symbol '$' and, if so, remove it.
Surprisingly, a case like:
import pandas as pd
dates = pd.date_range(start='2021-01-01', end='2021-01-10').strftime('%d-%m-%Y')
print(dates)
output:
Index(['01-01-2021', '02-01-2021', '03-01-2021', '04-01-2021', '05-01-2021', '06-01-2021', '07-01-2021', '08-01-2021', '09-01-2021', '10-01-2021'], dtype='object')
But when I do:
dates.str.contains('$').all()
It returns True. Why?
Answers:
.str.contains treats its pattern as a regular expression by default, not as a literal string. In a regex, $ matches the end of the string, and every string has an end, so the pattern matches every value. To check for the literal symbol "$" you need to escape it:
dates.str.contains(r'\$').all()
Or you can pass the regex=False argument to .contains():
dates.str.contains('$', regex=False).all()
Both options return False.
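For the original goal of stripping the symbol from a whole dataframe, the same literal-matching idea applies to .str.replace. The following is only a minimal sketch: strip_dollar is a hypothetical helper name, and it assumes that only text columns should be touched and that a copy should be returned rather than modifying the input.

```python
import pandas as pd

dates = pd.date_range(start="2021-01-01", end="2021-01-10").strftime("%d-%m-%Y")

# Unescaped "$" is a regex end-of-string anchor, so every value matches.
anchor_matches = dates.str.contains("$").all()

# Escaped or literal "$" looks for the actual character instead.
escaped_matches = dates.str.contains(r"\$").any()
literal_matches = dates.str.contains("$", regex=False).any()


def strip_dollar(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with '$' removed from text columns.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    for col in out.columns:
        # Assumption: only columns holding strings are cleaned;
        # numeric and datetime columns are left untouched.
        if pd.api.types.is_string_dtype(out[col]):
            # regex=False treats "$" as a plain character, not an anchor.
            out[col] = out[col].str.replace("$", "", regex=False)
    return out
```

Usage would look like strip_dollar(pd.DataFrame({"price": ["$1.50", "$20"], "qty": [1, 2]})), which leaves the qty column alone and removes the symbol from price. Note that the cleaned values are still strings; converting them to numbers is a separate step.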