Strange pandas behaviour: a character is found where it does not exist

Question:

I want to write a function that I can apply to an entire dataframe. It should check each column for the currency symbol ‘$’ and remove it wherever it is found.

Surprisingly, I ran into a case like this:

import pandas as pd
dates = pd.date_range(start='2021-01-01', end='2021-01-10').strftime('%d-%m-%Y')
print(dates)

output:

Index(['01-01-2021', '02-01-2021', '03-01-2021', '04-01-2021', '05-01-2021', '06-01-2021', '07-01-2021', '08-01-2021', '09-01-2021', '10-01-2021'], dtype='object')

But when I do:

dates.str.contains('$').all()

It returns True. Why???

Asked By: trazoM


Answers:

.str.contains treats its pattern as a regular expression by default, not as a literal string. In a regex, $ matches the end of the string, and every string has an end (intuitive or not), so the match always succeeds. To search for the literal symbol "$", you need to escape it:

dates.str.contains(r'\$').all()

Alternatively, pass regex=False to .str.contains() so the pattern is treated as a plain string:

dates.str.contains('$', regex=False).all()

Both options return False.
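
For the original goal of cleaning a whole dataframe, the same idea might be applied column by column. The following is only a minimal sketch: the sample data, the column names and the helper name strip_dollar are hypothetical and not from the question, and it assumes that only string columns should be touched.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "price": ["$10.50", "$3.00", "7.25"],
    "date": ["01-01-2021", "02-01-2021", "03-01-2021"],
    "qty": [1, 2, 3],
})


def strip_dollar(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a literal '$' removed from string columns that contain one."""
    out = frame.copy()
    for col in out.columns:
        # The .str accessor only works on string data, so skip other columns.
        if not pd.api.types.is_string_dtype(out[col]):
            continue
        # regex=False makes '$' a literal character, not the end-of-string anchor.
        has_dollar = out[col].str.contains("$", regex=False, na=False)
        if has_dollar.any():
            out[col] = out[col].str.replace("$", "", regex=False)
    return out


cleaned = strip_dollar(df)
print(cleaned)

# The pitfall from the question: as a regex, '$' matches every string.
print(df["date"].str.contains("$").all())
```

With regex=False in both calls, the "date" column is left alone because it really contains no "$", while the last line still prints True because there "$" is interpreted as a regex anchor.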

Answered By: Yevhen Kuzmovych