How to unify all the "-" signs?

Question:

I have a simple program that takes data from the user. Here is an abbreviated version of it:

a = "0-1"
b = "0‑1"

print(a in b)  # prints False

Problem:

ord("-") for a = 45

ord("‑") for b = 8209

How can I make sure that the "-" sign is always the same and checking a in b returns True?

Asked By: jatkso


Answers:

It’s not clear whether your example is part of a more general problem, but for the example provided you can handle this using replace:

a = "0-1"
b = "0‑1"

print(a.replace("‑", "-") in b.replace("‑", "-"))  # True

I’ve called replace on both sides, because it’s not clear which side is your input and which is not. In principle though this comes down to "sanitize your input".

If this is more of a general problem, you might want to look at str.translate, which takes a mapping of characters and applies all the replacements in one go.
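
As a rough sketch of the translate approach (the list of dash-like characters and the helper name normalize_dashes are my own assumptions, not something from the question; extend the list to whatever your input actually contains):

```python
# Hypothetical set of dash-like code points to fold into ASCII "-" (U+002D).
# This list is an assumption; add or remove characters to suit your data.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"

# Build the translation table once and reuse it for every string.
DASH_TABLE = str.maketrans({ch: "-" for ch in DASHES})


def normalize_dashes(text: str) -> str:
    """Replace every character listed in DASHES with an ASCII hyphen-minus."""
    return text.translate(DASH_TABLE)


a = "0-1"
b = "0\u20111"  # contains U+2011 NON-BREAKING HYPHEN, as in the question

print(normalize_dashes(a) in normalize_dashes(b))  # True
```

Normalizing both strings at the point where they enter the program keeps the later comparison a plain `in` check.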

Answered By: PirateNinjas

The most robust way would be to use the third-party unidecode module (installed from PyPI as Unidecode) to convert all non-ASCII characters to their closest ASCII equivalent automatically.

import unidecode

a = "0-1"
b = "0‑1"

# Transliterate both strings to ASCII before comparing
print(unidecode.unidecode(a) in unidecode.unidecode(b))

Answered By: Mark Ransom