Does Python forbid two similarly looking Unicode identifiers?

Question:

I was playing around with Unicode identifiers and stumbled upon this:

>>> 𝑓, x = 1, 2
>>> 𝑓, x
(1, 2)
>>> 𝑓, f = 1, 2
>>> 𝑓, f
(2, 2)

What’s going on here? Why does Python replace the object referenced by 𝑓, but only sometimes? Where is that behavior described?

Asked By: Erik Cederstrand


Answers:

PEP 3131 (Supporting Non-ASCII Identifiers) says:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

You can use unicodedata to test the conversions:

import unicodedata

unicodedata.normalize('NFKC', '𝑓')
# 'f'

This indicates that '𝑓' is converted to 'f' during parsing, so both spellings refer to the same variable. That leads to the expected result:

𝑓 = "Some String"
print(f)
# Some String
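
To tie this back to the question, here is a minimal sketch of why the second assignment yields (2, 2). It assumes the look-alike character is U+1D453 (MATHEMATICAL ITALIC SMALL F) and writes it as an escape so the snippet survives copy and paste; the `namespace` dict is only there for illustration.

```python
import unicodedata

fancy_f = "\U0001D453"  # assumed look-alike: MATHEMATICAL ITALIC SMALL F
print(unicodedata.name(fancy_f))

# Run "<fancy f>, f = 1, 2" in a fresh namespace and inspect the names created.
namespace = {}
exec(f"{fancy_f}, f = 1, 2", namespace)

# exec() adds "__builtins__", so filter out dunder entries.
names = [name for name in namespace if not name.startswith("__")]
print(names)           # ['f']
print(namespace["f"])  # 2
```

Both targets name the same variable after normalization, so the tuple assignment binds it to 1 and then to 2, left to right.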
Answered By: Mark

Here’s a small example, just to show how horrible this “feature” is:

Thᵢs_featuᵣₑ_sₕould_dₑfᵢniteℓy_be_a_bᵘg = 42
print(Thℹs_featuᵣe_ₛhºuld_defᵢnⁱtᵉly_bℯ_a_bug)
# => 42

Try it online! (But please don’t use it)
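
A quick way to see why such spellings collide is to normalize a few variants by hand. This is only a sketch; the three spellings of "bug" below are picked for illustration and written as escapes.

```python
import unicodedata

variants = [
    "bug",           # plain ASCII
    "b\u1D58g",      # MODIFIER LETTER SMALL U in the middle
    "\U0001D553ug",  # MATHEMATICAL DOUBLE-STRUCK SMALL B at the start
]

# NFKC maps each compatibility character to its plain counterpart.
normalized = {unicodedata.normalize("NFKC", v) for v in variants}
print(normalized)  # {'bug'}
```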

And as mentioned by @MarkMeyer, two identifiers can be distinct even though they look exactly the same, for example “CYRILLIC CAPITAL LETTER A” and “LATIN CAPITAL LETTER A”:

А = 42
print(A)
# => NameError: name 'A' is not defined
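
If you suspect a look-alike, a small helper can spell out what an identifier really contains. This is a rough sketch; `describe` is a hypothetical name, not something from the standard library.

```python
import unicodedata

def describe(identifier):
    """Return (char, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in identifier
    ]

cyrillic_a = "\u0410"  # CYRILLIC CAPITAL LETTER A
latin_a = "A"          # LATIN CAPITAL LETTER A

print(describe(cyrillic_a))
print(describe(latin_a))

# NFKC does not unify the two scripts, so the names stay distinct.
same_after_nfkc = (
    unicodedata.normalize("NFKC", cyrillic_a)
    == unicodedata.normalize("NFKC", latin_a)
)
print(same_after_nfkc)  # False
```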
Answered By: Eric Duminil