Does Python forbid two similarly looking Unicode identifiers?

Question

I was playing around with Unicode identifiers and stumbled upon this:

>>> 𝑓, x = 1, 2
>>> 𝑓, x
(1, 2)
>>> 𝑓, f = 1, 2
>>> 𝑓, f
(2, 2)

What’s going on here? Why does Python replace the object referenced by 𝑓, but only sometimes? Where is that behavior described?

Asked By: Erik Cederstrand

Answer 1

PEP 3131 (Supporting Non-ASCII Identifiers) says:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

You can use unicodedata to test the conversions:

import unicodedata

print(unicodedata.normalize('NFKC', '𝑓'))
# f

This shows that '𝑓' is converted to 'f' during parsing, which leads to the expected result:

𝑓 = "Some String"
print(f)
# Some String
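The same check can be wrapped in a small function. The following is only a minimal sketch: `same_identifier` is a made-up helper name that applies the NFKC comparison rule quoted from PEP 3131, and the non-ASCII letter is written as an escape (U+1D453, MATHEMATICAL ITALIC SMALL F, chosen here as an example of a character that normalizes to f) so the snippet survives copy and paste.

```python
import unicodedata

MATH_ITALIC_F = "\U0001d453"  # MATHEMATICAL ITALIC SMALL F, looks like an italic f


def same_identifier(a: str, b: str) -> bool:
    """Return True if a and b have the same NFKC form.

    Hypothetical helper: it only applies the comparison rule from PEP 3131
    and does not check that the strings are valid identifiers.
    """
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(same_identifier(MATH_ITALIC_F, "f"))  # True
print(same_identifier(MATH_ITALIC_F, "x"))  # False

# Reproduce the question: both assignment targets name the same variable,
# so the second value of the tuple overwrites the first.
namespace = {}
exec(f"{MATH_ITALIC_F}, f = 1, 2", namespace)
user_names = sorted(name for name in namespace if not name.startswith("__"))
print(user_names)      # ['f']
print(namespace["f"])  # 2
```

Only the normalized name f ends up in the namespace, which is why the question's second example prints (2, 2).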

Answered By: Mark

Answer 2

Here’s a small example, just to show how horrible this “feature” is:

𝕋𝓗ᵢ𝓼_ｆ𝔢𝘢𝔱𝔲ᵣₑ_𝕤ₕ𝔬𝔲𝔩𝔡_dₑ𝕗ᵢ𝘯ｉ𝘵𝔢ℓy_𝒷𝔢_𝘢_𝚋ᵘg = 42
print(T𝒽ℹ𝕤_𝔣e𝒶𝔱𝓾ᵣe_ₛ𝓱º𝔲𝔩𝓭_𝚍e𝒇ᵢ𝓃ⁱｔᵉ𝕝𝘆_𝖻ℯ_𝒶_𝖇𝓾𝘨)
# => 42

Try it online! (But please don’t use it)
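Names like this are easy to miss in review, so one possible safeguard is to flag every identifier whose spelling changes under NFKC. The sketch below relies on the standard `tokenize` module, which reports names as they are written in the source. `find_denormalized_names` is a hypothetical helper, not part of any existing linter.

```python
import io
import tokenize
import unicodedata


def find_denormalized_names(source: str) -> list[tuple[int, str, str]]:
    """Return (line, written_name, normalized_name) for names that NFKC changes."""
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        normalized = unicodedata.normalize("NFKC", tok.string)
        if normalized != tok.string:
            findings.append((tok.start[0], tok.string, normalized))
    return findings


# \uff46 is FULLWIDTH LATIN SMALL LETTER F,
# \u1d63 is LATIN SUBSCRIPT SMALL LETTER R.
sample = "\uff46oo = 1\nprint(foo)\nba\u1d63 = 2\n"
for line, written, normalized in find_denormalized_names(sample):
    print(f"line {line}: {written!r} is read as {normalized!r}")
```

Plain ASCII names such as print and foo pass through untouched; only the two disguised spellings are reported.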

Conversely, as @MarkMeyer mentioned, two identifiers can be distinct even though they look exactly the same, for example “CYRILLIC CAPITAL LETTER A” and “LATIN CAPITAL LETTER A”. NFKC does not map one onto the other:

А = 42
print(A)
# => NameError: name 'A' is not defined
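To tell such lookalikes apart, `unicodedata.name` can be used to print what each character really is. This is a small sketch with the Cyrillic letter written as an escape; the variable names are chosen only for illustration.

```python
import unicodedata

latin_a = "A"          # U+0041
cyrillic_a = "\u0410"  # U+0410, renders like the Latin letter

for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0041 LATIN CAPITAL LETTER A
# U+0410 CYRILLIC CAPITAL LETTER A

# NFKC leaves the Cyrillic letter untouched, so the two names never merge.
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)  # False

# Assigning to the Cyrillic name does not define the Latin one.
namespace = {}
exec(f"{cyrillic_a} = 42", namespace)
print(latin_a in namespace, cyrillic_a in namespace)  # False True
```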

Answered By: Eric Duminil
