Unicode identifiers in Python?

Question:

I want to build a Python function that calculates,

[image: formula]

and would like to name my summation function Σ. Similarly, I would like to use Π for product, and so on. Is there a way to name a Python function like this?

def Σ (..):
 ..
 ..

That is, does Python support Unicode identifiers, and if so, could someone provide an example?

Thanks!


Original motivation for this was a piece of Clojure code I saw today that looks like,

(defn entropy [X]
      (* -1 (Σ [i X] (* (p i) (log (p i))))))

where Σ is a macro defined as,

(defmacro Σ
    ... )

and I thought that was pretty cool.


BTW, to address a couple of comments about readability – with a lot of stats/ML code for instance, being able to compose operations with symbols would be really helpful. (Especially for really complex integrals et al)

φ(z) = ∫(N(x|0,1,1), -∞, z)

vs

Phi(z) = integral(N(x|0,1,1), -inf, z)

or even just the lambda character for lambda()!

Asked By: viksit


Answers:

(I think it’s pretty cool too, that might mean we’re geeks.)

You’re fine to do this in Python 3 with the code you have above. (It works in my Python 3.1 interpreter, at least.)

But in Python 2, identifiers can only be ASCII letters, numbers and underscores.
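
As a minimal sketch of what this could look like in Python 3: the function bodies and the `entropy` helper below are illustrative assumptions modelled on the Clojure snippet in the question, not code from the question itself.

```python
import math
import operator
from functools import reduce

def Σ(values):
    """Sum of an iterable of numbers."""
    return sum(values)

def Π(values):
    """Product of an iterable of numbers."""
    return reduce(operator.mul, values, 1)

def entropy(probabilities):
    # Shannon entropy: -Σ p·log(p), skipping zero-probability outcomes.
    return -Σ(p * math.log(p) for p in probabilities if p > 0)

print(Σ([1, 2, 3, 4]))  # 10
print(Π([1, 2, 3, 4]))  # 24
```

Both Σ (U+03A3) and Π (U+03A0) are Greek capital letters, which is why they are accepted as function names.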

Answered By: Paul D. Waite

Python 2.x does not support Unicode identifiers, and consequently does not support Σ as an identifier. Python 3.x does support Unicode identifiers, although many people will get cross if they have to edit source files that contain, for example, both identifiers A and Α (Latin A and Greek capital alpha). Sigma is often readable enough, but still not as readable as the word sigma, so why bother?
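
To illustrate the lookalike problem, here is a small sketch using the standard `unicodedata` module. The variable names are made up for the example, and the Greek letter is written as an escape so the two cannot be confused in the listing.

```python
import unicodedata

latin_a = "A"            # U+0041
greek_alpha = "\u0391"   # U+0391, renders like a Latin A in most fonts

for ch in (latin_a, greek_alpha):
    # Print the code point and the official Unicode name of each character.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The two characters look alike but compare unequal, so they would be
# two different identifiers in a source file.
print(latin_a == greek_alpha)
```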

Answered By: Thomas Wouters

You can use some Unicode characters in identifiers, but not all: you are restricted to characters classified as letters.

>>> α = 3
>>> Σ = sum
>>> import math
>>> √ = math.sqrt
  File "<stdin>", line 1
    √ = math.sqrt
      ^
SyntaxError: invalid character in identifier

Besides, I think it is very cool to be able to use Unicode in identifiers, and I wish I could use all of these characters.

I use the neo keyboard layout, which gives me greek and math symbols on extra layers:

αβχδεφγψιθκλνοπϕστ[&ωξυζ
∀⇐ℂΔ∃ΦΓΨ∫Λ⇔Σ∈ℚℝ∂⊂√∩Ξ

It’s worth pointing out that Python 3 does support Unicode identifiers, but only allows letter-like or number-like characters (see http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers for full details). That’s why Σ works (remember that it’s a Greek letter, not just a math symbol), but √ doesn’t.

For anyone interested, I made a website that lists every Unicode character that is valid in a Python variable name: https://www.asmeurer.com/python-unicode-variable-names/ (be warned that there are quite a lot of them, over 100000 in fact).
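
One way to check this yourself is sketched below, using `str.isidentifier()` and `unicodedata.category()` from the standard library. The list of sample characters is just an assumption chosen to match the examples in this thread.

```python
import unicodedata

# Sigma, alpha, square root sign, n-ary summation sign.
samples = ["\u03a3", "\u03b1", "\u221a", "\u2211"]

for ch in samples:
    # Categories starting with "L" are letters; "Sm" is a math symbol.
    print(ch, f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isidentifier())
```

The two Greek letters report a letter category and `True`, while the two math symbols report `Sm` and `False`.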

Answered By: asmeurer

(this answer is meant to be a minor addendum not a complete answer)

An additional gotcha with Unicode identifiers (which @mike-desimone mentions, and which I discovered quickly when I switched to a terminal to play with this) is that visually similar glyphs are not equivalent, and which one you get depends on the input method on each platform. For example, Σ (Greek capital letter sigma, U+03A3; I can’t find a direct Mac input method for it) is fine, but ∑ (N-ary summation, U+2211, typed with opt/alt-w on Mac OS X) is not a valid identifier.

>>> Σ = 20
>>> Σ
20

but

>>> ∑ = 20
File "<input>", line 1
  ∑ = 20
  ^
SyntaxError: invalid character in identifier

Using Σ specifically (and probably Unicode characters in general) as an identifier might cause some very hard-to-diagnose errors if multiple developers on multiple platforms contribute to your code. For example, try to debug this visually:

∑ looks very similar to Σ, depending on the typeface selected

The two glyphs are easier to differentiate on this page, but depending on the font used, this may not be the case.

Even the traceback isn’t much clearer unless Σ is printed near the ∑:

  File "~/Dev/play_python33/identifiers.py", line 12
    print(∑([2, 2, 2, 2, 2]))
            ^
SyntaxError: invalid character in identifier
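
When the glyphs are hard to tell apart by eye, printing code points and names can help. The following is only a sketch: `describe_non_ascii` is a hypothetical helper written for this answer, not part of any library, and the sample source mirrors the traceback above.

```python
import unicodedata

def describe_non_ascii(source):
    """Return (line, column, char, code point, name) for each non-ASCII char."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "<unnamed>")
                findings.append((line_no, col, ch, f"U+{ord(ch):04X}", name))
    return findings

# Line 1 uses the Greek letter; line 2 uses the summation sign by mistake.
sample = "\u03a3 = sum\nprint(\u2211([2, 2, 2, 2, 2]))\n"
for finding in describe_non_ascii(sample):
    print(finding)
```

The names in the output (GREEK CAPITAL LETTER SIGMA versus N-ARY SUMMATION) make the mismatch obvious even when the font does not.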

Answered By: Peter Hanley