How can I check if an identifier is dunder or class-private (i.e. will be mangled)?

Question:

I’m writing a project that gives advice about variable names, and I want it to tell if a name matches any of the reserved classes of identifiers. The first one ("private") is pretty straightforward, just name.startswith('_'), but dunder and class-private names are more complicated. Is there any built-in function that can tell me? If not, what are the internal rules Python uses?

For dunder, checking name.startswith('__') and name.endswith('__') doesn’t work because that would match '__', for example. Maybe a regex like ^__\w+__$ would work?

For class-private, name.startswith('__') doesn’t work because dunder names aren’t mangled, nor are names with just underscores like '___'. So it seems like I’d have to check if the name starts with two underscores, doesn’t end with two underscores, and contains at least one non-underscore character. Is that right? In code:

name.startswith('__') and not name.endswith('__') and any(c != '_' for c in name)

I’m mostly concerned about the edge cases, so I want to make sure I get the rules 100% correct. I read What is the meaning of single and double underscore before an object name? but there’s not enough detail.

Asked By: wjandrea


Answers:

Dunder

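Based on is_dunder_name in Objects/typeobject.c, which accepts names longer than four characters that start and end with two underscores and that CPython stores one byte per character. Using str.isascii (new in Python 3.7) for that last condition gives the following. (Strictly, the one-byte storage also covers Latin-1 characters such as é, so isascii is slightly stricter than the C check.)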

len(name) > 4 and name.isascii() and name.startswith('__') and name.endswith('__')
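Wrapped as a function and run against the edge cases from the question, a minimal sketch (is_dunder is a hypothetical helper name, not a built-in):

```python
def is_dunder(name: str) -> bool:
    """Python translation of CPython's is_dunder_name check (needs Python 3.7+)."""
    return (
        len(name) > 4            # "__x__" is the shortest possible dunder
        and name.isascii()       # stand-in for CPython's one-byte storage check
        and name.startswith('__')
        and name.endswith('__')
    )


print(is_dunder('__init__'))  # True
print(is_dunder('__'))        # False: too short, the case from the question
print(is_dunder('____'))      # False: four underscores are still too short
print(is_dunder('_____'))     # True: five underscores pass every check
```

Note the last case: a name made of five or more underscores counts as dunder under this rule.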

Alternatively, that regex ^__\w+__$ would also work for valid identifiers, but it needs the re.ASCII flag so that \w only matches ASCII characters.
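A sketch of the regex version, assuming re.fullmatch is used instead of the ^...$ anchors: without re.MULTILINE, $ also matches just before a trailing newline, so the anchored pattern accepts '__x__\n'. DUNDER_RE and is_dunder_re are hypothetical names.

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_]; fullmatch must consume the whole string.
DUNDER_RE = re.compile(r'__\w+__', re.ASCII)


def is_dunder_re(name: str) -> bool:
    return DUNDER_RE.fullmatch(name) is not None


print(is_dunder_re('__init__'))                       # True
print(is_dunder_re('__x__\n'))                        # False
print(re.match(r'^__\w+__$', '__x__\n') is not None)  # True: $ matched before '\n'
print(is_dunder_re('__é__'))                          # False: with re.ASCII, \w excludes é
```

For strings that aren’t identifiers the two versions can disagree: '__a-b__' passes the isascii check but not the regex, since '-' isn’t matched by \w.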

Class-private

The rules are documented under Identifiers (Names):

name.startswith('__') and not name.endswith('__')

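(Side note: since the name already starts with '__', not name.endswith('__') also ensures that it contains at least one non-underscore character, because any all-underscore string of length two or more ends with '__'.)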
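As a sanity check that the extra any(...) clause from the question is redundant, a small brute-force comparison (both function names are hypothetical). Every predicate involved only cares whether each character is '_' or not, so strings over the alphabet {'_', 'x'} cover every pattern up to the tested length.

```python
from itertools import product


def is_class_private(name: str) -> bool:
    # Rule from the language reference, as given in this answer.
    return name.startswith('__') and not name.endswith('__')


def is_class_private_question(name: str) -> bool:
    # The version proposed in the question, with the extra any(...) clause.
    return (
        name.startswith('__')
        and not name.endswith('__')
        and any(c != '_' for c in name)
    )


# Compare both checks on every string of length 0 to 7 over {'_', 'x'}.
for length in range(8):
    for chars in product('_x', repeat=length):
        candidate = ''.join(chars)
        assert is_class_private(candidate) == is_class_private_question(candidate), candidate
print('both checks agree')
```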

There’s also a C implementation, _Py_Mangle in Python/compile.c, but it additionally skips names that contain a dot, even though, strictly speaking, a dotted name is an "attribute reference", not a name. That’d be equivalent to:

name.startswith('__') and not name.endswith('__') and '.' not in name
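To test the class-private rule against the compiler itself, one option (assuming CPython) is to compile a class whose method reads self.<name> and look at the attribute name stored in the method’s code object (co_names). Both helpers are hypothetical: mangle applies the transformation described in the same Identifiers (Names) section (drop the class name’s leading underscores, prepend one underscore, and do nothing if the class name is all underscores), and compiled_attribute_name is the probe. The probe can’t exercise the dot check, since it only accepts plain identifiers.

```python
import keyword


def mangle(name: str, class_name: str) -> str:
    """Apply the documented mangling rule (hypothetical helper, not a stdlib API)."""
    stripped = class_name.lstrip('_')  # leading underscores of the class name are dropped
    if not name.startswith('__') or name.endswith('__') or not stripped:
        return name  # not class-private, or the class name is all underscores
    return f'_{stripped}{name}'


def compiled_attribute_name(name: str, class_name: str) -> str:
    """Compile `self.<name>` inside a class and report the name CPython emits."""
    for ident in (name, class_name):
        if not ident.isidentifier() or keyword.iskeyword(ident):
            raise ValueError(f'not usable as an identifier: {ident!r}')
    source = (
        f'class {class_name}:\n'
        f'    def probe(self):\n'
        f'        return self.{name}\n'
    )
    namespace = {}
    exec(source, namespace)  # defines the class; probe() is never called
    # probe() looks up exactly one name by name: the (possibly mangled) attribute.
    return namespace[class_name].probe.__code__.co_names[0]


names = ['__spam', '__spam_', '__spam__', '___', '__', '_spam', 'spam']
class_names = ['Ham', '_Ham', '__Ham', '___']
for cls in class_names:
    for n in names:
        assert mangle(n, cls) == compiled_attribute_name(n, cls), (n, cls)
print('documented rule matches the compiler for every pair')
```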

P.S. I can barely read C, so take these translations with a grain of salt.

Answered By: wjandrea