How do I convert a string to a valid variable name in Python?

Question:

I need to convert an arbitrary string to a string that is a valid variable name in Python.

Here’s a very basic example:

s1 = 'name/with/slashes'
s2 = 'name '

def clean(s):
    s = s.replace('/', '')
    s = s.strip()

    return s

# the _ is there so I can see the end of the string
print clean(s1) + '_'

That is a very naive approach. I need to check if the string contains invalid variable name characters and replace them with ”

What would be a pythonic way to do this?

Asked By: George Profenza

||

Answers:

You should build a regex that’s a whitelist of permissible characters and replace everything that is not in that character class.

Answered By: Daenyth

Use the re module, and strip all invalid charecters.

Answered By: John Howard

According to Python, an identifier is a letter or underscore, followed by an unlimited string of letters, numbers, and underscores:

import re

def clean(s):

   # Remove invalid characters
   s = re.sub('[^0-9a-zA-Z_]', '', s)

   # Remove leading characters until we find a letter or underscore
   s = re.sub('^[^a-zA-Z_]+', '', s)

   return s

Use like this:

>>> clean(' 32v2 g #Gmw845h$W b53wi ')
'v2gGmw845hWb53wi'
Answered By: Kenan Banks

Well, I’d like to best Triptych’s solution with … a one-liner!

>>> def clean(varStr): return re.sub('W|^(?=d)','_', varStr)
...

>>> clean('32v2 g #Gmw845h$W b53wi ')
'_32v2_g__Gmw845h_W_b53wi_'

This substitution replaces any non-variable appropriate character with underscore and inserts underscore in front if the string starts with a digit. IMO, ‘name/with/slashes’ looks better as variable name name_with_slashes than as namewithslashes.

Answered By: Nas Banov

You can use the built in func:str.isidentifier() in combination with filter().
This requires no imports such as re and works by iterating over each character and returning it if its an identifier. Then you just do a ''.join to convert the array to a string again.

s1 = 'name/with/slashes'
s2 = 'name '

def clean(s):
    s = ''.join(filter(str.isidentifier, s))
    return s

print f'{clean(s1)}_' #the _ is there so I can see the end of the string

EDIT:

If, like Hans Bouwmeester in the replies, want numeric values to be included as well, you can create a lambda which uses both the isIdentifier and the isdecimal functions to check the characters. Obviously this can be expanded as far as you want to take it. Code:

s1 = 'name/with/slashes'
s2 = 'name i2, i3    '
s3 = 'epng2 0-2g [ q4o 2-=2 t1  l32!@#$%*(vqv[r 0-34 2]] '

def clean(s):
    s = ''.join(filter( 
        lambda c: str.isidentifier(c) or str.isdecimal(c), s))
    return s
#the _ is there so I can see the end of the string
print(f'{ clean(s1) }_')
print(f'{ clean(s2) }_')
print(f'{ clean(s3) }_')

Gives :

namewithslashes_
namei2i3_
epng202gq4o22t1l32vqvr0342_
Answered By: ZXYNINE