Elegant Python function to convert CamelCase to snake_case?

Question

Example:

>>> convert('CamelCase')
'camel_case'

Asked By: Sridhar Ratnakumar

||

Source

Answer 1

Not in the standard library, but I found this module that appears to contain the functionality you need.

Answered By: Stefano Borini

Answer 2

For the fun of it:

>>> def un_camel(input):
...     output = [input[0].lower()]
...     for c in input[1:]:
...             if c in ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
...                     output.append('_')
...                     output.append(c.lower())
...             else:
...                     output.append(c)
...     return str.join('', output)
...
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'

Or, more for the fun of it:

>>> un_camel = lambda i: i[0].lower() + str.join('', ("_" + c.lower() if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" else c for c in i[1:]))
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'

Answered By: gahooa

Answer 3

''.join('_'+c.lower() if c.isupper() else c for c in "DeathToCamelCase").strip('_')
re.sub("(.)([A-Z])", r'1_2', 'DeathToCamelCase').lower()

Answered By: Jimmy

Answer 4

A horrendous example using regular expressions (you could easily clean this up 🙂 ):

def f(s):
    return s.group(1).lower() + "_" + s.group(2).lower()

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print p.sub(f, "CamelCase")
print p.sub(f, "getHTTPResponseCode")

Works for getHTTPResponseCode though!

Alternatively, using lambda:

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "CamelCase")
print p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "getHTTPResponseCode")

EDIT: It should also be pretty easy to see that there’s room for improvement for cases like “Test”, because the underscore is unconditionally inserted.

Answered By: Matthew Iselin

Answer 5

Here’s my solution:

def un_camel(text):
    """ Converts a CamelCase name into an under_score name. 

        >>> un_camel('CamelCase')
        'camel_case'
        >>> un_camel('getHTTPResponseCode')
        'get_http_response_code'
    """
    result = []
    pos = 0
    while pos < len(text):
        if text[pos].isupper():
            if pos-1 > 0 and text[pos-1].islower() or pos-1 > 0 and 
            pos+1 < len(text) and text[pos+1].islower():
                result.append("_%s" % text[pos].lower())
            else:
                result.append(text[pos].lower())
        else:
            result.append(text[pos])
        pos += 1
    return "".join(result)

It supports those corner cases discussed in the comments. For instance, it’ll convert getHTTPResponseCode to get_http_response_code like it should.

Answered By: Evan Fosmark

Answer 6

Camel case to snake case

import re

name = 'CamelCaseName'
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name)  # camel_case_name

If you do this many times and the above is slow, compile the regex beforehand:

pattern = re.compile(r'(?<!^)(?=[A-Z])')
name = pattern.sub('_', name).lower()

To handle more advanced cases specially (this is not reversible anymore):

def camel_to_snake(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
    return re.sub('([a-z0-9])([A-Z])', r'1_2', name).lower()

print(camel_to_snake('camel2_camel2_case'))  # camel2_camel2_case
print(camel_to_snake('getHTTPResponseCode'))  # get_http_response_code
print(camel_to_snake('HTTPResponseCodeXYZ'))  # http_response_code_xyz

To add also cases with two underscores or more:

def to_snake_case(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
    name = re.sub('__([A-Z])', r'_1', name)
    name = re.sub('([a-z0-9])([A-Z])', r'1_2', name)
    return name.lower()

Snake case to pascal case

name = 'snake_case_name'
name = ''.join(word.title() for word in name.split('_'))
print(name)  # SnakeCaseName

Answered By: epost

Answer 7

Wow I just stole this from django snippets. ref http://djangosnippets.org/snippets/585/

Pretty elegant

camelcase_to_underscore = lambda str: re.sub(r'(?<=[a-z])[A-Z]|[A-Z](?=[^A-Z])', r'_g<0>', str).lower().strip('_')

Example:

camelcase_to_underscore('ThisUser')

Returns:

'this_user'

REGEX DEMO

Answered By: brianray

Answer 8

Very nice RegEx proposed on this site:

(?<!^)(?=[A-Z])

If python have a String Split method, it should work…

In Java:

String s = "loremIpsum";
words = s.split("(?&#60;!^)(?=[A-Z])");

Answered By: Jmini

Answer 9

Here’s something I did to change the headers on a tab-delimited file. I’m omitting the part where I only edited the first line of the file. You could adapt it to Python pretty easily with the re library. This also includes separating out numbers (but keeps the digits together). I did it in two steps because that was easier than telling it not to put an underscore at the start of a line or tab.

Step One…find uppercase letters or integers preceded by lowercase letters, and precede them with an underscore:

Search:

([a-z]+)([A-Z]|[0-9]+)

Replacement:

1_l2/

Step Two…take the above and run it again to convert all caps to lowercase:

Search:

([A-Z])

Replacement (that’s backslash, lowercase L, backslash, one):

l1

Answered By: Joe Tricarico

Answer 10

If you use Google’s (nearly) deterministic Camel case algorithm, then one does not need to handle things like HTMLDocument since it should be HtmlDocument, then this regex based approach is simple. It replace all capitals or numbers with an underscore. Note does not handle multi digit numbers.

import re

def to_snake_case(camel_str):
    return re.sub('([A-Z0-9])', r'_1', camel_str).lower().lstrip('_')

Answered By: codekoala

Answer 11

I don’t know why these are all so complicating.

for most cases, the simple expression ([A-Z]+) will do the trick

>>> re.sub('([A-Z]+)', r'_1','CamelCase').lower()
'_camel_case'  
>>> re.sub('([A-Z]+)', r'_1','camelCase').lower()
'camel_case'
>>> re.sub('([A-Z]+)', r'_1','camel2Case2').lower()
'camel2_case2'
>>> re.sub('([A-Z]+)', r'_1','camelCamelCase').lower()
'camel_camel_case'
>>> re.sub('([A-Z]+)', r'_1','getHTTPResponseCode').lower()
'get_httpresponse_code'

To ignore the first character simply add look behind (?!^)

>>> re.sub('(?!^)([A-Z]+)', r'_1','CamelCase').lower()
'camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_1','CamelCamelCase').lower()
'camel_camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_1','Camel2Camel2Case').lower()
'camel2_camel2_case'
>>> re.sub('(?!^)([A-Z]+)', r'_1','getHTTPResponseCode').lower()
'get_httpresponse_code'

If you want to separate ALLCaps to all_caps and expect numbers in your string you still don’t need to do two separate runs just use | This expression ((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z])) can handle just about every scenario in the book

>>> a = re.compile('((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z]))')
>>> a.sub(r'_1', 'getHTTPResponseCode').lower()
'get_http_response_code'
>>> a.sub(r'_1', 'get2HTTPResponseCode').lower()
'get2_http_response_code'
>>> a.sub(r'_1', 'get2HTTPResponse123Code').lower()
'get2_http_response123_code'
>>> a.sub(r'_1', 'HTTPResponseCode').lower()
'http_response_code'
>>> a.sub(r'_1', 'HTTPResponseCodeXYZ').lower()
'http_response_code_xyz'

It all depends on what you want so use the solution that best suits your needs as it should not be overly complicated.

nJoy!

Answered By: nickl-

Answer 12

I don’t get idea why using both .sub() calls? 🙂 I’m not regex guru, but I simplified function to this one, which is suitable for my certain needs, I just needed a solution to convert camelCasedVars from POST request to vars_with_underscore:

def myFunc(...):
  return re.sub('(.)([A-Z]{1})', r'1_2', "iTriedToWriteNicely").lower()

It does not work with such names like getHTTPResponse, cause I heard it is bad naming convention (should be like getHttpResponse, it’s obviously, that it’s much easier memorize this form).

Answered By: desper4do

Answer 13

I was looking for a solution to the same problem, except that I needed a chain; e.g.

"CamelCamelCamelCase" -> "Camel-camel-camel-case"

Starting from the nice two-word solutions here, I came up with the following:

"-".join(x.group(1).lower() if x.group(2) is None else x.group(1) 
         for x in re.finditer("((^.[^A-Z]+)|([A-Z][^A-Z]+))", "stringToSplit"))

Most of the complicated logic is to avoid lowercasing the first word. Here’s a simpler version if you don’t mind altering the first word:

"-".join(x.group(1).lower() for x in re.finditer("(^[^A-Z]+|[A-Z][^A-Z]+)", "stringToSplit"))

Of course, you can pre-compile the regular expressions or join with underscore instead of hyphen, as discussed in the other solutions.

Answered By: Jim Pivarski

Answer 14

This is not a elegant method, is a very ‘low level’ implementation of a simple state machine (bitfield state machine), possibly the most anti pythonic mode to resolve this, however re module also implements a too complex state machine to resolve this simple task, so i think this is a good solution.

def splitSymbol(s):
    si, ci, state = 0, 0, 0 # start_index, current_index 
    '''
        state bits:
        0: no yields
        1: lower yields
        2: lower yields - 1
        4: upper yields
        8: digit yields
        16: other yields
        32 : upper sequence mark
    '''
    for c in s:

        if c.islower():
            if state & 1:
                yield s[si:ci]
                si = ci
            elif state & 2:
                yield s[si:ci - 1]
                si = ci - 1
            state = 4 | 8 | 16
            ci += 1

        elif c.isupper():
            if state & 4:
                yield s[si:ci]
                si = ci
            if state & 32:
                state = 2 | 8 | 16 | 32
            else:
                state = 8 | 16 | 32

            ci += 1

        elif c.isdigit():
            if state & 8:
                yield s[si:ci]
                si = ci
            state = 1 | 4 | 16
            ci += 1

        else:
            if state & 16:
                yield s[si:ci]
            state = 0
            ci += 1  # eat ci
            si = ci   
        print(' : ', c, bin(state))
    if state:
        yield s[si:ci] 


def camelcaseToUnderscore(s):
    return '_'.join(splitSymbol(s))

splitsymbol can parses all case types: UpperSEQUENCEInterleaved, under_score, BIG_SYMBOLS and cammelCasedMethods

I hope it is useful

Answered By: jdavidls

Answer 15

There’s an inflection library in the package index that can handle these things for you. In this case, you’d be looking for inflection.underscore():

>>> inflection.underscore('CamelCase')
'camel_case'

Answered By: Brad Koch

Answer 16

Personally I am not sure how anything using regular expressions in python can be described as elegant. Most answers here are just doing “code golf” type RE tricks. Elegant coding is supposed to be easily understood.

def to_snake_case(not_snake_case):
    final = ''
    for i in xrange(len(not_snake_case)):
        item = not_snake_case[i]
        if i < len(not_snake_case) - 1:
            next_char_will_be_underscored = (
                not_snake_case[i+1] == "_" or
                not_snake_case[i+1] == " " or
                not_snake_case[i+1].isupper()
            )
        if (item == " " or item == "_") and next_char_will_be_underscored:
            continue
        elif (item == " " or item == "_"):
            final += "_"
        elif item.isupper():
            final += "_"+item.lower()
        else:
            final += item
    if final[0] == "_":
        final = final[1:]
    return final

>>> to_snake_case("RegularExpressionsAreFunky")
'regular_expressions_are_funky'

>>> to_snake_case("RegularExpressionsAre Funky")
'regular_expressions_are_funky'

>>> to_snake_case("RegularExpressionsAre_Funky")
'regular_expressions_are_funky'

Answered By: TehTris

Answer 17

Concise without regular expressions, but HTTPResponseCode=> httpresponse_code:

def from_camel(name):
    """
    ThisIsCamelCase ==> this_is_camel_case
    """
    name = name.replace("_", "")
    _cas = lambda _x : [_i.isupper() for _i in _x]
    seq = zip(_cas(name[1:-1]), _cas(name[2:]))
    ss = [_x + 1 for _x, (_i, _j) in enumerate(seq) if (_i, _j) == (False, True)]
    return "".join([ch + "_" if _x in ss else ch for _x, ch in numerate(name.lower())])

Answered By: Dantalion

Answer 18

Using regexes may be the shortest, but this solution is way more readable:

def to_snake_case(s):
    snake = "".join(["_"+c.lower() if c.isupper() else c for c in s])
    return snake[1:] if snake.startswith("_") else snake

Answered By: 3k-

Answer 19

Use: str.capitalize() to convert first letter of the string (contained in variable str) to a capital letter and returns the entire string.

Example:
Command: “hello”.capitalize()
Output: Hello

Answered By: Arshin

Answer 20

Without any library :

def camelify(out):
    return (''.join(["_"+x.lower() if i<len(out)-1 and x.isupper() and out[i+1].islower()
         else x.lower()+"_" if i<len(out)-1 and x.islower() and out[i+1].isupper()
         else x.lower() for i,x in enumerate(list(out))])).lstrip('_').replace('__','_')

A bit heavy, but

CamelCamelCamelCase ->  camel_camel_camel_case
HTTPRequest         ->  http_request
GetHTTPRequest      ->  get_http_request
getHTTPRequest      ->  get_http_request

Answered By: bibmartin

Answer 21

This simple method should do the job:

import re

def convert(name):
    return re.sub(r'([A-Z]*)([A-Z][a-z]+)', lambda x: (x.group(1) + '_' if x.group(1) else '') + x.group(2) + '_', name).rstrip('_').lower()

We look for capital letters that are precedeed by any number of (or zero) capital letters, and followed by any number of lowercase characters.
An underscore is placed just before the occurence of the last capital letter found in the group, and one can be placed before that capital letter in case it is preceded by other capital letters.
If there are trailing underscores, remove those.
Finally, the whole result string is changed to lower case.

(taken from here, see working example online)

Answered By: Mathieu Rodic

Answer 22

Lightely adapted from https://stackoverflow.com/users/267781/matth
who use generators.

def uncamelize(s):
    buff, l = '', []
    for ltr in s:
        if ltr.isupper():
            if buff:
                l.append(buff)
                buff = ''
        buff += ltr
    l.append(buff)
    return '_'.join(l).lower()

Answered By: Salvatore

Answer 23

Take a look at the excellent Schematics lib

https://github.com/schematics/schematics

It allows you to created typed data structures that can serialize/deserialize from python to Javascript flavour, eg:

class MapPrice(Model):
    price_before_vat = DecimalType(serialized_name='priceBeforeVat')
    vat_rate = DecimalType(serialized_name='vatRate')
    vat = DecimalType()
    total_price = DecimalType(serialized_name='totalPrice')

Answered By: Iain Hunter

Answer 24

stringcase is my go-to library for this; e.g.:

>>> from stringcase import pascalcase, snakecase
>>> snakecase('FooBarBaz')
'foo_bar_baz'
>>> pascalcase('foo_bar_baz')
'FooBarBaz'

Answered By: Beau

Answer 25

Avoiding libraries and regular expressions:

def camel_to_snake(s):
    return ''.join(['_'+c.lower() if c.isupper() else c for c in s]).lstrip('_')

>>> camel_to_snake('ThisIsMyString')
'this_is_my_string'

Answered By: otocan

Answer 26

def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() else '') + y, 
        name
    ).lower()

And if we need to cover a case with already-un-cameled input:

def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() and not x.endswith('_') else '') + y, 
        name
    ).lower()

Answered By: dmrz

Answer 27

I think this solution is more straightforward than previous answers:

import re

def convert (camel_input):
    words = re.findall(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|d|W|$)|d+', camel_input)
    return '_'.join(map(str.lower, words))


# Let's test it
test_strings = [
    'CamelCase',
    'camelCamelCase',
    'Camel2Camel2Case',
    'getHTTPResponseCode',
    'get200HTTPResponseCode',
    'getHTTP200ResponseCode',
    'HTTPResponseCode',
    'ResponseHTTP',
    'ResponseHTTP2',
    'Fun?!awesome',
    'Fun?!Awesome',
    '10CoolDudes',
    '20coolDudes'
]
for test_string in test_strings:
    print(convert(test_string))

Which outputs:

camel_case
camel_camel_case
camel_2_camel_2_case
get_http_response_code
get_200_http_response_code
get_http_200_response_code
http_response_code
response_http
response_http_2
fun_awesome
fun_awesome
10_cool_dudes
20_cool_dudes

The regular expression matches three patterns:

[A-Z]?[a-z]+: Consecutive lower-case letters that optionally start with an upper-case letter.
[A-Z]{2,}(?=[A-Z][a-z]|d|W|$): Two or more consecutive upper-case letters. It uses a lookahead to exclude the last upper-case letter if it is followed by a lower-case letter.
d+: Consecutive numbers.

By using re.findall we get a list of individual “words” that can be converted to lower-case and joined with underscores.

Answered By: rspeed

Answer 28

def convert(camel_str):
    temp_list = []
    for letter in camel_str:
        if letter.islower():
            temp_list.append(letter)
        else:
            temp_list.append('_')
            temp_list.append(letter)
    result = "".join(temp_list)
    return result.lower()

Answered By: JohnBoy

Answer 29

Just in case someone needs to transform a complete source file, here is a script that will do it.

# Copy and paste your camel case code in the string below
camelCaseCode ="""
    cv2.Matx33d ComputeZoomMatrix(const cv2.Point2d & zoomCenter, double zoomRatio)
    {
      auto mat = cv2.Matx33d::eye();
      mat(0, 0) = zoomRatio;
      mat(1, 1) = zoomRatio;
      mat(0, 2) = zoomCenter.x * (1. - zoomRatio);
      mat(1, 2) = zoomCenter.y * (1. - zoomRatio);
      return mat;
    }
"""

import re
def snake_case(name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'1_2', name)
    return re.sub('([a-z0-9])([A-Z])', r'1_2', s1).lower()

def lines(str):
    return str.split("n")

def unlines(lst):
    return "n".join(lst)

def words(str):
    return str.split(" ")

def unwords(lst):
    return " ".join(lst)

def map_partial(function):
    return lambda values : [  function(v) for v in values]

import functools
def compose(*functions):
    return functools.reduce(lambda f, g: lambda x: f(g(x)), functions, lambda x: x)

snake_case_code = compose(
    unlines ,
    map_partial(unwords),
    map_partial(map_partial(snake_case)),
    map_partial(words),
    lines
)
print(snake_case_code(camelCaseCode))

Answered By: Pascal T.

Answer 30

So many complicated methods…
Just find all “Titled” group and join its lower cased variant with underscore.

>>> import re
>>> def camel_to_snake(string):
...     groups = re.findall('([A-z0-9][a-z]*)', string)
...     return '_'.join([i.lower() for i in groups])
...
>>> camel_to_snake('ABCPingPongByTheWay2KWhereIsOurBorderlands3???')
'a_b_c_ping_pong_by_the_way_2_k_where_is_our_borderlands_3'

If you don’t want make numbers like first character of group or separate group – you can use ([A-z][a-z0-9]*) mask.

Answered By: unitto

Elegant Python function to convert CamelCase to snake_case?

Question:

Answers:

Camel case to snake case

Snake case to pascal case