How can I simplify this conversion from underscore to camelcase in Python?

Question:

I have written the function below, which converts underscore-separated names to camelCase with the first word in lowercase, e.g. “get_this_value” -> “getThisValue”. I also need to preserve leading and trailing underscores, as well as double (triple, etc.) underscores, if any, e.g.

"_get__this_value_" -> "_get_ThisValue_".

The code:

def underscore_to_camelcase(value):
    output = ""
    first_word_passed = False
    for word in value.split("_"):
        if not word:
            output += "_"
            continue
        if first_word_passed:
            output += word.capitalize()
        else:
            output += word.lower()
        first_word_passed = True
    return output

The code above works as expected, but it does not feel Pythonic to me, so I am looking for a way to simplify it, for example with list comprehensions.

Asked By: Serge Tarkovski


Answers:

I think the code is fine. You’ve got a fairly complex specification, so if you insist on squashing it into the Procrustean bed of a list comprehension, then you’re likely to harm the clarity of the code.

The only changes I’d make would be:

  1. To use the join method to build the result in O(n) space and time, rather than repeated applications of += which is O(n²).
  2. To add a docstring.

Like this:

def underscore_to_camelcase(s):
    """Take the underscore-separated string s and return a camelCase
    equivalent.  Initial and final underscores are preserved, and medial
    pairs of underscores are turned into a single underscore."""
    def camelcase_words(words):
        first_word_passed = False
        for word in words:
            if not word:
                yield "_"
                continue
            if first_word_passed:
                yield word.capitalize()
            else:
                yield word.lower()
            first_word_passed = True
    return ''.join(camelcase_words(s.split('_')))

Depending on the application, another change I would consider making would be to memoize the function. I presume you’re automatically translating source code in some way, and you expect the same names to occur many times. So you might as well store the conversion instead of re-computing it each time. An easy way to do that would be to use the @memoized decorator from the Python decorator library.
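As a rough sketch of the memoization idea, the standard library's functools.lru_cache (Python 3.2+) can stand in for the @memoized decorator mentioned above; that swap is an assumption of this example, not what the answer used. The function name underscore_to_camelcase_cached is hypothetical, and the unbounded cache assumes the set of distinct names stays reasonably small.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache; assumes few distinct names
def underscore_to_camelcase_cached(s):
    """Memoized variant: each distinct input string is converted only once."""
    first_word_passed = False
    parts = []
    for word in s.split("_"):
        if not word:
            parts.append("_")               # empty piece: keep an underscore
        elif first_word_passed:
            parts.append(word.capitalize())
        else:
            parts.append(word.lower())
            first_word_passed = True
    return "".join(parts)


underscore_to_camelcase_cached("get_this_value")  # computed on the first call
underscore_to_camelcase_cached("get_this_value")  # served from the cache
print(underscore_to_camelcase_cached.cache_info())
```

cache_info() reports hits and misses, which makes it easy to check whether caching pays off for a given workload.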

Answered By: Gareth Rees

I agree with Gareth that the code is ok. However, if you really want a shorter, yet readable approach you could try something like this:

def underscore_to_camelcase(value):
    # Make a list of capitalized words and underscores to be preserved
    capitalized_words = [w.capitalize() if w else '_' for w in value.split('_')]

    # Convert the first word to lowercase
    for i, word in enumerate(capitalized_words):
        if word != '_':
            capitalized_words[i] = word.lower()
            break

    # Join all words to a single string and return it
    return "".join(capitalized_words)
Answered By: Pär Wieslander

This is the most compact way to do it, although it does not preserve leading, trailing, or repeated underscores:

def underscore_to_camelcase(value):
    words = [word.capitalize() for word in value.split('_')]
    words[0]=words[0].lower()
    return "".join(words)
Answered By: vonPetrushev

For regexp's sake!

import re

def underscore_to_camelcase(value):
    def rep(m):
        if m.group(1) is not None:
            # first word (match anchored at the start): keep leading underscores
            return m.group(2) + m.group(3).lower()
        # later words: drop one separating underscore and capitalize
        return m.group(2)[1:] + m.group(3).capitalize()

    return re.sub(r'(^)?(_*)([a-zA-Z]+)', rep, value)
Answered By: Jocelyn delalande

The problem calls for a function that returns a lowercase word the first time, but capitalized words afterwards. You can do that with an if clause, but then the if clause has to be evaluated for every word. An appealing alternative is to use a generator. It can return one thing on the first call, and something else on successive calls, and it does not require as many ifs.

def lower_camelcase(seq):
    it=iter(seq)
    for word in it:
        yield word.lower()
        if word.isalnum(): break
    for word in it:
        yield word.capitalize()

def underscore_to_camelcase(text):
    return ''.join(lower_camelcase(word if word else '_' for word in text.split('_')))

Here is some test code to show that it works:

tests=[('get__this_value','get_ThisValue'),
       ('_get__this_value','_get_ThisValue'),
       ('_get__this_value_','_get_ThisValue_'),
       ('get_this_value','getThisValue'),        
       ('get__this__value','get_This_Value'),        
       ]
for test,answer in tests:
    result=underscore_to_camelcase(test)
    try:
        assert result==answer
    except AssertionError:
        print('{r!r} != {a!r}'.format(r=result,a=answer))
Answered By: unutbu

Another regexp solution:

import re

def conv(s):
    """Convert underscore-separated strings to camelCase equivalents.

    >>> conv('get')
    'get'
    >>> conv('_get')
    '_get'
    >>> conv('get_this_value')
    'getThisValue'
    >>> conv('__get__this_value_')
    '_get_ThisValue_'
    >>> conv('_get__this_value__')
    '_get_ThisValue_'
    >>> conv('___get_this_value')
    '_getThisValue'

    """
    # convert case:
    s = re.sub(r'(_*[A-Z])', lambda m: m.group(1).lower(), s.title(), count=1)
    # remove/normalize underscores:
    s = re.sub(r'__+|^_+|_+$', '|', s).replace('_', '').replace('|', '_')
    return s

if __name__ == "__main__":
    import doctest
    doctest.testmod()

It works for your examples, but it might fail for names containing digits – it depends on how you would capitalize them.
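To make the caveat about digits concrete, here is a small sketch comparing str.title() with a per-piece str.capitalize(). The sample name get_2nd_value is made up for illustration.

```python
# str.title() starts a new "word" after every non-letter, including digits,
# so a letter that follows a digit is upper-cased.
titled = "get_2nd_value".title()
print(titled)        # Get_2Nd_Value

# str.capitalize() upper-cases only the first character of the string it is
# called on and lower-cases the rest, so "2nd" is left unchanged.
pieces = "get_2nd_value".split("_")
capitalized = "".join(p.capitalize() for p in pieces)
print(capitalized)   # Get2ndValue
```

Whether "2Nd" or "2nd" is the desired result depends on the naming convention being targeted.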

Answered By: Oben Sonne

Here is a list comprehension style generator expression.

from itertools import count
def underscore_to_camelcase(value):
    words = value.split('_')
    counter = count()
    # next(counter) is 0 (falsy) for the first non-empty word, so it is left as-is
    return ''.join('_' if w == '' else w.capitalize() if next(counter) else w for w in words)
Answered By: kevpie

Your code is fine. The problem I think you’re trying to solve is that if first_word_passed looks a little bit ugly.

One option for fixing this is a generator. We can easily make this return one thing for first entry and another for all subsequent entries. As Python has first-class functions we can get the generator to return the function we want to use to process each word.

We then just need to use the conditional operator so we can handle the blank entries returned by double underscores within a list comprehension.

So if we have a word we call the generator to get the function to use to set the case, and if we don’t we just use _ leaving the generator untouched.

def underscore_to_camelcase(value):
    def camelcase(): 
        yield str.lower
        while True:
            yield str.capitalize

    c = camelcase()
    return "".join(next(c)(x) if x else '_' for x in value.split("_"))
Answered By: Dave Webb

A slightly modified version:

import re

def underscore_to_camelcase(value):
    first = True
    res = []

    for u,w in re.findall('([_]*)([^_]*)',value):
        if first:
            res.append(u+w)
            first = False
        elif len(w)==0:    # trailing underscores
            res.append(u)
        else:   # trim an underscore and capitalize
            res.append(u[:-1] + w.title())

    return ''.join(res)
Answered By: Hugh Bothwell

This one works except for leaving the first word as lowercase.

def convert(word):
    return ''.join(x.capitalize() or '_' for x in word.split('_'))

(I know this isn’t exactly what you asked for, and this thread is quite old, but since it’s quite prominent when searching for such conversions on Google I thought I’d add my solution in case it helps anyone else).
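If the first word should also come out in lowercase, one possible follow-up is to lower-case the first letter of the result. This is only a sketch: convert_lower_first is a hypothetical helper, and it assumes the first word starts with an ASCII letter.

```python
import re


def convert(word):
    # The one-liner from above: capitalize each piece, keep '_' for empty pieces.
    return ''.join(x.capitalize() or '_' for x in word.split('_'))


def convert_lower_first(word):
    # Hypothetical helper: lower-case only the first ASCII letter of the result,
    # which is the first letter of the first word under the assumption above.
    return re.sub(r'[A-Za-z]', lambda m: m.group(0).lower(), convert(word), count=1)


print(convert('_get__this_value_'))              # _Get_ThisValue_
print(convert_lower_first('_get__this_value_'))  # _get_ThisValue_
```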

Answered By: Siegfried Gevatter

I prefer a regular expression, personally. Here’s one that is doing the trick for me:

import re
def to_camelcase(s):
    return re.sub(r'(?!^)_([a-zA-Z])', lambda m: m.group(1).upper(), s)

Using unutbu's tests:

tests = [('get__this_value', 'get_ThisValue'),
         ('_get__this_value', '_get_ThisValue'),
         ('_get__this_value_', '_get_ThisValue_'),
         ('get_this_value', 'getThisValue'),
         ('get__this__value', 'get_This_Value')]

for test, expected in tests:
    assert to_camelcase(test) == expected
Answered By: codekoala

This algorithm handles digits well:

import re

PATTERN = re.compile(r'''
    (?<!\A) # not at the start of the string
    _
    (?=[a-zA-Z]) # followed by a letter
    ''', re.X)

def camelize(value):
    tokens = PATTERN.split(value)
    response = tokens.pop(0).lower()
    for remain in tokens:
        response += remain.capitalize()
    return response

Examples:

>>> camelize('Foo')
'foo'
>>> camelize('_Foo')
'_foo'
>>> camelize('Foo_')
'foo_'
>>> camelize('Foo_Bar')
'fooBar'
>>> camelize('Foo__Bar')
'foo_Bar'
>>> camelize('9')
'9'
>>> camelize('9_foo')
'9Foo'
>>> camelize('foo_9')
'foo_9'
>>> camelize('foo_9_bar')
'foo_9Bar'
>>> camelize('foo__9__bar')
'foo__9_Bar'
Answered By: Xavier Barbosa

Here’s a simpler one. It might not be perfect for all situations, but it meets my requirements, since I’m just converting Python variable names, which have a specific format, to camelCase. It capitalizes every word except the first.

def underscore_to_camelcase(text):
    """
    Converts underscore_delimited_text to camelCase.
    Useful for JSON output
    """
    return ''.join(word.title() if i else word for i, word in enumerate(text.split('_')))
Answered By: Cari

Here’s mine, relying mainly on a list comprehension, split, and join, plus an optional parameter for using a different delimiter:

def underscore_to_camel(in_str, delim="_"):
    chunks = in_str.split(delim)
    chunks[1:] = [_.title() for _ in chunks[1:]]
    return "".join(chunks)
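A short usage sketch of the delimiter parameter follows. The function is repeated so the snippet runs on its own (with the loop variable renamed to chunk), and the sample inputs are made up for illustration.

```python
def underscore_to_camel(in_str, delim="_"):
    chunks = in_str.split(delim)
    chunks[1:] = [chunk.title() for chunk in chunks[1:]]
    return "".join(chunks)


print(underscore_to_camel("get_this_value"))             # getThisValue
print(underscore_to_camel("get-this-value", delim="-"))  # getThisValue

# Leading, trailing, and repeated delimiters are dropped rather than preserved,
# and a leading delimiter causes the first word to be capitalized.
print(underscore_to_camel("_get__this_value_"))          # GetThisValue
```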

Also, for the sake of completeness, here is the reverse conversion, referenced earlier as a solution from another question (NOT my own code, just repeated for easy reference):

import re

first_cap_re = re.compile(r'(.)([A-Z][a-z]+)')
all_cap_re = re.compile(r'([a-z0-9])([A-Z])')

def camel_to_underscore(in_str):
    # insert an underscore before each capitalized word, then lowercase everything
    s1 = first_cap_re.sub(r'\1_\2', in_str)
    return all_cap_re.sub(r'\1_\2', s1).lower()
Answered By: Dave

I know this has already been answered, but I came up with some syntactic sugar that handles a special case that the selected answer does not (words with dunders in them i.e. “my_word__is_____ugly” to “myWordIsUgly”). Obviously this can be broken up into multiple lines but I liked the challenge of getting it on one. I added line breaks for clarity.

import re

def underscore_to_camel(in_string):
    return "".join(
        map(
            lambda index_word:
                index_word[1].lower() if index_word[0] == 0
                # slicing with [:1] avoids an IndexError on empty chunks
                else index_word[1][:1].upper() + index_word[1][1:],
            enumerate(re.split(r"_+", in_string))
        )
    )
Answered By: Connor

def convert(word):
    if not isinstance(word, str):
        return word
    if word.startswith("_"):
        word = word[1:]
    words = word.split("_")
    _words = []
    for idx, _word in enumerate(words):
        if idx == 0:
            _words.append(_word)
            continue
        _words.append(_word.capitalize())
    return ''.join(_words)
Answered By: Vivek Anand

Maybe pydash works for this purpose (https://pydash.readthedocs.io/en/latest/)

>>> from pydash.strings import snake_case
>>> snake_case('needToBeSnakeCased')
'need_to_be_snake_cased'
>>> from pydash.strings import camel_case
>>> camel_case('_get__this_value_')
'getThisValue'
Answered By: Evgenii