How can I simplify this conversion from underscore to camelcase in Python?
Question:
I have written the function below that converts underscore to camelcase with first word in lowercase, i.e. “get_this_value” -> “getThisValue”. Also I have requirement to preserve leading and trailing underscores and also double (triple etc.) underscores, if any, i.e.
"_get__this_value_" -> "_get_ThisValue_".
The code:
def underscore_to_camelcase(value):
output = ""
first_word_passed = False
for word in value.split("_"):
if not word:
output += "_"
continue
if first_word_passed:
output += word.capitalize()
else:
output += word.lower()
first_word_passed = True
return output
I am feeling the code above as written in non-Pythonic style, though it works as expected, so looking how to simplify the code and write it using list comprehensions etc.
Answers:
I think the code is fine. You’ve got a fairly complex specification, so if you insist on squashing it into the Procrustean bed of a list comprehension, then you’re likely to harm the clarity of the code.
The only changes I’d make would be:
- To use the
join
method to build the result in O(n) space and time, rather than repeated applications of +=
which is O(n²).
- To add a docstring.
Like this:
def underscore_to_camelcase(s):
"""Take the underscore-separated string s and return a camelCase
equivalent. Initial and final underscores are preserved, and medial
pairs of underscores are turned into a single underscore."""
def camelcase_words(words):
first_word_passed = False
for word in words:
if not word:
yield "_"
continue
if first_word_passed:
yield word.capitalize()
else:
yield word.lower()
first_word_passed = True
return ''.join(camelcase_words(s.split('_')))
Depending on the application, another change I would consider making would be to memoize the function. I presume you’re automatically translating source code in some way, and you expect the same names to occur many times. So you might as well store the conversion instead of re-computing it each time. An easy way to do that would be to use the @memoized
decorator from the Python decorator library.
I agree with Gareth that the code is ok. However, if you really want a shorter, yet readable approach you could try something like this:
def underscore_to_camelcase(value):
# Make a list of capitalized words and underscores to be preserved
capitalized_words = [w.capitalize() if w else '_' for w in value.split('_')]
# Convert the first word to lowercase
for i, word in enumerate(capitalized_words):
if word != '_':
capitalized_words[i] = word.lower()
break
# Join all words to a single string and return it
return "".join(capitalized_words)
This is the most compact way to do it:
def underscore_to_camelcase(value):
words = [word.capitalize() for word in value.split('_')]
words[0]=words[0].lower()
return "".join(words)
For regexp sake !
import re
def underscore_to_camelcase(value):
def rep(m):
if m.group(1) != None:
return m.group(2) + m.group(3).lower() + '_'
else:
return m.group(3).capitalize()
ret, nb_repl = re.subn(r'(^)?(_*)([a-zA-Z]+)', rep, value)
return ret if (nb_repl > 1) else ret[:-1]
The problem calls for a function that returns a lowercase word the first time, but capitalized words afterwards. You can do that with an if
clause, but then the if
clause has to be evaluated for every word. An appealing alternative is to use a generator. It can return one thing on the first call, and something else on successive calls, and it does not require as many if
s.
def lower_camelcase(seq):
it=iter(seq)
for word in it:
yield word.lower()
if word.isalnum(): break
for word in it:
yield word.capitalize()
def underscore_to_camelcase(text):
return ''.join(lower_camelcase(word if word else '_' for word in text.split('_')))
Here is some test code to show that it works:
tests=[('get__this_value','get_ThisValue'),
('_get__this_value','_get_ThisValue'),
('_get__this_value_','_get_ThisValue_'),
('get_this_value','getThisValue'),
('get__this__value','get_This_Value'),
]
for test,answer in tests:
result=underscore_to_camelcase(test)
try:
assert result==answer
except AssertionError:
print('{r!r} != {a!r}'.format(r=result,a=answer))
Another regexp solution:
import re
def conv(s):
"""Convert underscore-separated strings to camelCase equivalents.
>>> conv('get')
'get'
>>> conv('_get')
'_get'
>>> conv('get_this_value')
'getThisValue'
>>> conv('__get__this_value_')
'_get_ThisValue_'
>>> conv('_get__this_value__')
'_get_ThisValue_'
>>> conv('___get_this_value')
'_getThisValue'
"""
# convert case:
s = re.sub(r'(_*[A-Z])', lambda m: m.group(1).lower(), s.title(), count=1)
# remove/normalize underscores:
s = re.sub(r'__+|^_+|_+$', '|', s).replace('_', '').replace('|', '_')
return s
if __name__ == "__main__":
import doctest
doctest.testmod()
It works for your examples, but it might fail for names containting digits – it depends how you would capitalize them.
Here is a list comprehension style generator expression.
from itertools import count
def underscore_to_camelcase(value):
words = value.split('_')
counter = count()
return ''.join('_' if w == '' else w.capitalize() if counter.next() else w for w in words )
Your code is fine. The problem I think you’re trying to solve is that if first_word_passed
looks a little bit ugly.
One option for fixing this is a generator. We can easily make this return one thing for first entry and another for all subsequent entries. As Python has first-class functions we can get the generator to return the function we want to use to process each word.
We then just need to use the conditional operator so we can handle the blank entries returned by double underscores within a list comprehension.
So if we have a word we call the generator to get the function to use to set the case, and if we don’t we just use _
leaving the generator untouched.
def underscore_to_camelcase(value):
def camelcase():
yield str.lower
while True:
yield str.capitalize
c = camelcase()
return "".join(c.next()(x) if x else '_' for x in value.split("_"))
A slightly modified version:
import re
def underscore_to_camelcase(value):
first = True
res = []
for u,w in re.findall('([_]*)([^_]*)',value):
if first:
res.append(u+w)
first = False
elif len(w)==0: # trailing underscores
res.append(u)
else: # trim an underscore and capitalize
res.append(u[:-1] + w.title())
return ''.join(res)
This one works except for leaving the first word as lowercase.
def convert(word):
return ''.join(x.capitalize() or '_' for x in word.split('_'))
(I know this isn’t exactly what you asked for, and this thread is quite old, but since it’s quite prominent when searching for such conversions on Google I thought I’d add my solution in case it helps anyone else).
I prefer a regular expression, personally. Here’s one that is doing the trick for me:
import re
def to_camelcase(s):
return re.sub(r'(?!^)_([a-zA-Z])', lambda m: m.group(1).upper(), s)
Using unutbu
‘s tests:
tests = [('get__this_value', 'get_ThisValue'),
('_get__this_value', '_get_ThisValue'),
('_get__this_value_', '_get_ThisValue_'),
('get_this_value', 'getThisValue'),
('get__this__value', 'get_This_Value')]
for test, expected in tests:
assert to_camelcase(test) == expected
This algorithm performs well with digit:
import re
PATTERN = re.compile(r'''
(?<!A) # not at the start of the string
_
(?=[a-zA-Z]) # followed by a letter
''', re.X)
def camelize(value):
tokens = PATTERN.split(value)
response = tokens.pop(0).lower()
for remain in tokens:
response += remain.capitalize()
return response
Examples:
>>> camelize('Foo')
'foo'
>>> camelize('_Foo')
'_foo'
>>> camelize('Foo_')
'foo_'
>>> camelize('Foo_Bar')
'fooBar'
>>> camelize('Foo__Bar')
'foo_Bar'
>>> camelize('9')
'9'
>>> camelize('9_foo')
'9Foo'
>>> camelize('foo_9')
'foo_9'
>>> camelize('foo_9_bar')
'foo_9Bar'
>>> camelize('foo__9__bar')
'foo__9_Bar'
Here’s a simpler one. Might not be perfect for all situations, but it meets my requirements, since I’m just converting python variables, which have a specific format, to camel-case. This does capitalize all but the first word.
def underscore_to_camelcase(text):
"""
Converts underscore_delimited_text to camelCase.
Useful for JSON output
"""
return ''.join(word.title() if i else word for i, word in enumerate(text.split('_')))
Here’s mine, relying mainly on list comprehension, split, and join. Plus optional parameter to use different delimiter:
def underscore_to_camel(in_str, delim="_"):
chunks = in_str.split(delim)
chunks[1:] = [_.title() for _ in chunks[1:]]
return "".join(chunks)
Also, for sake of completeness, including what was referenced earlier as solution from another question as the reverse (NOT my own code, just repeating for easy reference):
first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def camel_to_underscore(in_str):
s1 = first_cap_re.sub(r'1_2', name)
return all_cap_re.sub(r'1_2', s1).lower()
I know this has already been answered, but I came up with some syntactic sugar that handles a special case that the selected answer does not (words with dunders in them i.e. “my_word__is_____ugly” to “myWordIsUgly”). Obviously this can be broken up into multiple lines but I liked the challenge of getting it on one. I added line breaks for clarity.
def underscore_to_camel(in_string):
return "".join(
list(
map(
lambda index_word:
index_word[1].lower() if index_word[0] == 0
else index_word[1][0].upper() + (index_word[1][1:] if len(index_word[1]) > 0 else ""),
list(enumerate(re.split(re.compile(r"_+"), in_string)
)
)
)
)
)
def convert(word):
if not isinstance(word, str):
return word
if word.startswith("_"):
word = word[1:]
words = word.split("_")
_words = []
for idx, _word in enumerate(words):
if idx == 0:
_words.append(_word)
continue
_words.append(_word.capitalize())
return ''.join(_words)
Maybe, pydash
works for this purpose (https://pydash.readthedocs.io/en/latest/)
>>> from pydash.strings import snake_case
>>>> snake_case('needToBeSnakeCased')
'get__this_value'
>>> from pydash.strings import camel_case
>>>camel_case('_get__this_value_')
'getThisValue'
I have written the function below that converts underscore to camelcase with first word in lowercase, i.e. “get_this_value” -> “getThisValue”. Also I have requirement to preserve leading and trailing underscores and also double (triple etc.) underscores, if any, i.e.
"_get__this_value_" -> "_get_ThisValue_".
The code:
def underscore_to_camelcase(value):
output = ""
first_word_passed = False
for word in value.split("_"):
if not word:
output += "_"
continue
if first_word_passed:
output += word.capitalize()
else:
output += word.lower()
first_word_passed = True
return output
I am feeling the code above as written in non-Pythonic style, though it works as expected, so looking how to simplify the code and write it using list comprehensions etc.
I think the code is fine. You’ve got a fairly complex specification, so if you insist on squashing it into the Procrustean bed of a list comprehension, then you’re likely to harm the clarity of the code.
The only changes I’d make would be:
- To use the
join
method to build the result in O(n) space and time, rather than repeated applications of+=
which is O(n²). - To add a docstring.
Like this:
def underscore_to_camelcase(s):
"""Take the underscore-separated string s and return a camelCase
equivalent. Initial and final underscores are preserved, and medial
pairs of underscores are turned into a single underscore."""
def camelcase_words(words):
first_word_passed = False
for word in words:
if not word:
yield "_"
continue
if first_word_passed:
yield word.capitalize()
else:
yield word.lower()
first_word_passed = True
return ''.join(camelcase_words(s.split('_')))
Depending on the application, another change I would consider making would be to memoize the function. I presume you’re automatically translating source code in some way, and you expect the same names to occur many times. So you might as well store the conversion instead of re-computing it each time. An easy way to do that would be to use the @memoized
decorator from the Python decorator library.
I agree with Gareth that the code is ok. However, if you really want a shorter, yet readable approach you could try something like this:
def underscore_to_camelcase(value):
# Make a list of capitalized words and underscores to be preserved
capitalized_words = [w.capitalize() if w else '_' for w in value.split('_')]
# Convert the first word to lowercase
for i, word in enumerate(capitalized_words):
if word != '_':
capitalized_words[i] = word.lower()
break
# Join all words to a single string and return it
return "".join(capitalized_words)
This is the most compact way to do it:
def underscore_to_camelcase(value):
words = [word.capitalize() for word in value.split('_')]
words[0]=words[0].lower()
return "".join(words)
For regexp sake !
import re
def underscore_to_camelcase(value):
def rep(m):
if m.group(1) != None:
return m.group(2) + m.group(3).lower() + '_'
else:
return m.group(3).capitalize()
ret, nb_repl = re.subn(r'(^)?(_*)([a-zA-Z]+)', rep, value)
return ret if (nb_repl > 1) else ret[:-1]
The problem calls for a function that returns a lowercase word the first time, but capitalized words afterwards. You can do that with an if
clause, but then the if
clause has to be evaluated for every word. An appealing alternative is to use a generator. It can return one thing on the first call, and something else on successive calls, and it does not require as many if
s.
def lower_camelcase(seq):
it=iter(seq)
for word in it:
yield word.lower()
if word.isalnum(): break
for word in it:
yield word.capitalize()
def underscore_to_camelcase(text):
return ''.join(lower_camelcase(word if word else '_' for word in text.split('_')))
Here is some test code to show that it works:
tests=[('get__this_value','get_ThisValue'),
('_get__this_value','_get_ThisValue'),
('_get__this_value_','_get_ThisValue_'),
('get_this_value','getThisValue'),
('get__this__value','get_This_Value'),
]
for test,answer in tests:
result=underscore_to_camelcase(test)
try:
assert result==answer
except AssertionError:
print('{r!r} != {a!r}'.format(r=result,a=answer))
Another regexp solution:
import re
def conv(s):
"""Convert underscore-separated strings to camelCase equivalents.
>>> conv('get')
'get'
>>> conv('_get')
'_get'
>>> conv('get_this_value')
'getThisValue'
>>> conv('__get__this_value_')
'_get_ThisValue_'
>>> conv('_get__this_value__')
'_get_ThisValue_'
>>> conv('___get_this_value')
'_getThisValue'
"""
# convert case:
s = re.sub(r'(_*[A-Z])', lambda m: m.group(1).lower(), s.title(), count=1)
# remove/normalize underscores:
s = re.sub(r'__+|^_+|_+$', '|', s).replace('_', '').replace('|', '_')
return s
if __name__ == "__main__":
import doctest
doctest.testmod()
It works for your examples, but it might fail for names containting digits – it depends how you would capitalize them.
Here is a list comprehension style generator expression.
from itertools import count
def underscore_to_camelcase(value):
words = value.split('_')
counter = count()
return ''.join('_' if w == '' else w.capitalize() if counter.next() else w for w in words )
Your code is fine. The problem I think you’re trying to solve is that if first_word_passed
looks a little bit ugly.
One option for fixing this is a generator. We can easily make this return one thing for first entry and another for all subsequent entries. As Python has first-class functions we can get the generator to return the function we want to use to process each word.
We then just need to use the conditional operator so we can handle the blank entries returned by double underscores within a list comprehension.
So if we have a word we call the generator to get the function to use to set the case, and if we don’t we just use _
leaving the generator untouched.
def underscore_to_camelcase(value):
def camelcase():
yield str.lower
while True:
yield str.capitalize
c = camelcase()
return "".join(c.next()(x) if x else '_' for x in value.split("_"))
A slightly modified version:
import re
def underscore_to_camelcase(value):
first = True
res = []
for u,w in re.findall('([_]*)([^_]*)',value):
if first:
res.append(u+w)
first = False
elif len(w)==0: # trailing underscores
res.append(u)
else: # trim an underscore and capitalize
res.append(u[:-1] + w.title())
return ''.join(res)
This one works except for leaving the first word as lowercase.
def convert(word):
return ''.join(x.capitalize() or '_' for x in word.split('_'))
(I know this isn’t exactly what you asked for, and this thread is quite old, but since it’s quite prominent when searching for such conversions on Google I thought I’d add my solution in case it helps anyone else).
I prefer a regular expression, personally. Here’s one that is doing the trick for me:
import re
def to_camelcase(s):
return re.sub(r'(?!^)_([a-zA-Z])', lambda m: m.group(1).upper(), s)
Using unutbu
‘s tests:
tests = [('get__this_value', 'get_ThisValue'),
('_get__this_value', '_get_ThisValue'),
('_get__this_value_', '_get_ThisValue_'),
('get_this_value', 'getThisValue'),
('get__this__value', 'get_This_Value')]
for test, expected in tests:
assert to_camelcase(test) == expected
This algorithm performs well with digit:
import re
PATTERN = re.compile(r'''
(?<!A) # not at the start of the string
_
(?=[a-zA-Z]) # followed by a letter
''', re.X)
def camelize(value):
tokens = PATTERN.split(value)
response = tokens.pop(0).lower()
for remain in tokens:
response += remain.capitalize()
return response
Examples:
>>> camelize('Foo')
'foo'
>>> camelize('_Foo')
'_foo'
>>> camelize('Foo_')
'foo_'
>>> camelize('Foo_Bar')
'fooBar'
>>> camelize('Foo__Bar')
'foo_Bar'
>>> camelize('9')
'9'
>>> camelize('9_foo')
'9Foo'
>>> camelize('foo_9')
'foo_9'
>>> camelize('foo_9_bar')
'foo_9Bar'
>>> camelize('foo__9__bar')
'foo__9_Bar'
Here’s a simpler one. Might not be perfect for all situations, but it meets my requirements, since I’m just converting python variables, which have a specific format, to camel-case. This does capitalize all but the first word.
def underscore_to_camelcase(text):
"""
Converts underscore_delimited_text to camelCase.
Useful for JSON output
"""
return ''.join(word.title() if i else word for i, word in enumerate(text.split('_')))
Here’s mine, relying mainly on list comprehension, split, and join. Plus optional parameter to use different delimiter:
def underscore_to_camel(in_str, delim="_"):
chunks = in_str.split(delim)
chunks[1:] = [_.title() for _ in chunks[1:]]
return "".join(chunks)
Also, for sake of completeness, including what was referenced earlier as solution from another question as the reverse (NOT my own code, just repeating for easy reference):
first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def camel_to_underscore(in_str):
s1 = first_cap_re.sub(r'1_2', name)
return all_cap_re.sub(r'1_2', s1).lower()
I know this has already been answered, but I came up with some syntactic sugar that handles a special case that the selected answer does not (words with dunders in them i.e. “my_word__is_____ugly” to “myWordIsUgly”). Obviously this can be broken up into multiple lines but I liked the challenge of getting it on one. I added line breaks for clarity.
def underscore_to_camel(in_string):
return "".join(
list(
map(
lambda index_word:
index_word[1].lower() if index_word[0] == 0
else index_word[1][0].upper() + (index_word[1][1:] if len(index_word[1]) > 0 else ""),
list(enumerate(re.split(re.compile(r"_+"), in_string)
)
)
)
)
)
def convert(word):
if not isinstance(word, str):
return word
if word.startswith("_"):
word = word[1:]
words = word.split("_")
_words = []
for idx, _word in enumerate(words):
if idx == 0:
_words.append(_word)
continue
_words.append(_word.capitalize())
return ''.join(_words)
Maybe, pydash
works for this purpose (https://pydash.readthedocs.io/en/latest/)
>>> from pydash.strings import snake_case
>>>> snake_case('needToBeSnakeCased')
'get__this_value'
>>> from pydash.strings import camel_case
>>>camel_case('_get__this_value_')
'getThisValue'