Turn camel case fields with numerical digits into snake case using Python

Question:

I have the below method in Python (3.10):

import re

def camel_to_screaming_snake(value):
    return '_'.join(re.findall('[A-Z][a-z]+|[0-9A-Z]+(?=[A-Z][a-z])|[0-9A-Z]{2,}|[a-z0-9]{2,}|[a-zA-Z0-9]'
                                        , value)).upper()

The goal I am trying to accomplish is to insert an underscore every time the case in a string changes, or between a non-numeric character and a numeric character. However, the following cases being passed in are not yielding expected results. (left is what is passed in, right is what is returned)

vn3b -> VN3B
vnRbb250V -> VN_RBB_250V

I am expecting the following return values:

vn3b -> VN_3_B
vnRbb250V -> VN_RBB_250_V

What is wrong with my regex preventing this from working as expected?

Asked By: work89

||

Answers:

You want to catch particular boundaries and insert an _ in them. To find a boundary, you can use the following regex format: (?<=<before>)(?=<after>). These are non capturing lookaheads and lookbehinds where before and after are replaced with the conditions.

In this case you have defined 3 ways that you might need to find a boundary:

  • A lowercase letter followed by an uppercase letter: (?<=[a-z])(?=[A-Z])
  • A number followed by a letter: (?<=[0-9])(?=[a-zA-Z])
  • A letter followed by a number: (?<=[a-zA-Z])(?=[0-9])

You can combine these into a single regex with an or | and then call sub on it to replace the boundaries. Afterwards you can uppercase the output:

import re

boundaries_re = re.compile(
    r"((?<=[a-z])(?=[A-Z]))"
    r"|((?<=[0-9])(?=[a-zA-Z]))"
    r"|((?<=[a-zA-Z])(?=[0-9]))"
)

def format_var(text):
    return boundaries_re.sub("_", text).upper()
>>> format_var("vn3b")
VN_3_B
>>> format_var("vnRbb250V")
VN_RBB_250_V
Answered By: flakes
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.