Lowercase based on regex pattern in Django & Python

Question:

The script I am using calls an s_lower function to transform all text to lowercase, with one catch: if part of the text is a link (matched by a special regex), that part is not lowercased. I would like to apply the same or similar logic with another regex.

import re

RE_WEBURL_NC = (
    r"(?:(?:(?:(?:https?):)//)(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1["
    r"6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?"
    r":[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9][a-z0-9_-]{0,62})?[a-z0-9]\.)+(?:[a-z]{2,}\.?))(?::\d{2,5})?)(?:"
    r"(?:[/?#](?:(?![\s\"<>{}|\\^~\[\]`])(?!&lt;|&gt;|&quot;|&#x27;).)*))?"
)

def s_lower(value):
    url_nc = re.compile(f"({RE_WEBURL_NC})")

    # Do not lowercase links
    if url_nc.search(value):
        substrings = url_nc.split(value)
        for idx, substr in enumerate(substrings):
            if not url_nc.match(substr):
                substrings[idx] = i18n_lower(substr)
        return "".join(substrings)

    return i18n_lower(value)

I want to lowercase all text other than text inside the special tags.

def s_lower(value):
    spec_nc = re.compile(r"\[spec .*\]")  # this is for [spec some raNdoM cAsE text here]

    if spec_nc.search(value):
        substrings = spec_nc.split(value)
        for idx, substr in enumerate(substrings):
            if not spec_nc.match(substr):
                substrings[idx] = i18n_lower(substr)
        return "".join(substrings)

    return i18n_lower(value)
Asked By: TToprak1


Answers:

Was writing this as a comment, but it got too long…

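You haven’t actually said what your problem is, but it looks like you’re missing the () around the regex. re.split only keeps the matched text in its result when the pattern contains a capturing group; without one, the [spec ...] tags are dropped from substrings. It should be

spec_nc = re.compile(r"(\[spec .*\])")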
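
To see the difference the parentheses make, here is a small sketch of re.split with and without a capturing group. The sample string is made up for illustration and is not from the question.

```python
import re

text = "Hello [spec KeepThis] World"  # made-up sample input

# Without a capturing group, the matched tag is dropped from the result.
without_group = re.split(r"\[spec .*\]", text)

# With a capturing group, the matched tag is kept between the other pieces.
with_group = re.split(r"(\[spec .*\])", text)

print(without_group)  # ['Hello ', ' World']
print(with_group)     # ['Hello ', '[spec KeepThis]', ' World']
```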

Note:

  • you should use [^]]* instead of .* to ensure your match stays within a single set of []; .* is greedy, so with two tags in one string it would match everything from the first [ to the last ].
  • you don’t really need to search; if the pattern is not present, split simply returns the original string in a single-element list, which you can still iterate over.
  • you don’t need the call to match; because the regex has exactly one capturing group, the strings that match it are always at the odd indexes of the list, so you can lowercase based on idx alone.
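
The first two notes can be checked with a quick sketch like the one below. The input strings are invented for the example.

```python
import re

text = "A [spec One] B [spec Two] C"  # made-up sample with two tags

# Greedy .* runs from the first "[" to the last "]".
greedy = re.split(r"(\[spec .*\])", text)

# [^]]* stops at the first "]", so each tag is matched on its own.
bounded = re.split(r"(\[spec [^]]*\])", text)

# With no tag present, split returns the whole string as a single element.
no_match = re.split(r"(\[spec [^]]*\])", "No Tags Here")

print(greedy)    # ['A ', '[spec One] B [spec Two]', ' C']
print(bounded)   # ['A ', '[spec One]', ' B ', '[spec Two]', ' C']
print(no_match)  # ['No Tags Here']
```

In the bounded result the tags sit at indexes 1 and 3, which is what makes the odd/even check possible.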

So you can simplify your code to:

def s_lower(value):
    spec_nc = re.compile(r"(\[spec [^]]*\])")  # this is for [spec some raNdoM cAsE text here]
    
    substrings = spec_nc.split(value)
    for idx, substr in enumerate(substrings):
        if idx % 2 == 0:
            substrings[idx] = i18n_lower(substr)
    return "".join(substrings)
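
If you want to try this on its own, something like the following should run as-is. It is only a sketch: i18n_lower is not shown in the question, so a plain str.lower stand-in is assumed here, and the pattern is compiled once at module level instead of on every call.

```python
import re

# Compiled once at import time rather than on every call.
SPEC_NC = re.compile(r"(\[spec [^]]*\])")


def i18n_lower(value):
    # Hypothetical stand-in for the asker's helper, which is not shown.
    return value.lower()


def s_lower(value):
    substrings = SPEC_NC.split(value)
    # Even indexes hold the text between tags; odd indexes hold the tags.
    substrings[::2] = [i18n_lower(s) for s in substrings[::2]]
    return "".join(substrings)


print(s_lower("Some TEXT [spec KeepMe AsIs] and MORE"))
# some text [spec KeepMe AsIs] and more
```

Note that the index trick depends on the pattern having exactly one capturing group. Adding more groups would put extra items into the list and shift the positions.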
Answered By: Nick