Detect a time pattern, extract its numeric values, validate them, and in certain cases replace the match in its original position with a corrected form

Question:

import re

input_text = "entre las 15 : hs -- 16:10 "  #Example 1
input_text = "entre las 21 :  -- 22"  #Example 2
input_text = "entre la 1 30 -- 2 "  #Example 3
input_text = "entre la 1 09 h.s. -- 6 : hs."  #Example 4
input_text = "entre la 1:50 -- 6 :"  #Example 5
input_text = "entre la 7 59 -- 23 : "  #Example 6
input_text = "entre la 10: -- : 10"  #Example 7
print(repr(input_text)) #print the output string

The function fix_time_patterns_in_time_intervals() should look something like this, although it may need exception handling for possible index errors.
The function should only make a replacement if the hours (the first group) are at most 23, since a day has no 24th or later hour. Likewise, it should only make a replacement if the minutes (the second group) are at most 59, since minute 60 is already minute 0 of the next hour. Because of this, replacements should only happen under the conditions checked inside the function; otherwise it should return the same substring that was extracted from the original string, unchanged.

def fix_time_patterns_in_time_intervals(match_num_time):
    hour_exist = False
    if(int(match_num_time[1]) <= 23):
        #do the replacement process
        if(len(match_num_time[1]) == 1): match_num_time[1] = "0"+ str(match_num_time[1])
        elif(len(match_num_time[1]) == 0): match_num_time[1] = "00"
        hour_exist = True
    if(int(match_num_time[2]) <= 59):
        #do the replacement process
        if(len(match_num_time[2]) == 1): match_num_time[2] = "0"+ str(match_num_time[2])
        elif(len(match_num_time[2]) == 0): match_num_time[2] = "00"
    elif( (int(match_num_time[2]) == None) and (hour_exist == True) ):
        #do the replacement process
        match_num_time[2] = "00"

    return match_num_time #the extracted substring

I think I could use regex capturing groups, read through the match object's group() or groups() method, to extract the first time mentioned in the input string and then the other time that appears in it.

In the end, printing the resulting string should give these outputs for each of the examples, respectively:

"entre las 15:00 hs -- 16:10 "  #Example 1
"entre las 21:00 -- 22:00"  #Example 2
"entre la 01:30 -- 02:00 "  #Example 3
"entre la 01:09 h.s. -- 06:00 hs."  #Example 4
"entre la 01:50 -- 06:00"  #Example 5
"entre la 07:59 -- 23:00"  #Example 6
"entre la 10:00 -- 00:10"  #Example 7

Some additional examples of what the time (hours:minutes) conversions should look like:

"6 :"      --->     "06:00"
"6:"       --->     "06:00"
"6"        --->     "06:00"
": 6"      --->     "00:06"
":6"       --->     "00:06"
": 16"     --->     "00:16"
":16"      --->     "00:16"
" 6"       --->     "06:00"
"15 : 1"   --->     "15:01"
"15 1"     --->     "15:01"
": 15"     --->     "00:15"
"0 15"     --->     "00:15"

I am having trouble extracting the values to evaluate inside fix_time_patterns_in_time_intervals() after identifying them with the regex. I hope you can help me with this.

Answers:

You can use this regex to match your time values:

(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )

This matches:

  • (?=[:\d]) : assert that the next character is a digit or a : – this ensures that the match always starts at the hour group when it is present
  • (?P<hour>\d+)? : optional digits captured in the hour group
  •  *:? * : an optional : surrounded by optional spaces
  • (?P<minute>\d+)? : optional digits captured in the minute group
  • (?<! ) : assert that the match doesn’t end in a space, so we don’t swallow spaces used for formatting
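To see what each part captures, the following sketch rewrites the same pattern with re.VERBOSE and lists the matched text and both named groups for two of the sample strings. The names TIME_RE and show_groups are made up for this illustration and are not part of the answer's code.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
# In verbose mode a literal space has to be written as "[ ]".
TIME_RE = re.compile(r"""
    (?=[:\d])            # next character must be a digit or a colon
    (?P<hour>\d+)?       # optional hour digits
    [ ]*:?[ ]*           # optional colon with optional spaces around it
    (?P<minute>\d+)?     # optional minute digits
    (?<![ ])             # the match must not end in a space
""", re.VERBOSE)

def show_groups(text):
    """Return (matched text, hour, minute) for every match found in text."""
    return [(m.group(0), m.group("hour"), m.group("minute"))
            for m in TIME_RE.finditer(text)]

print(show_groups("entre las 15 : hs -- 16:10 "))
print(show_groups("entre la 10: -- : 10"))
```

A group that did not take part in the match comes back as None, which is why a replacement function has to supply a default before converting to int. Note also that the first match is "15 :" without the following space: the lookbehind forces the trailing spaces to be given back.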

Regex demo on regex101

You can then use this replacement function to check for the existence of the match groups and (if the values are valid) reformat them with leading 0’s as required:

def fix_time_patterns_in_time_intervals(match_num_time):
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, don't convert
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'

For your sample data (with a couple of invalid values):

import re

times = [
    "entre las 15 : hs -- 16:10 ",
    "entre las 21 :  -- 22",
    "entre la 1 30 -- 2 ",
    "entre la 1 09 h.s. -- 6 : hs.",
    "entre la 25 0 -- 12:0",
    "entre las 13 64 -- 5",
    "entre la 1:50 -- 6 :",
    "entre la 7 59 -- 23 : ",
    "entre la 10: -- : 10"
]

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

for time in times:
    print(regex.sub(fix_time_patterns_in_time_intervals, time))

Output:

entre las 15:00 hs -- 16:10
entre las 21:00 -- 22:00
entre la 01:30 -- 02:00
entre la 01:09 h.s. -- 06:00 hs.
entre la 25 0 -- 12:00
entre las 13 64 -- 05:00
entre la 01:50 -- 06:00
entre la 07:59 -- 23:00
entre la 10:00 -- 00:10
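
The standalone conversions listed in the question can be checked the same way. This is a minimal sketch that repeats the pattern and the replacement function so it runs on its own; the names TIME_RE, CASES and results are only illustrative.

```python
import re

TIME_RE = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

def fix_time_patterns_in_time_intervals(match_num_time):
    # A missing group is None, so fall back to '0' before converting.
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, return the matched text unchanged
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'

# (input, expected) pairs taken from the question's conversion table.
CASES = [
    ("6 :", "06:00"), ("6:", "06:00"), ("6", "06:00"),
    (": 6", "00:06"), (":6", "00:06"),
    (": 16", "00:16"), (":16", "00:16"),
    ("15 : 1", "15:01"), ("15 1", "15:01"),
    (": 15", "00:15"), ("0 15", "00:15"),
    # The leading space is outside the match, so it is kept in the result.
    (" 6", " 06:00"),
]

results = [(raw, TIME_RE.sub(fix_time_patterns_in_time_intervals, raw))
           for raw, _ in CASES]

for (raw, got), (_, want) in zip(results, CASES):
    print(f'{raw!r:10} ---> {got!r:10} {"ok" if got == want else "MISMATCH"}')
```

The only difference from the question's table is the " 6" case: the pattern starts matching at the digit, so the space before it survives the substitution. Strip the string first if that space is not wanted.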
Answered By: Nick