Regex failing with text conversion to days – Python 3.10.x

Question:

I have a list of time durations in text, for example, ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']

I need to build a function that takes each of these durations and converts it into a total number of days.

The specific text could be a single day, days and hours, hours and minutes, a single set of minutes, or a day, hour, and minute.

I have tried the following:

def parse_dates(data):
    days = int(re.match(r'\d+\sDay', data)[0].split(' ')[0]) if re.match(r'\d+\sDay', data) is not None else 0
    hours = int(re.match(r'\d+\sHour', data)[0].split(' ')[0]) if re.match(r'^\d+Hour*s$', data) is not None else 0
    minutes = int(re.match(r'\d+\sMinute', data)[0].split(' ')[0]) if re.match(r'\d+\sMinute', data) is not None else 0

    days += hours / 24
    days += minutes / 1440

    return days

The provided function fails regardless of using re.match() or re.search(), leading me to believe there is a problem with the expression itself.

Specifically, the hours and minutes ALWAYS come out as 0. How can I fix my regex, or devise a better solution, to parse these strings appropriately?

Asked By: artemis

||

Answers:

You could try the following regex (Demo):

(?:(\d+) Days?)?(?: ?(\d+) Hours?)?(?: ?(\d+) Minutes?)?

Explanation:

  • (?:...) marks a non-capturing group
  • (...) marks a captured group
  • ? after a symbol or group means it is optional
  • \d+ means one or more digits (0123…)
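
To make the explanation concrete, here is a small sketch of what each capture group holds for the sample strings from the question. The helper name capture_groups is made up for this illustration; a group that was skipped comes back as None.

```python
import re

# Same pattern as above: three optional groups for days, hours and minutes.
pattern = re.compile(r'(?:(\d+) Days?)?(?: ?(\d+) Hours?)?(?: ?(\d+) Minutes?)?')


def capture_groups(text):
    """Return the (days, hours, minutes) captures; a skipped group is None."""
    # Every part of the pattern is optional, so match() always succeeds
    # (possibly with an empty match) and never returns None here.
    return pattern.match(text).groups()


for sample in ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']:
    print(sample, '->', capture_groups(sample))
```

Because a missing unit yields None rather than an error, each value can be defaulted with `or 0` before converting it to an integer.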

Sample Python implementation:

import re

_DHM_RE = re.compile(r'(?:(\d+) Days?)?(?: ?(\d+) Hours?)?(?: ?(\d+) Minutes?)?')
_HOURS_IN_DAY = 24
_MINUTES_IN_DAY = 60 * _HOURS_IN_DAY


def parse_dates(s: str) -> int:
    m = _DHM_RE.search(s)
    if m is None:
        return 0

    days = int(m.group(1) or 0)
    hours = int(m.group(2) or 0)
    minutes = int(m.group(3) or 0)

    days += hours / _HOURS_IN_DAY
    days += minutes / _MINUTES_IN_DAY

    return int(days)


strings = """
142 Days 16 Hours
128 Days 9 Hours 43 Minutes
10 Minutes
52 Hours
""".splitlines()

for s in strings:
    d = parse_dates(s)
    print(f'{s!r} has {d} days.')
Answered By: rv.kvetch

Here’s a way to do it:

import re
a = ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']
def parse_dates(data):
    x = [re.search(r'(\d+)\s' + unit, data) for unit in ['Day', 'Hour', 'Minute']]
    x = [0 if y is None else int(y.group(1)) for y in x]
    return x[0] + x[1] / 24 + x[2] / 1440
for data in a:
    print(parse_dates(data))

Output:

142.66666666666666
128.4048611111111
0.006944444444444444
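
As a side note on why re.search is used here: re.match only attempts a match at the very start of the string, while re.search scans forward for the first position that matches. A minimal sketch of the difference, using one of the sample strings (the variable names are only for illustration):

```python
import re

text = '142 Days 16 Hours'

# re.match only tries position 0, where the text reads "142 Days", not hours.
at_start = re.match(r'(\d+)\sHour', text)

# re.search scans forward and finds "16 Hour" later in the string.
anywhere = re.search(r'(\d+)\sHour', text)

print(at_start)           # None
print(anywhere.group(1))  # 16
```

So any unit that is not the first thing in the string can only be found with re.search, or with a single pattern that covers the whole string.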
Answered By: constantstranger