Python: How can I convert string to datetime without knowing the format?

Question:

I have a field that comes in as a string and represents a time. Sometimes it’s in 12-hour format, sometimes in 24-hour format. Possible values:

  1. 8:26
  2. 08:26am
  3. 13:27

Is there a function that will convert these to a time object by being smart about it? Option 1 doesn’t have am because it’s in 24-hour format, option 2 has a leading 0, and option 3 is obviously in 24-hour format. Is there a function in Python or a library that does:

time = func(str_time)
Asked By: Debnath Sinha

Answers:

There is one such function in pandas:

import pandas as pd
d = pd.to_datetime('<date_string>')
Answered By: sachin saxena

Super short answer:

>>> from dateutil import parser
>>> parser.parse("8:36pm")
datetime.datetime(2015, 6, 26, 20, 36)
>>> parser.parse("18:36")
datetime.datetime(2015, 6, 26, 18, 36)

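dateutil should be available for your Python installation, so there is no need for something as large as pandas. Because the input contains no date, the parser fills in the current date, which is why the results above carry one.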

If you want to extract the time from the datetime object:

t = parser.parse("18:36").time()

which will give you a time object (if that’s of more help to you).
Or you can extract individual fields:

dt = parser.parse("18:36")
hours = dt.hour
minute = dt.minute
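
If dateutil is not an option, the three inputs from the question can also be handled with the standard library alone by trying a short list of formats in order. This is only a minimal sketch: the helper name `parse_clock_time` and the format list are assumptions made for illustration, and matching of am/pm via `%p` follows the current locale.

```python
from datetime import datetime, time

# Candidate formats, tried in order: 12-hour with am/pm first, then 24-hour.
_FORMATS = ("%I:%M%p", "%I:%M %p", "%H:%M")


def parse_clock_time(text: str) -> time:
    """Return a datetime.time for strings such as '8:26', '08:26am' or '13:27'."""
    cleaned = text.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).time()
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"unrecognized time string: {text!r}")


for sample in ("8:26", "08:26am", "13:27"):
    print(sample, "->", parse_clock_time(sample))
```

Unlike dateutil, this accepts only the listed formats, so anything unexpected raises a ValueError instead of being guessed at.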
Answered By: Marcus Müller

Use a regex to cut the string into ['year', 'month', 'day', 'hour', 'minutes', 'seconds'], then unpack the pieces into the datetime constructor, datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0). This is the fastest approach I have tested so far.

    import re
    import pandas as pd
    import datetime
    import timeit
    from dateutil import parser  # needed by the dateutil line in the speed test

    def date2timestamp_anyformat(format_date):
        # Keep only the digits, e.g. '2022-08-13 12:23:44.234' -> '20220813122344234'
        numbers = ''.join(re.findall(r'\d+', format_date))
        if len(numbers) == 8:
            d = datetime.datetime(int(numbers[:4]), int(numbers[4:6]), int(numbers[6:8]))
        elif len(numbers) == 14:
            d = datetime.datetime(int(numbers[:4]), int(numbers[4:6]), int(numbers[6:8]), int(numbers[8:10]), int(numbers[10:12]), int(numbers[12:14]))
        elif len(numbers) > 14:
            d = datetime.datetime(int(numbers[:4]), int(numbers[4:6]), int(numbers[6:8]), int(numbers[8:10]), int(numbers[10:12]), int(numbers[12:14]), microsecond=1000*int(numbers[14:]))
        else:
            raise AssertionError(f'length not match:{format_date}')
        return d.timestamp()

and speed test:

    print('regex cut:\n', timeit.timeit(lambda: datetime.datetime(*map(int, re.split(r'-|:|\s', '2022-08-13 12:23:44.234')[:-1])).timestamp(), number=10000))
    print('pandas to_datetime:\n', timeit.timeit(lambda: pd.to_datetime('2022-08-13 12:23:44.234').timestamp(), number=10000))
    print('datetime with known format:\n', timeit.timeit(lambda: datetime.datetime.strptime('2022-08-13 12:23:44.234', '%Y-%m-%d %H:%M:%S.%f').timestamp(), number=10000))
    print('regex get number first:\n', timeit.timeit(lambda: date2timestamp_anyformat('2022-08-13 12:23:44.234'), number=10000))
    print('dateutil parse:\n', timeit.timeit(lambda: parser.parse('2022-08-13 12:23:44.234').timestamp(), number=10000))

result:

regex cut:
 0.040550945326685905
pandas to_datetime:
 0.8012433210387826
datetime with known format:
 0.09105705469846725
regex get number first:
 0.04557646345347166
dateutil parse:
 0.6404162347316742
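
Note that the digit-slicing function above only accepts year-first, zero-padded input (8, 14, or more digits), so it would reject the question's `8:26`. It also multiplies the fraction by 1000, which is only correct for exactly three fractional digits. A hedged variant that pads the fraction to microseconds and returns the `datetime` itself might look like this (the name `digits_to_datetime` is hypothetical):

```python
import re
from datetime import datetime


def digits_to_datetime(text: str) -> datetime:
    """Build a datetime from year-first, zero-padded digits: YYYYMMDD[hhmmss[fraction]]."""
    digits = "".join(re.findall(r"\d+", text))
    if len(digits) == 8:
        return datetime(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
    if len(digits) >= 14:
        # Slice the six fixed-width fields: year, month, day, hour, minute, second.
        bounds = ((0, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 14))
        fields = [int(digits[start:end]) for start, end in bounds]
        # Pad (or truncate) the fractional digits to exactly six, i.e. microseconds.
        fraction = digits[14:20].ljust(6, "0")
        return datetime(*fields, microsecond=int(fraction))
    raise ValueError(f"unexpected number of digits in {text!r}")


print(digits_to_datetime("2022-08-13 12:23:44.234"))
```

Call `.timestamp()` on the result if a POSIX timestamp is needed. For a naive datetime that value depends on the local time zone.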
Answered By: Eric_zhang