Python strptime format with optional bits
Question:
Right now I have:
timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
This works great unless I’m converting a string that doesn’t have the microseconds. How can I specify that the microseconds are optional (and should be considered 0 if they aren’t in the string)?
Answers:
You could use a try/except block:
try:
    timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
except ValueError:
    timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
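The same try/except idea can be extended to more than two formats with a loop. This is a minimal sketch, not part of the original answer: the helper name `parse_with_fallbacks` and the `FORMATS` tuple are assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical format list: most specific format first.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_with_fallbacks(date_string, formats=FORMATS):
    """Try each format in order and return the first successful parse."""
    for fmt in formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no matching format for {date_string!r}")


with_fraction = parse_with_fallbacks("2021-06-08 14:36:09.50")
without_fraction = parse_with_fallbacks("2021-06-08 14:36:09")
```

When the fractional part is missing, the second format applies and the microseconds default to 0, which is the behaviour the question asks for.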
What about just appending it if it doesn’t exist?
if '.' not in date_string:
    date_string = date_string + '.0'
timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
I prefer using regex matches instead of try/except. This allows for many fallbacks of acceptable formats.
import re
from datetime import datetime

def parse_timestamp(date_string):  # function name is illustrative
    # full timestamp with fractional seconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%S.%fZ")
    # timestamp missing fractional seconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%SZ")
    # timestamp missing fractional seconds & seconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%MZ")
    # unknown timestamp format
    return None
This method needs both “re” and “datetime” imported.
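The three separate patterns can also be folded into one pattern with optional groups. The following is only a sketch of that variation: the pattern `ISO_Z` and the helper `parse_iso_z` are hypothetical names, and it assumes input shaped like `YYYY-MM-DDTHH:MM[:SS[.ffffff]]Z`.

```python
import re
from datetime import datetime

# Hypothetical pattern: seconds and the fraction are optional groups.
ISO_Z = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})"
    r"(?::(\d{2})(?:\.(\d{1,6}))?)?Z"
)


def parse_iso_z(date_string):
    """Return a datetime, or None when the string does not match the pattern."""
    match = ISO_Z.fullmatch(date_string)
    if match is None:
        return None
    year, month, day, hour, minute, second, fraction = match.groups()
    # Right-pad the fraction so that ".5" becomes 500000 microseconds.
    microsecond = int(fraction.ljust(6, "0")) if fraction else 0
    # datetime() still raises ValueError for out-of-range values such as month 13.
    return datetime(
        int(year), int(month), int(day), int(hour), int(minute),
        int(second) if second else 0, microsecond,
    )


full = parse_iso_z("2021-06-08T14:36:09.5Z")
no_seconds = parse_iso_z("2021-06-08T14:36Z")
```

Using `fullmatch` rejects trailing characters, which `re.match` alone would let through.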
datetime(*map(int, re.findall(r'\d+', date_string))) can parse both '%Y-%m-%d %H:%M:%S.%f' and '%Y-%m-%d %H:%M:%S'. It is too permissive if your input is not filtered.
It is quick-and-dirty, but sometimes strptime() is too slow. It can be used if you know that the input has the expected date format.
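One thing to watch with the digit-extraction approach: the last group is passed to `datetime()` as a raw microsecond count, so a fraction such as `.50` would become 50 microseconds rather than 500000 unless it has exactly six digits. A possible adjustment, sketched here with a hypothetical helper name `quick_parse`, pads the fraction before converting:

```python
import re
from datetime import datetime


def quick_parse(date_string):
    """Quick-and-dirty parse; assumes the input is already known to be a date."""
    parts = re.findall(r"\d+", date_string)
    if len(parts) == 7:
        # Pad or truncate the fraction to six digits (microseconds).
        parts[6] = parts[6].ljust(6, "0")[:6]
    return datetime(*map(int, parts))


padded = quick_parse("2021-06-08 14:36:09.50")
plain = quick_parse("2021-06-08 14:36:09")
```

It remains as permissive as the original one-liner, so it is only suitable for trusted input.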
For my similar problem using jq, I used the following:
|split("Z")[0]|split(".")[0]|strptime("%Y-%m-%dT%H:%M:%S")|mktime
This let me sort my list by time properly.
Using one regular expression and some list expressions:
time_str = "12:34.567"
# time format is [HH:]MM:SS[.FFF]
sum([a*b for a,b in zip(map(lambda x: int(x) if x else 0, re.match(r"(?:(\d{2}):)?(\d{2}):(\d{2})(?:\.(\d{3}))?", time_str).groups()), [3600, 60, 1, 1/1000])])
# result = 754.567
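The one-liner above is dense, so here is a spelled-out sketch of the same idea. The names `TIME_RE` and `to_seconds` are assumptions for illustration, and the layout is assumed to be `[HH:]MM:SS[.FFF]` as in the comment above.

```python
import re

# Hypothetical pattern for "[HH:]MM:SS[.FFF]"; hours and milliseconds are optional.
TIME_RE = re.compile(r"(?:(\d{2}):)?(\d{2}):(\d{2})(?:\.(\d{3}))?")


def to_seconds(time_str):
    """Convert a "[HH:]MM:SS[.FFF]" string to a number of seconds."""
    match = TIME_RE.fullmatch(time_str)
    if match is None:
        raise ValueError(f"unexpected time format: {time_str!r}")
    # Missing optional groups come back as None and are treated as 0.
    hours, minutes, seconds, millis = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


short_form = to_seconds("12:34.567")
long_form = to_seconds("01:02:03")
```

Raising on a failed match avoids the `AttributeError` that the one-liner would produce when `re.match` returns `None`.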
I’m late to the party, but I found that if you don’t care about the optional bits, this will lop off the .%f for you.
datestring.split('.')[0]
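For context, this is how the split might be combined with strptime. It is a small sketch with a hypothetical helper name `parse_without_fraction`, and it assumes the only "." in the string is the one before the fractional seconds.

```python
from datetime import datetime


def parse_without_fraction(date_string):
    """Drop everything after the first "." and parse the rest."""
    # The fractional seconds are discarded, not rounded.
    return datetime.strptime(date_string.split(".")[0], "%Y-%m-%d %H:%M:%S")


truncated = parse_without_fraction("2021-06-08 14:36:09.50")
unchanged = parse_without_fraction("2021-06-08 14:36:09")
```

Both calls return the same value here, since the fraction is simply thrown away.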
If you are using Pandas, you can also filter the Series and concatenate it. The index is automatically joined.
import pandas as pd
# Every other row has a different format
df = pd.DataFrame({"datetime_string": ["21-06-08 14:36:09", "21-06-08 14:36:09.50", "21-06-08 14:36:10", "21-06-08 14:36:10.50"]})
df["datetime"] = pd.concat([
    pd.to_datetime(df["datetime_string"].iloc[1::2], format="%y-%m-%d %H:%M:%S.%f"),
    pd.to_datetime(df["datetime_string"].iloc[::2], format="%y-%m-%d %H:%M:%S"),
])
| | datetime_string | datetime |
|---|---|---|
| 0 | 21-06-08 14:36:09 | 2021-06-08 14:36:09 |
| 1 | 21-06-08 14:36:09.50 | 2021-06-08 14:36:09.500000 |
| 2 | 21-06-08 14:36:10 | 2021-06-08 14:36:10 |
| 3 | 21-06-08 14:36:10.50 | 2021-06-08 14:36:10.500000 |