Python regex to parse the datetime.datetime object from string
Question:
I have the following string:
"{'foo': datetime.datetime(2022, 5, 23, 0, 0, tzinfo=tzlocal()), 'bar': 'some data', 'foobar': datetime.datetime(2022, 8, 3, 13, 57, 41, tzinfo=<UTC>), 'barlist': ['hello', 'world']}"
I want to match all the datetime.datetime(...) substrings within this string and replace each one with just its numbers, in list form. This is the expected result:
"{'foo': [2022, 5, 23, 0, 0], 'bar': 'some data', 'foobar': [2022, 8, 3, 13, 57, 41], 'barlist': ['hello', 'world']}"
I have something like this:
DATETIME_PATTERN = r"datetime\.datetime\(((\d+)(,\s*\d+)*), tzinfo=.*\)"
modified_input_str = re.sub(DATETIME_PATTERN, r"[\1]", input_str)
but it replaces a big chunk of data in between the matches. How can I modify the regex to accomplish what I want?
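For illustration, the over-matching described above can be reproduced with a short script. This is only a sketch: the name GREEDY_PATTERN is made up here, and the explanation (that the greedy .* runs on to the last closing parenthesis in the string) is the usual reading of this behaviour rather than something stated in the question.

```python
import re

input_str = (
    "{'foo': datetime.datetime(2022, 5, 23, 0, 0, tzinfo=tzlocal()), "
    "'bar': 'some data', "
    "'foobar': datetime.datetime(2022, 8, 3, 13, 57, 41, tzinfo=<UTC>), "
    "'barlist': ['hello', 'world']}"
)

# The pattern from the question; the greedy ".*" after "tzinfo=" is the suspect.
GREEDY_PATTERN = r"datetime\.datetime\(((\d+)(,\s*\d+)*), tzinfo=.*\)"

# Collect every match to see how much text each one covers.
matches = list(re.finditer(GREEDY_PATTERN, input_str))
print(len(matches))         # number of matches found
print(matches[0].group(0))  # the text swallowed by the first match

# Apply the same substitution as in the question.
result = re.sub(GREEDY_PATTERN, r"[\1]", input_str)
print(result)
```

With the sample string, the single match starts at the first datetime.datetime( and ends at the closing parenthesis of the second one, so the 'bar' and 'foobar' entries disappear from the output.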
Conclusion:
I modified the current best answer to better fit my particular use case:
DATETIME_PATTERN = r"datetime\.datetime\((\d+(?:,\s*\d+)*), tzinfo=(?:[^\s\d])*\)"
# The difference is that the value after 'tzinfo=' can contain anything except whitespace or digits.
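A quick way to check this variant is to run it against individual datetime.datetime(...) snippets. The sketch below is an assumption-laden illustration: the samples list and the third entry (a tzoffset value containing a space and digits) are hypothetical and do not come from the original data.

```python
import re

# The variant above: the tzinfo value may contain anything except whitespace and digits.
DATETIME_PATTERN = r"datetime\.datetime\((\d+(?:,\s*\d+)*), tzinfo=(?:[^\s\d])*\)"

samples = [
    "datetime.datetime(2022, 5, 23, 0, 0, tzinfo=tzlocal())",
    "datetime.datetime(2022, 8, 3, 13, 57, 41, tzinfo=<UTC>)",
    # Hypothetical input: the tzinfo value contains whitespace and digits.
    "datetime.datetime(2022, 8, 3, 13, 57, tzinfo=tzoffset(None, 3600))",
]

# Replace each snippet with its numbers in list form where the pattern matches.
converted = [re.sub(DATETIME_PATTERN, r"[\1]", s) for s in samples]

for before, after in zip(samples, converted):
    print(before, "->", after)
```

The first two samples are converted to lists. The hypothetical third one is left unchanged, because its tzinfo value contains whitespace and digits, which this variant excludes.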
Answers:
You can use
datetime\.datetime\((\d+(?:,\s*\d+)*), tzinfo=(?:\(\)|[^()])*\)
Details:
datetime\.datetime\( - a literal datetime.datetime( string
(\d+(?:,\s*\d+)*) - Group 1: one or more digits, then zero or more repetitions of a comma, zero or more whitespace characters, and one or more digits
, tzinfo= - a literal string
(?:\(\)|[^()])* - zero or more repetitions of either a () string or any character other than ( and )
\) - a ) character.
See the regex demo.
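Putting the pattern together with re.sub, a minimal sketch might look like the following. The helper name strip_datetimes is made up for this example, and precompiling the pattern is a stylistic choice rather than a requirement.

```python
import re

input_str = (
    "{'foo': datetime.datetime(2022, 5, 23, 0, 0, tzinfo=tzlocal()), "
    "'bar': 'some data', "
    "'foobar': datetime.datetime(2022, 8, 3, 13, 57, 41, tzinfo=<UTC>), "
    "'barlist': ['hello', 'world']}"
)

# Group 1 captures the comma-separated numbers; the tzinfo value may contain
# "()" pairs or any characters other than parentheses.
DATETIME_PATTERN = re.compile(
    r"datetime\.datetime\((\d+(?:,\s*\d+)*), tzinfo=(?:\(\)|[^()])*\)"
)


def strip_datetimes(text):
    """Replace each datetime.datetime(...) with its numbers in list form."""
    return DATETIME_PATTERN.sub(r"[\1]", text)


result = strip_datetimes(input_str)
print(result)
```

Because the tzinfo part cannot run past an unpaired closing parenthesis, each datetime.datetime(...) is matched separately and the entries in between are left alone.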