Python 3 RegEx timeout
Question:
I have a regex that might take a long time to execute, despite my best efforts at optimization. I want to be able to interrupt it in the cases where it stalls, and proceed with the rest of the program.
Other languages like C# have a Timeout property for regex execution, and I am wondering why Python 3 does not seem to take the same approach.
Does Python 3 internally enforce some maximum execution time? After a long time the regex seems to abort and execution continues. Is that true?
I would like to solve this in Python 3 with a platform-independent approach (I have seen decorators that work only on *NIX systems, using signals…)
Maybe the answer is a more general approach to stopping a function in Python, as in How to add a timeout to a function in Python or Stopping a function in Python using a timeout.
How can I implement such a timeout?
Answers:
As for why Python's built-in re module doesn’t offer a timeout the way C# does: Tim Peters commented on this in a now-closed issue:
Introducing some kind of optional timeout is too involved to just drop in without significant discussion and design effort first.
My first take: it wouldn’t really help, because nobody would use it until after it was too late.
However, there is a public PyPI module called regex which aims to provide complete backwards compatibility with the re module, while offering more complex functionality (such as timeouts). Here is a snippet from its documentation that shows how to use it:
The matching methods and functions support timeouts. The timeout (in seconds) applies to the entire operation:
>>> import regex
>>> from time import sleep
>>>
>>> def fast_replace(m):
...     return 'X'
...
>>> def slow_replace(m):
...     sleep(0.5)
...     return 'X'
...
>>> regex.sub(r'[a-z]', fast_replace, 'abcde', timeout=2)
'XXXXX'
>>> regex.sub(r'[a-z]', slow_replace, 'abcde', timeout=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python310\lib\site-packages\regex\regex.py", line 278, in sub
    return pat.sub(repl, string, count, pos, endpos, concurrent, timeout)
TimeoutError: regex timed out
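The timeout functionality in this module works well because it is wired directly into the main matching loop (see safe_check_cancel) and is not based on any platform-dependent mechanism, such as the signal module. A call that times out raises TimeoutError, as the traceback above shows, so you can catch it and carry on with the rest of the program.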
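If adding a third-party dependency is not an option, one platform-independent fallback is the general "stop a function after a timeout" idea from the question: run the stdlib re search in a separate Python process and let subprocess.run enforce the time limit. The following is only a minimal sketch under that assumption. The names search_with_timeout and _CHILD_SCRIPT are hypothetical helpers made up for illustration, not part of any library, and the sketch returns just the span of the first match rather than a full match object.

```python
import json
import subprocess
import sys

# Script executed in the child interpreter: it reads a JSON request from
# stdin, runs the stdlib `re` search, and writes the match span (or null)
# to stdout as JSON. (Hypothetical helper, for illustration only.)
_CHILD_SCRIPT = """
import json, re, sys
request = json.load(sys.stdin)
match = re.search(request["pattern"], request["text"])
json.dump(match.span() if match else None, sys.stdout)
"""


def search_with_timeout(pattern, text, timeout):
    """Return the (start, end) span of the first match, or None if no match.

    Raises subprocess.TimeoutExpired if the child process does not finish
    within `timeout` seconds; subprocess.run kills the child in that case.
    Raises subprocess.CalledProcessError if the child fails, for example
    because the pattern is invalid.
    """
    request = json.dumps({"pattern": pattern, "text": text})
    completed = subprocess.run(
        [sys.executable, "-c", _CHILD_SCRIPT],
        input=request,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    span = json.loads(completed.stdout)
    return tuple(span) if span is not None else None
```

A caller would wrap the call in try/except subprocess.TimeoutExpired and continue with the rest of the program when the limit is hit. Starting a new interpreter for every search is expensive, so this only makes sense for the occasional pattern that might stall; the regex module's built-in timeout is the lighter option when it is available.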