Python 3 RegEx timeout

Question:

I have a regex that might take a long time to execute, despite my best efforts at optimization. I want to be able to interrupt it in cases where it stalls and proceed with the rest of the program.

Other languages, like C#, have a Timeout property for Regex execution, and I am wondering why Python 3 does not seem to have the same approach.

Does Python 3 have some sort of internal maximum execution time? After a long time, the regex seems to abort and execution goes ahead. Is that true?

I would like to analyze this question for Python 3 and use a platform-independent approach (I have seen decorators that work only on *nix OSes, using signals…)

Maybe the answer is to handle this problem with a more general approach to stopping a function in Python, as in "How to add a timeout to a function in Python" or "Stopping a function in Python using a timeout".
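Applied to the standard re module, that general approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `search_with_timeout` and `_search_worker` are hypothetical, the pattern and string are assumed to be picklable strings, and only the match span and groups are returned because `Match` objects cannot be pickled. The search runs in a child process because a running thread cannot be forcibly stopped in Python, whereas `multiprocessing.Process.terminate()` works on every platform.

```python
import multiprocessing as mp
import queue
import re


def _search_worker(pattern, string, flags, out):
    # Runs in the child process. Match objects cannot be pickled, so send back
    # only the span and groups (or None when there is no match).
    m = re.search(pattern, string, flags)
    out.put(None if m is None else (m.span(), m.groups()))


def search_with_timeout(pattern, string, timeout, flags=0):
    """Hypothetical helper: run re.search in a separate process.

    Returns None for no match, otherwise a (span, groups) tuple.
    Raises TimeoutError if no result arrives within `timeout` seconds.
    """
    re.compile(pattern, flags)  # surface syntax errors here, not in the child
    out = mp.Queue()
    proc = mp.Process(target=_search_worker, args=(pattern, string, flags, out))
    proc.start()
    try:
        result = out.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived in time: kill the stalled match and report a timeout.
        proc.terminate()  # SIGTERM on POSIX, TerminateProcess() on Windows
        proc.join()
        raise TimeoutError(f"regex did not finish within {timeout} s") from None
    proc.join()
    return result


if __name__ == "__main__":  # required with the "spawn" start method (Windows, macOS)
    print(search_with_timeout(r"b+", "aabbbc", timeout=5))  # ((2, 5), ())
    try:
        # Nested quantifiers cause catastrophic backtracking on this input.
        search_with_timeout(r"(a+)+$", "a" * 40 + "b", timeout=1)
    except TimeoutError as exc:
        print(exc)
```

The trade-offs are real: every call pays for starting a process and copying the string to it, and if the child dies for another reason (for example, running out of memory) the caller only sees a timeout.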

How can I implement such a timeout?

Asked By: robob


Answers:

Regarding why Python's built-in re module doesn't have the same timeout approach as C#, Tim Peters has commented on this matter in a now-closed issue:

Introducing some kind of optional timeout is too involved to just drop in without significant discussion and design effort first.

My first take: it wouldn’t really help, because nobody would use it until after it was too late.

However, there is a third-party PyPI module called regex, which aims to provide complete backwards compatibility with the re module while offering more complex functionality (such as timeouts). Here is a snippet directly from its documentation that shows how to use it:

The matching methods and functions support timeouts. The timeout (in seconds) applies to the entire operation:

>>> import regex
>>> from time import sleep
>>>
>>> def fast_replace(m):
...     return 'X'
...
>>> def slow_replace(m):
...     sleep(0.5)
...     return 'X'
...
>>> regex.sub(r'[a-z]', fast_replace, 'abcde', timeout=2)
'XXXXX'
>>> regex.sub(r'[a-z]', slow_replace, 'abcde', timeout=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python310\lib\site-packages\regex\regex.py", line 278, in sub
    return pat.sub(repl, string, count, pos, endpos, concurrent, timeout)
TimeoutError: regex timed out

The timeout functionality in this module is great because it is wired directly into the main matching loop (see safe_check_cancel) and does not rely on a platform-dependent mechanism such as the signal module.

Answered By: Xiddoc