Trouble with lookbehind and lookahead

Question:

I’m having a hard time trying to get this simple RegEx to work. I need to capture the message "Windows Event Logs Cleared" or any other message that might be in that position.

import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
          2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""
pattern = r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'
regex = re.findall(pattern, text, re.MULTILINE)

Current output:

Windows Event Logs Cleared

Expected output:

Windows Event Logs Cleared
Account Configured with Never-Expiring Password

Note:

  1. The date and time are always the same pattern
  2. The message starts just after a 3-digit or 4-digit number (in these examples 104 and 4738), but that field could also be a hyphen (-)
  3. The message varies in length
  4. The message always ends just before the 2-digit number, which in these examples are 21 for the first and 47 for the second.

If anyone knows of a good, concise, gobbledygook-free tutorial, please lemme know.

Asked By: OverflowStack

||

Answers:

With standard Python re, lookbehinds have to be a fixed length. Since the message can be preceded by a variable-length number, you can’t use a lookbehind for this (the third-party regex library overcomes this restriction).
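
To illustrate the fixed-length restriction, here is a minimal sketch that tries to compile a lookbehind whose alternatives have different widths (3, 4 and 1 characters), like the one in the question. The variable names are only illustrative.

```python
import re

# Lookbehind with alternatives of different widths: 3 digits, 4 digits, or a hyphen.
variable_width = r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'

try:
    re.compile(variable_width)
    compiled = True
except re.error as exc:
    # The standard re module rejects the pattern at compile time.
    compiled = False
    print(exc)
```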

The workaround is to use a capture group for the message that you want to extract.

The other problem with your regexp is that it doesn’t match the date and time before the message.

pattern = r'^\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'

When you use this, capture group 1 will contain the message.
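
A minimal sketch of how this could be run against the sample text from the question. One assumption is added here: a `\s*` after `^`, because the second line of the sample text is indented and would otherwise not match at the start of the line.

```python
import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
          2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# The \s* after ^ is an addition (not part of the pattern above) to tolerate leading indentation.
pattern = r'^\s*\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'

# With exactly one capture group, findall returns a list of that group's contents.
messages = re.findall(pattern, text, re.MULTILINE)
print(messages)
# ['Windows Event Logs Cleared', 'Account Configured with Never-Expiring Password']
```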

Answered By: Barmar

You could look a bit further behind, starting at the last colon that is part of the timestamp.

If doing this with the regex module (instead of re), then variable width look behind is possible, but with re you can instead split into multiple alternative fixed-width look-behind assertions in this way:

(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)
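
As a rough check with the standard re module, the pattern can be applied to the sample text from the question. Each look-behind alternative has a fixed width of its own, so re accepts it. This is only a sketch against the two sample lines.

```python
import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
          2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# Three separate fixed-width look-behinds, anchored on the last colon of the timestamp.
pattern = r'(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)'

messages = re.findall(pattern, text)
print(messages)
# ['Windows Event Logs Cleared', 'Account Configured with Never-Expiring Password']
```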

If using regex, then you can even use \K instead of a look-behind assertion:

:\d\d (?:\d{3,4}|-) \K(.*?)(?=\s\d{2}\s)

Answered By: trincot