Trouble with lookbehind and lookahead
Question:
I’m having a hard time trying to get this simple RegEx to work. I need to capture the message "Windows Event Logs Cleared" or any other message that might be in that position.
text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""
pattern = r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'
regex = re.findall(pattern, text, re.MULTILINE)
Current output:
Windows Event Logs Cleared
Expected output:
Windows Event Logs Cleared
Account Configured with Never-Expiring Password
Note:
- The date and time are always the same pattern
- The message starts just after a 3-digit or 4-digit number (104 and 4738 in these examples), but a - can appear in that position instead
- The message varies in length
- The message always ends just before a 2-digit number, which in these examples is 21 for the first and 47 for the second.
If anyone knows of a good, concise, gobbledygook-free tutorial, please lemme know.
Answers:
With Python's standard re module, lookbehinds have to be a fixed length. Since the message can be preceded by a variable-length number, you can’t use a lookbehind for this (the third-party regex library overcomes this restriction).
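To see the restriction in action, here is a minimal sketch that tries to compile the lookbehind from the question with the standard re module. The variable names are illustrative, and the exact error text may differ between Python versions, so the sketch only records whether compilation failed.

```python
import re

# The lookbehind from the question: its alternatives are 3, 4, and 1
# characters wide, so the lookbehind as a whole has no single fixed width.
variable_width = r'(?<=\d{3}|\d{4}|-)(.*?)(?=\s\d{2}\s)'

try:
    re.compile(variable_width)
    compiled = True
    error_message = None
except re.error as exc:
    # The standard re module rejects the pattern at compile time.
    compiled = False
    error_message = str(exc)

print(compiled, error_message)
```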
The workaround is to use a capture group for the message that you want to extract.
The other problem with your regexp is that it doesn’t match the date and time before the message.
pattern = r'^\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'
When you use this, capture group 1 will contain the message.
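As a rough sketch of how this fits together, the snippet below runs the capture-group pattern over the sample text from the question. With a single capture group, re.findall returns the contents of that group for each match, and re.MULTILINE lets ^ anchor at the start of every line.

```python
import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# Match the whole prefix (date, time, event number or "-") and capture
# only the message that precedes the 2-digit number.
pattern = r'^\d{4}-\d{2}-\d{2} / \d{2}:\d{2}:\d{2} (?:-|\d{3,4}) (.*?) \d{2}'

# findall returns group 1 for each matching line.
messages = re.findall(pattern, text, re.MULTILINE)
print(messages)
```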
You could look a bit further behind, starting at the last colon that is part of the timestamp.
If doing this with the regex module (instead of re), then variable-width lookbehind is possible, but with re you can instead split it into multiple alternative fixed-width lookbehind assertions in this way:
(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)
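A small sketch of the fixed-width alternatives with the standard re module follows. The third line of input is a made-up example (not from the question) added only to exercise the "-" branch.

```python
import re

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# Hypothetical extra line (not in the question) where "-" replaces the number.
dash_line = "2023-03-21 / 09:00:01 - Service Restarted 12 low SRVR1 - a.user 3"

# Each lookbehind has its own fixed width, so re accepts all three;
# the non-capturing group just tries them one after another.
pattern = r'(?:(?<=:\d\d \d{3} )|(?<=:\d\d \d{4} )|(?<=:\d\d - ))(.*?)(?=\s\d{2}\s)'

messages = re.findall(pattern, text + "\n" + dash_line)
print(messages)
```

Anchoring each lookbehind on the seconds field of the timestamp (the colon and two digits) keeps the later " - " before the user name from being mistaken for the start of a message.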
If using regex, then you can even use \K instead of a lookbehind assertion:
:\d\d (?:\d{3,4}|-) \K(.*?)(?=\s\d{2}\s)
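For completeness, here is a sketch of both regex-module variants, assuming the third-party regex package is installed (pip install regex). The import is guarded so the snippet still runs without it; in that case both results are simply None.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

text = """2023-04-05 / 15:53:58 104 Windows Event Logs Cleared 21 low SRVR3 - j.smith 1
2023-03-20 / 15:17:55 4738 Account Configured with Never-Expiring Password 47 medium DC02SRV - m.rossi 2"""

# Variant 1: \K discards everything matched so far from the reported match.
keep_out = r':\d\d (?:\d{3,4}|-) \K(.*?)(?=\s\d{2}\s)'

# Variant 2: a single variable-width lookbehind, which regex allows.
lookbehind = r'(?<=:\d\d (?:\d{3,4}|-) )(.*?)(?=\s\d{2}\s)'

if regex is not None:
    messages_k = regex.findall(keep_out, text)
    messages_lb = regex.findall(lookbehind, text)
else:
    messages_k = messages_lb = None

print(messages_k, messages_lb)
```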