Expression that captures all characters up to a group of characters
Question:
I have several alerts coming from a DC server, which have the following pattern:
alert name - risk score - severity - total
The examples of these alerts would be:
A member was added to a security-enabled local group 47 medium 2
A member was added to a security-enabled universal group 47 medium 1
A security-enabled global group was changed 73 high 2
A security-enabled local group was changed 73 high 2
A user account was locked out 47 medium 31
An attempt was made to reset an accounts password 73 high 14
Member added to security-enabled global group 73 high 2
PowerShell Keylogging Script 73 high 23
PowerShell Suspicious Script with Audio Capture Capabilities 47 medium 23
More Than 3 Failed Login Attempts Within 1 Hour 47 medium 6
Over 100 Connection from 10 Diff. IPs 47 medium 234
Over 100 Connections Attempted 73 high 123
Failed Logins Not Followed by Success Within 2 Hours 21 low 8
I’ve been using the following pattern to capture only the name of the alerts:
^(\D*)
Essentially, this filters out all of the digits, but now I have received a few alerts I hadn’t accounted for. These alerts contain digits in their names. For example:
More Than 3 Failed Login Attempts Within 1 Hour 47 medium 6
Over 100 Connection from 10 Diff. IPs 47 medium 234
Over 100 Connections Attempted 73 high 123
Failed Logins Not Followed by Success Within 2 Hours 21 low 8
So I need to be able to capture the complete name; otherwise, I end up with:
More than
Over
Over
Failed Logins Not Followed by Success Within
Despite my efforts, I have not been able to capture the desired pattern. This would be the desired output:
A member was added to a security-enabled local group
A member was added to a security-enabled universal group
A security-enabled global group was changed
A security-enabled local group was changed
A user account was locked out
An attempt was made to reset an accounts password
PowerShell Keylogging Script
PowerShell Suspicious Script with Audio Capture Capabilities
More Than 3 Failed Login Attempts Within 1 Hour
Over 100 Connection from 10 Diff. IPs
Over 100 Connections Attempted
Failed Logins Not Followed by Success Within 2 Hours
Thanks for taking the time to help!
Answers:
The following regex should do the trick: .*\b(?= \d* .* \d*$)
The (?=...) syntax is called a lookahead, and it lets us specify the text that must follow the match without making it part of the match. Here, we’re essentially matching anything that is followed by the pattern: space, number, space, anything, space, number, end of line.
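As a minimal sketch of how the lookahead behaves, the snippet below applies that regex to a few of the alerts from the question. The helper name extract_name is made up for illustration and is not part of the original answer. Note that \d* also accepts an empty run of digits; on these sample lines that does not change the result, but \d+ would be the stricter choice.

```python
import re

# Lookahead version: match the name, then require " <digits> <anything> <digits>"
# up to the end of the line without consuming it.
NAME_RE = re.compile(r".*\b(?= \d* .* \d*$)")


def extract_name(line):
    """Return the alert name, or None if the line does not fit the pattern.

    Hypothetical helper, used here only to demonstrate the regex.
    """
    m = NAME_RE.match(line)
    return m.group(0) if m else None


samples = [
    "A user account was locked out 47 medium 31",
    "More Than 3 Failed Login Attempts Within 1 Hour 47 medium 6",
    "Over 100 Connection from 10 Diff. IPs 47 medium 234",
]

for sample in samples:
    print(extract_name(sample))
```

Because the trailing fields sit inside the lookahead, group(0) is already the bare name and no capture group is needed.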
Here is a possible alternative regex. Note: I am assuming that alerts is a list of strings.
The pattern captures any string of characters at the beginning of the string with ^(.*), followed by \s, which matches any whitespace character. Then \d+ matches one or more digits, \w+ matches one or more word characters, and a final \d+ matches one or more digits at the end of the string ($), with a \s between each of them.
import re
data = """
A member was added to a security-enabled local group 47 medium 2
A member was added to a security-enabled universal group 47 medium 1
A security-enabled global group was changed 73 high 2
A security-enabled local group was changed 73 high 2
A user account was locked out 47 medium 31
An attempt was made to reset an accounts password 73 high 14
Member added to security-enabled global group 73 high 2
PowerShell Keylogging Script 73 high 23
PowerShell Suspicious Script with Audio Capture Capabilities 47 medium 23
More Than 3 Failed Login Attempts Within 1 Hour 47 medium 6
Over 100 Connection from 10 Diff. IPs 47 medium 234
Over 100 Connections Attempted 73 high 123
Failed Logins Not Followed by Success Within 2 Hours 21 low 8
"""
alerts = data.splitlines()
# Group 1 is the alert name; the rest anchors on "<score> <severity> <total>" at the end.
pattern = re.compile(r'^(.*)\s\d+\s\w+\s\d+$')
for alert in alerts:
    res = pattern.search(alert)
    if res:
        print(res.group(1))
A member was added to a security-enabled local group
A member was added to a security-enabled universal group
A security-enabled global group was changed
A security-enabled local group was changed
A user account was locked out
An attempt was made to reset an accounts password
Member added to security-enabled global group
PowerShell Keylogging Script
PowerShell Suspicious Script with Audio Capture Capabilities
More Than 3 Failed Login Attempts Within 1 Hour
Over 100 Connection from 10 Diff. IPs
Over 100 Connections Attempted
Failed Logins Not Followed by Success Within 2 Hours