How to write a regular expression correctly in Python
Question:
I have the following piece of text, from which I need to extract the threat ID:
C:\Users\Administrator\Downloads\CallbackHell.exe}rnThreatID : 2147725414rnThreatStatusErrorCode : 0rnThreatStatusID : 3rnPSComputerName : rnrnActionSuccess : TruernAdditionalActionsBitMask : 0rnAMProductVersion : 4.18.2211.5rnCleaningActionID : 2rnCurrentThreatExecutionStatusID : 1rnDetectionID : {F9B830AE-D82E-4248-9D9D-723F2FB3AF95}rnDetectionSourceTypeID : 3rnDomainUser : WIN-LIVFRVQFMKO\AdministratorrnInitialDetectionTime : 1/9/2023 6:43:30 PMrnLastThreatStatusChangeTime : 1/9/2023 6:43:59 PMrnProcessName : C:\Windows\explorer.exernRemediationTime : 1/9/2023 6:43:59 PMrnResources : {file:_C:\Users\Administrator\Desktop\CallbackHell.exe:3rnPSComputerName : rnrnActionSuccess : TruernAdditionalActionsBitMask : 0rnAMProductVersion : 4.18.2211.5rnCleaningActionID : 2rnCurrentThreatExecutionStatusID : 1rnDetectionID : {F9B830AE-D82E-4248-9D9D-723F2FB3AF95}rnDetectionSourceTypeID : 3rnDomainUser : WIN-LIVFRVQFMKO\AdministratorrnInitialDetectionTime : 1/9/2023 6:43:30 PMrnLastThreatStatusChangeTime : 1/9/2023 6:43:59 PMrnProcessName : C:\Windows\explorer.exernRemediationTime : 1/9/2023 6:43:59 PMrnResources : {file:_C:\Users\Administrator\Desktop\CallbackHell.exe}rnThreatID : 2147725414rnThreatStatusErrorCode : 0rnThreatStatusID : 3,
I wrote the expression as follows:
ThreatStatusID : (.*)\r\nPSComputerName
but for some reason it doesn't work.
I see an error here.
What's my mistake?
My code is:
try:
    re_filename_pattern = re.compile(r'{file:_(.*)}')
    mo = re_filename_pattern.search(str(output))
    re_filename_pattern2 = re.compile(r'ThreatStatusID : (.*)\r\nPS')
    mo2 = re_filename_pattern2.search(str(output))
    if mo2 is not None and mo is not None:
        log += (mo.group(1)) + ":" + (mo2.group(1)) + ", "
except:
    print('cant get filename')
Answers:
You've probably overlooked the fact that .* is greedy: * will match as many characters as it can. As a result, the match only stops at the last \r\nPS, not the first one (as .* also matches all the earlier \r\nPS occurrences).
You can use .*? instead, which is the non-greedy counterpart of *. See also the documentation of the re module (search for ?).
E.g.
re_filename_pattern2 = re.compile(r'ThreatStatusID\s+: (.*?)\r\nPS')
(\s+ sprinkled in, because all those spaces make the pattern too long and hard to read.)
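To see the difference between the greedy and non-greedy forms side by side, here is a minimal sketch. It rests on an assumption that the question does not state outright: that str(output) is the text form of a bytes object, so each line break shows up as the literal characters backslash-r-backslash-n rather than a real CR LF. That is why the patterns below escape the backslashes (\\r\\n); if your text holds real line breaks, . will not cross them by default and you would write \r\n instead. The sample string is shortened and hypothetical, not the full log.

```python
import re

# Hypothetical, shortened sample. The raw string keeps \r\n as literal
# backslash sequences, mimicking str() applied to a bytes object.
sample = (
    r"ThreatID : 2147725414\r\nThreatStatusID : 3\r\nPSComputerName : "
    r"\r\n\r\nActionSuccess : True\r\nPSComputerName : \r\nThreatStatusID : 3"
)

# Greedy: .* grabs as much as possible, so the match ends at the LAST \r\nPS.
greedy = re.compile(r"ThreatStatusID\s+: (.*)\\r\\nPS")
# Non-greedy: .*? grabs as little as possible, so it ends at the FIRST \r\nPS.
lazy = re.compile(r"ThreatStatusID\s+: (.*?)\\r\\nPS")

greedy_value = greedy.search(sample).group(1)
lazy_value = lazy.search(sample).group(1)

print(repr(greedy_value))  # long capture that runs past the first PSComputerName
print(repr(lazy_value))    # '3'
```

If the value is always numeric, a narrower group such as (\d+) in place of (.*?) would avoid the greediness question altogether, since digits cannot run across the separator.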