Python RegEx – Capture Varying Length of Characters

Question:

I’m trying to capture strings that don’t follow a fixed form: some are two words, others three, and some are a whole phrase. So far I’ve only managed to capture up to two words. Any help is appreciated.

str1 = 'File quarantined'
str2 = 'Unable to quarantine file'
str3 = 'Action Required - Restart the endpoint to finish cleaning the security threat'
str4 = 'Unable to upload file'
str5 = 'Unable to delete file'

The following doesn’t work as expected: it only captures the first two words.

import re

pattern = r'\w+\s([^\s]+)([^\s]+)'
str2 = 'Unable to quarantine file'
res = re.search(pattern, str2)
print(res)
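To see why the pattern stops early, here is a minimal run of it, assuming the backslashes lost from the post were originally \w, \s and [^\s]. Each [^\s]+ group can only match non-space characters, and because nothing separates the two groups, the regex engine backtracks until they split the second word between them. The match therefore can never extend past the second word:

```python
import re

# The asker's pattern, assuming the stripped backslashes are restored.
pattern = r'\w+\s([^\s]+)([^\s]+)'
str2 = 'Unable to quarantine file'

res = re.search(pattern, str2)
print(res.group())   # whole match: stops after the second word
print(res.groups())  # the two groups share the letters of "to"
```

The whole match is 'Unable to', and the groups hold 't' and 'o'. Counting words is the wrong approach here; anchoring on cs5= and matching up to the next field does not depend on how many words the value has.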

These strings come directly from the server. The regex needs to capture the whole value, whether it is two words, three, or more.
The strings are embedded in a list of long log lines, and the part I need is the value that follows cs5=. A sample of that list is below:

malware = ['Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File quarantined|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=313 rt=2022-12-21 08:44:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=66e9f4d4-df39-488d-8cf8-bdcf5d890598.tmp filePath=C:\\Users\\emil\\Downloads\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File passed|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=314 rt=2022-12-21 08:45:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=rev_shell.exe filePath=C:\\Users\\emil\\Downloads\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File cleaned|TROJ_GEN.R002C0DKG22|3|deviceExternalId=315 rt=2022-12-21 10:20:31 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smit act=File cleaned cn1Label=Pattern cn1=1814500 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=1 fname=aowect.dll filePath=C:\\Users\\emil\\AppData\\Local\\Temp\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to upload file|TSC_GENCLEAN|3|deviceExternalId=316 rt=2022-12-21 13:37:42 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=Unable to clean file cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=Non confermato 184296.crdownload filePath=C:\\Users\\emil\\Downloads\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to delete file|Troj.Win32.TRX.XXPE50FFF063|3|deviceExternalId=317 rt=2022-12-21 13:37:49 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=pumpkin-2.7.3.exe filePath=C:\\Users\\emil\\Downloads\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ']
Asked By: CUI


Answers:

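Use lookarounds to match the text between the keywords: a lookbehind for cs5= and a lookahead for the whitespace before cs6Label=. Neither keyword becomes part of the match, and the lazy .*? stops at the first cs6Label= it reaches, so the value can contain any number of words.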

import re

pattern = re.compile(r'(?<=cs5=).*?(?=\s+cs6Label=)')
malware = ['Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File quarantined|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=313 rt=2022-12-21 08:44:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=66e9f4d4-df39-488d-8cf8-bdcf5d890598.tmp filePath=C:\\Users\\emil\\Downloads\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File passed|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=314 rt=2022-12-21 08:45:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=rev_shell.exe filePath=C:\\Users\\emil\\Downloads\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File cleaned|TROJ_GEN.R002C0DKG22|3|deviceExternalId=315 rt=2022-12-21 10:20:31 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smit act=File cleaned cn1Label=Pattern cn1=1814500 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=1 fname=aowect.dll filePath=C:\\Users\\emil\\AppData\\Local\\Temp\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to upload file|TSC_GENCLEAN|3|deviceExternalId=316 rt=2022-12-21 13:37:42 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=Unable to clean file cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=Non confermato 184296.crdownload filePath=C:\\Users\\emil\\Downloads\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to delete file|Troj.Win32.TRX.XXPE50FFF063|3|deviceExternalId=317 rt=2022-12-21 13:37:49 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=pumpkin-2.7.3.exe filePath=C:\\Users\\emil\\Downloads\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\Notebook\\ ']
results = []
for s in malware:
    m = pattern.search(s)
    if m:
        results.append(m.group())

print(results)

Output:

['File quarantined', 'File quarantined', 'File cleaned', 'Unable to clean file', 'File cleaned']
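The pattern above relies on cs6Label= always following cs5=. If the field order might change, one option (a hypothetical helper, not part of the original answer) is to parse every key=value pair of the CEF extension into a dict and look up cs5. This sketch assumes that the extension starts after the seventh unescaped pipe, that no header field contains an escaped pipe, and that no value contains a space followed by word=:

```python
import re

# Hypothetical helper: a value runs lazily until the next " key=" token
# or the end of the string, so multi-word values are kept whole.
KV_PATTERN = re.compile(r'(\w+)=(.*?)(?=\s+\w+=|\s*$)')

def parse_extension(line):
    # Assumption: the CEF header has seven pipe-delimited fields, so the
    # extension is everything after the seventh pipe.
    extension = line.split('|', 7)[-1]
    return dict(KV_PATTERN.findall(extension))

# Trimmed excerpt of the fourth log line from the question.
sample = ('Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|'
          'Apex Central|2019|AV:Unable to upload file|TSC_GENCLEAN|3|'
          'deviceExternalId=316 cs4Label=CLF_ReasonCode cs4=virus log '
          'cs5Label=First_Action_Result cs5=Unable to clean file '
          'cs6Label=Second_Action_Result cs6=N/A '
          'fname=Non confermato 184296.crdownload dntdom=Client\\Notebook\\ ')

fields = parse_extension(sample)
print(fields['cs5'])
```

Applied to the full list, [parse_extension(s).get('cs5') for s in malware] should give the same values as the lookaround version, while also exposing the other fields (fname, cs4, and so on) if they are needed later.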
Answered By: Barmar