RegEx Look Ahead/Look Behind when the look behind pattern is different

Question:

First off, my apologies for posting this ugly, long sample, but it’s all I could muster. I’m trying to get the IPs for both the source of the malware and the host. My pattern works well with the host, but it breaks when I try to return the source IP because the pattern captured in the look behind portion of the log changes. So I’m stuck.

logs = [
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12446 devicePayloadId=8F003A0D28D9 rt=2023-05-03 00:09:25 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=MIM0002012 TMCMLogDetectedHost=MIM0002012 src=172.16.4.90 TMCMLogDetectedIP=172.16.4.90 cs3Label=SLF_DomainName cs3=Acme act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=173.233.137.60 deviceProcessName=C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe dvchost=somedomain.manage.trendmicro.com",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12447 devicePayloadId=8F003A0D28D9 rt=2023-05-03 08:02:58 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=LENOVOM910Q TMCMLogDetectedHost=LENOVOM910Q src=10.10.110.69 TMCMLogDetectedIP=10.10.110.69 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=192.243.61.227 deviceProcessName=C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12448 devicePayloadId=8F003A0D28D9 rt=2023-05-03 08:02:58 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=LENOVOM910Q TMCMLogDetectedHost=LENOVOM910Q src=10.10.110.69 TMCMLogDetectedIP=10.10.110.69 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=173.233.137.36 deviceProcessName=C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12449 devicePayloadId=8F003A0D28D9 rt=2023-05-03 08:02:59 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=LENOVOM910Q TMCMLogDetectedHost=LENOVOM910Q src=10.10.110.69 TMCMLogDetectedIP=10.10.110.69 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=192.243.59.13 deviceProcessName=C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12450 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:15 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01 src=10.10.220.172 TMCMLogDetectedIP=10.10.220.172 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=4 cs5Label=CnCDestinationURL cs5=somewebsite.com deviceProcessName=C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12451 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:16 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01 src=10.10.220.172 TMCMLogDetectedIP=10.10.220.172 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=4 cs5Label=CnCDestinationURL cs5=somewebsite.com deviceProcessName=C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12452 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:19 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01 src=10.10.220.172 TMCMLogDetectedIP=10.10.220.172 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=4 cs5Label=CnCDestinationURL cs5=somewebsite.com deviceProcessName=C:\\Windows\\System32\\svchost.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12453 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:19 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01"
]

The host Ips are found in this section:

And the code is:

endpoint_ip_list = [re.sub('dst=','',re.search('(?<=src=).*?(?=s+TMCMLogDetectedIP=)',log).group()) for log in logs]

Output:

['172.16.4.90', '10.10.110.69', '10.10.110.69', '10.10.110.69', '10.10.220.172', '10.10.220.172', '10.10.220.172', '10.10.220.172']

The second part is the source IP (the source of the possible attack), which is found in this section:

Sometimes the logs show a domain instead of an IP address depending on the policy. So, when I run the regex for the section highlighted in green,it obviously breaks.

callback_ip_list = [re.sub('dst=','',re.search('(?<=dst=).*?(?=s+deviceProcessName=)',log).group()) for log in logs]

Output:

callback_ip_list = [re.sub('dst=','',re.search('(?<=dst=).*?(?=s+deviceProcessName=)',log).group()) for log in logs]
AttributeError: 'NoneType' object has no attribute 'group'

If you know of a way to capture both the IP and the domain in the same expression, it would be perfect, but I’m content with any fix for this tbh. Thanks for your help!

Asked By: OverflowStack

||

Answers:

Use alternatives to match either dst= or cs5= before deviceProcessName=.

(?:(?<=dst=).*?|(?<=cs5=).*?)(?=s+deviceProcessName=)
Answered By: Barmar

Try (Regex101):

import re

logs = [
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12446 devicePayloadId=8F003A0D28D9 rt=2023-05-03 00:09:25 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=MIM0002012 TMCMLogDetectedHost=MIM0002012 src=172.16.4.90 TMCMLogDetectedIP=172.16.4.90 cs3Label=SLF_DomainName cs3=Acme act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=173.233.137.60 deviceProcessName=C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe dvchost=somedomain.manage.trendmicro.com",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12447 devicePayloadId=8F003A0D28D9 rt=2023-05-03 08:02:58 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=LENOVOM910Q TMCMLogDetectedHost=LENOVOM910Q src=10.10.110.69 TMCMLogDetectedIP=10.10.110.69 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=192.243.61.227 deviceProcessName=C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12448 devicePayloadId=8F003A0D28D9 rt=2023-05-03 08:02:58 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=LENOVOM910Q TMCMLogDetectedHost=LENOVOM910Q src=10.10.110.69 TMCMLogDetectedIP=10.10.110.69 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=173.233.137.36 deviceProcessName=C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12449 devicePayloadId=8F003A0D28D9 rt=2023-05-03 08:02:59 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=LENOVOM910Q TMCMLogDetectedHost=LENOVOM910Q src=10.10.110.69 TMCMLogDetectedIP=10.10.110.69 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=1 dst=192.243.59.13 deviceProcessName=C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12450 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:15 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01 src=10.10.220.172 TMCMLogDetectedIP=10.10.220.172 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=4 cs5Label=CnCDestinationURL cs5=somewebsite.com deviceProcessName=C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12451 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:16 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01 src=10.10.220.172 TMCMLogDetectedIP=10.10.220.172 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=4 cs5Label=CnCDestinationURL cs5=somewebsite.com deviceProcessName=C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12452 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:19 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01 src=10.10.220.172 TMCMLogDetectedIP=10.10.220.172 cs3Label=SLF_DomainName cs3=Acme_Headquarter act=Block cn1Label=SLF_CCCA_RiskLevel cn1=3 cn2Label=SLF_CCCA_DetectionSource cn2=2 cn3Label=SLF_CCCA_DestinationFormat cn3=4 cs5Label=CnCDestinationURL cs5=somewebsite.com deviceProcessName=C:\\Windows\\System32\\svchost.exe dvchost=somedomain.manage.trendmicro.com ",
    "May 03 2023 19:30:30 abcde.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|CnC:Block|CnC Callback|3|deviceExternalId=12453 devicePayloadId=8F003A0D28D9 rt=2023-05-03 09:42:19 cat=1756 deviceFacility=Apex One cs2Label=EI_ProductVersion cs2=14.0 shost=DELLTELEC01 TMCMLogDetectedHost=DELLTELEC01"
]

for line in logs:
    m = re.search(r'src=(S+).*(?:dst=|cs5Label=)(S+)', line)
    if m:
        print(f'src={m.group(1)} dst={m.group(2)}')
    else:
        print('Not found.')

Prints:

src=172.16.4.90 dst=173.233.137.60
src=10.10.110.69 dst=192.243.61.227
src=10.10.110.69 dst=173.233.137.36
src=10.10.110.69 dst=192.243.59.13
src=10.10.220.172 dst=CnCDestinationURL
src=10.10.220.172 dst=CnCDestinationURL
src=10.10.220.172 dst=CnCDestinationURL
Not found.
Answered By: Andrej Kesely
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.