Regular Expression search of a PCAP file

Question:

We have been given a PCAP file, and my task is to answer the following:

The user of the host PC tried to access a suspicious website whose domain name ends with .top. Use Python (with the help of regular expressions) to find the suspicious website.

By opening the PCAP file in Notepad and searching it with Ctrl + F, I have already found the correct answer: http://p27dokhpz2n7nvgr.1jw2lx.top

However, that is obviously not the point of the assignment: I have to use Python and regular expressions to return that website.

The code I have tried so far is:

import re

pcapfile = open('CyberSecurity2019.pcap', 'rb')

mypattern = re.compile(rb"\S+.top\b")

x = mypattern.findall(pcapfile.read())

print("x = ", x)

However, this is what it returns:

x =  [b"c('_SS','R','20',0,'/');f=_w.top", b'g_triggerElems!==e&&(g_triggerElems[i].isHotSpotDisabled=!1);v=i+1,r=s[i],a=_ge("sc_hst"+v),a.style.left=r.locx+"%",a.style.top', b't=u.getBoundingClientRect(),o=t.width?Math.abs(t.right-t.left):t.width,a=s(u,"paddingLeft");o=o-(a?parseInt(a):0);v=t.height?Math.abs(t.bottom-t.top', b'n=document.getElementById(keyMap.Notification),t;n&&(n.parentNode.removeChild(n),t=document.getElementById("id_h"),t&&(t.style.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', b'p27dokhpz2n7nvgr.1jw2lx.top', b'http://p27dokhpz2n7nvgr.1jw2lx.top', 
b'p27dokhpz2n7nvgr.1jw2lx.top']

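The list goes on like this for a while: the URL is in there, but it is mixed with JavaScript fragments such as a.style.top and repeated many times.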

Any help in setting me on the right track would be appreciated.

Thank you

Asked By: SNIPERATI0N

||

Answers:

Since all the links you want to extract start with http or https, you may use

rb'https?://\S+?\.top\b'

See the regex demo. Note that the r prefix makes this a raw string literal, so every backslash is treated as a literal backslash rather than as the start of a string escape sequence, and the b prefix is needed because PCAP files are binary: the file is read as bytes, so the pattern must be a bytes pattern too.
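A minimal sketch of putting the pattern to work on the whole file, assuming Python 3 and the file name from the question; the helper names find_top_urls and find_top_urls_in_file are made up for illustration. Because findall returns every occurrence, the sketch also de-duplicates the matches while keeping first-seen order, which should collapse the long repeated list from the question.

```python
import re

# Pattern from the answer: "http://" or "https://", then as few
# non-whitespace bytes as possible, then a literal ".top" ending at a
# word boundary.
TOP_URL = re.compile(rb'https?://\S+?\.top\b')


def find_top_urls(data):
    """Return the distinct .top URLs found in the bytes `data`, in first-seen order."""
    # findall returns a list of bytes objects, one per match; dict.fromkeys
    # drops duplicates while preserving the order in which they appeared.
    matches = TOP_URL.findall(data)
    return [m.decode('ascii', errors='replace') for m in dict.fromkeys(matches)]


def find_top_urls_in_file(path):
    """Read a capture file as bytes and return its distinct .top URLs."""
    # 'with' closes the file even if reading fails.
    with open(path, 'rb') as pcapfile:
        return find_top_urls(pcapfile.read())


# Usage (file name from the question):
# print(find_top_urls_in_file('CyberSecurity2019.pcap'))
```

Matches are decoded only for printing; keeping them as bytes until then avoids decoding errors on the binary parts of the capture.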

Details

  • https?:// – http:// or https://
  • \S+? – 1 or more non-whitespace characters, as few as possible
  • \.top – a .top substring (note the escaped dot: an unescaped dot matches any char other than a line break char in Python re)
  • \b – a word boundary (the r prefix lets you write this regex escape with a single backslash; without the r prefix you would need to write it as \\b)
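The two notes about the escaped dot and the r prefix can be checked directly in a Python shell. A small sketch, with made-up sample inputs:

```python
import re

# Without the r prefix, "\b" is an ordinary string escape: the backspace
# character, not the regex word-boundary escape.
print("\b" == "\x08")    # True
print(r"\b" == "\\b")    # True: a backslash followed by 'b'

# An unescaped dot matches any character except a newline,
# so ".top" also accepts "xtop"; "\.top" requires a literal dot.
print(re.search(rb".top\b", b"xtop") is not None)    # True
print(re.search(rb"\.top\b", b"xtop") is not None)   # False

# A non-raw b"...\b" pattern ends in a backspace byte, so it
# no longer finds ".top" at a word boundary.
print(re.search(b"\\.top\b", b"a.top") is not None)  # False
print(re.search(rb"\.top\b", b"a.top") is not None)  # True
```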
Answered By: Wiktor Stribiżew

I hope this helps! For more information and examples, you may want to check out this GitHub repository that includes various scripts and utilities for analyzing PCAP files using regular expressions: https://github.com/ftaxats/Pcap-Analyser/
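As a rough illustration of per-packet analysis (an independent sketch, not code from that repository), the same https?://\S+?\.top\b pattern can be applied to each captured record instead of the raw file, so every match is tied to a packet index. It assumes the classic libpcap file format (24-byte global header, 16-byte record headers) and does not handle pcapng; the names iter_packets and top_urls_per_packet are hypothetical.

```python
import re
import struct

# Same .top URL pattern as in the answer above.
TOP_URL = re.compile(rb'https?://\S+?\.top\b')

# Classic libpcap magic numbers: microsecond and nanosecond timestamps.
PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)
GLOBAL_HEADER_LEN = 24
RECORD_HEADER_LEN = 16


def iter_packets(data):
    """Yield the captured bytes of each record in a classic .pcap file."""
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError('file too short to be a pcap file')
    # The magic number reveals the byte order the file was written in.
    for endian in ('<', '>'):
        (magic,) = struct.unpack(endian + 'I', data[:4])
        if magic in PCAP_MAGICS:
            break
    else:
        raise ValueError('not a classic pcap file (pcapng is not handled)')

    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        # Record header fields: ts_sec, ts_usec, incl_len, orig_len.
        _, _, incl_len, _ = struct.unpack_from(endian + 'IIII', data, offset)
        offset += RECORD_HEADER_LEN
        yield data[offset:offset + incl_len]
        offset += incl_len


def top_urls_per_packet(data):
    """Return (packet_index, url) pairs for every .top URL match."""
    hits = []
    for index, packet in enumerate(iter_packets(data)):
        for match in TOP_URL.findall(packet):
            hits.append((index, match.decode('ascii', errors='replace')))
    return hits


# Usage (file name from the question):
# with open('CyberSecurity2019.pcap', 'rb') as f:
#     print(top_urls_per_packet(f.read()))
```

Neither this sketch nor a whole-file scan reassembles TCP streams, so a URL split across two segments would be missed in both cases.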

Answered By: ftaxats