Regex find until word or end of block
Question:
I’m analyzing router logs and found a way to extract blocks of text, but now I’m facing the problem that the marker ending a text block might not be present. I’ll explain.
This is a sample of a stripped log.
POBL026# show run vpn 0
interface ge0/4
no shutdown
!
interface ge0/4.1
description "TLOC-Extension Custom1"
ip address 10.31.xxx.1/30
nat
respond-to-ping
log-translations
!
tracker lbo2
tunnel-interface
encapsulation ipsec preference 100 weight 33
color custom1 restrict
no allow-service bgp
allow-service dhcp
allow-service dns
allow-service icmp
no allow-service sshd
no allow-service Netconf
no allow-service ntp
no allow-service ospf
no allow-service stun
allow-service https
!
mtu 1496
no shutdown
!
interface ge0/4.2
description "TLOC-Extension biz-internet "
ip address 10.31.xxx.5/30
mtu 1496
tloc-extension ge0/0
no shutdown
!
ip route 0.0.0.0/0 10.31.xxx.2
ip route 0.0.0.0/0 84.198.zzz.217
!
POBL026# show run vpn 1
So with this code I’m able to isolate each vpn block
regex_result = re.search(r"(?s)(show\s*run\s*vpn\s*" + str(vpn_nbr) + r"\s*\n)(.*?)(?=" + routername + ")", filecontent)
Then, in each block I searched for the interfaces with this code
result = re.findall(r"(?s)(?<=interface )(.*?)(?=!)", vpn_block)
But this is not OK, as I’ve noticed that an interface block can itself contain this character ("!") before its end. I could search until the next "!\ninterface" instead, which works fine, but then I’m not getting my last block, because there the end is the routername itself.
So I’m looking for a regex that says: everything after the sequence "interface" until "!\ninterface" OR the end of the block.
Can this be done?
Answers:
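You can use the $ character to specify the end of the string.
result = re.findall(r"(?s)(?<=interface )(.*?)(?=!\n\s*interface|$)", vpn_block)
The (?=!\n\s*interface|$) part is a positive lookahead: the lazy (.*?) keeps matching until it reaches either a "!" line that is followed by the next interface line, or the end of the block (|$). Without the re.MULTILINE flag, $ matches only at the very end of the string or just before a final trailing newline.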
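As a quick check of the pattern above, here is a minimal sketch run against a shortened, made-up block shaped like the log in the question. The contents of vpn_block and the names variable are assumptions for illustration only, not part of the original code.

```python
import re

# Hypothetical, shortened VPN block modelled on the sample log.
# The middle interface contains "!" lines that are not followed by "interface".
vpn_block = (
    "interface ge0/4\n"
    "no shutdown\n"
    "!\n"
    "interface ge0/4.1\n"
    "nat\n"
    "respond-to-ping\n"
    "!\n"
    "tunnel-interface\n"
    "color custom1 restrict\n"
    "!\n"
    "mtu 1496\n"
    "!\n"
    "interface ge0/4.2\n"
    "mtu 1496\n"
    "no shutdown\n"
    "!\n"
)

# Stop at "!" + newline + "interface", or at the end of the string.
pattern = r"(?s)(?<=interface )(.*?)(?=!\n\s*interface|$)"
interfaces = re.findall(pattern, vpn_block)

# The first line of each captured block is the interface name.
names = [block.split("\n", 1)[0] for block in interfaces]
print(names)
```

Note that the last captured block runs to the end of vpn_block, so with the full log from the question it would also include the trailing "ip route" lines. If that is unwanted, they would have to be trimmed separately.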
Note that you cannot use $ as an anchor inside a character class []. Inside [$], $ loses its special meaning and is a literal character to match.
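The difference between the two uses of $ can be seen in a small sketch. The sample string is an arbitrary assumption chosen so that the literal dollar sign is not at the end.

```python
import re

text = "5$ fee"  # hypothetical sample string

# Outside a character class, $ is an anchor: it matches the empty
# position at the end of the string and consumes no characters.
anchor = re.search(r"$", text)
print(anchor.span())

# Inside a character class, $ is an ordinary character: it matches
# the literal dollar sign in the text.
literal = re.search(r"[$]", text)
print(literal.group(), literal.span())
```

This is why the end-of-block case in the lookahead is written as an alternation (|$) rather than being placed inside square brackets.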