Question about a multiline regex in Python

Question:

I want to select groups of lines in a text file in order to get all the jobs related to a given IP ref.
The test file looks like this:
job numbers: (1,2,3), IP refs: (10,12,10)

text file :
1
… (several lines of text)
xxx 10
2
… (several lines of text)
xxx 12
3
… (several lines of text)
xxx 10

I want to select the job numbers for which IP ref = 10.

Code :

#!/usr/bin/python

import re
import sys

fic=open('test2.xml','r')
texte=fic.read()
fic.close()


#pattern=r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'
pattern=r'\n?\d.*?xxx 10'

result= re.findall(pattern,texte, re.DOTALL)

i=1
for match in result:
    print("\nmatch:",i)
    i=i+1
    print(match)

Result :

match: 1
1
a
b
xxx 10

match: 2

1
a
b
xxx 12
1
a
b
xxx 10

I have tried replacing .* with a negative lookahead assertion, so that a group is selected only if no line like "\n?xxx \d{2}\n" comes before "xxx 10":

pattern=r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'

but it does not work …

Asked By: Frederic Faure


Answers:

You can write the pattern this way, repeating a newline followed by a line that is asserted not to be xxx followed by 1 or more digits:

^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$

The pattern matches:

  • ^ Start of string (start of each line when re.MULTILINE is used)
  • \d Match a single digit (or \d+ for 1 or more)
  • (?: Non capture group
    • \n Match a newline
    • (?!xxx \d+$) Negative lookahead to assert that the line is not xxx followed by 1+ digits
    • .* If the assertion is true, match the whole line
  • )* Close the group and optionally repeat it
  • \nxxx 10$ Match a newline, xxx and 10 at the end of the line
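
As a minimal sketch of how this pattern might be used from Python (the sample text and the names sample, pattern and job_numbers are illustrative assumptions, not part of the original post): re.MULTILINE is needed so that ^ and $ anchor at line boundaries, and re.DOTALL is left off so that .* stops at the end of each line. A capture group around the leading digits is added here so that only the job numbers are returned.

```python
import re

# Illustrative sample: three jobs, each closed by an "xxx <ip ref>" line.
sample = "1\na\nb\nxxx 10\n2\na\nb\nxxx 12\n3\na\nb\nxxx 10\n"

# Same pattern as above, with \d+ captured so findall returns the job numbers.
pattern = r'^(\d+)(?:\n(?!xxx \d+$).*)*\nxxx 10$'

job_numbers = re.findall(pattern, sample, re.MULTILINE)
print(job_numbers)  # ['1', '3']
```

Job 2 is skipped because its block is closed by "xxx 12", and the lookahead prevents the match from running past that line into the next block.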


Answered By: The fourth bird

Good day to you 🙂 and thank you very much for your quick response!
The result is below.
Note: I have replaced re.DOTALL with re.DOTALL|re.MULTILINE (because there is no match without it). Sorry for the earlier presentation, it was not very clear.

Text file :

1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10

Code with your pattern:

#!/usr/bin/python

import re
import sys

fic=open('test2.xml','r')
texte=fic.read()
fic.close()
print(texte)

#pattern='</?(?!(?:span|br|b)(?: [^>]*)?>)[^>/]*>'
#pattern=r'\n?\d(?!(?:\n?xxx \d{2}\n?)*?)xxx 10'
#pattern=r'\n?\d.*?xxx 10'
pattern=r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

result= re.findall(pattern,texte, re.DOTALL|re.MULTILINE)

i=1
for match in result:
    print("\nmatch:",i)
    i=i+1
    print(match)

Result :

match: 1
1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10 

But I am trying to obtain:

match: 1
1
a
b
xxx 10

match: 2
1
a
b
xxx 10
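
The single large match above appears to come from re.DOTALL: with that flag, .* also matches newlines, so it runs to the end of the text and backtracks only as far as the last "xxx 10" line. A small sketch comparing the two flag combinations on the same text (the variable names are illustrative):

```python
import re

# Same three blocks as in the text file above.
text = "1\na\nb\nxxx 10\n1\na\nb\nxxx 12\n1\na\nb\nxxx 10\n"
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# With DOTALL, ".*" crosses line boundaries and swallows all three blocks.
with_dotall = re.findall(pattern, text, re.DOTALL | re.MULTILINE)

# Without DOTALL, ".*" stops at each newline, so the lookahead is checked per line.
without_dotall = re.findall(pattern, text, re.MULTILINE)

print(len(with_dotall))     # 1
print(len(without_dotall))  # 2
```

Dropping re.DOTALL and keeping only re.MULTILINE gives the two separate matches.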
Answered By: Frederic Faure

Thank you very much (you saved my day!!).
As you suggested:

pattern=r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'
result= re.findall(pattern,texte, re.MULTILINE)

Result: OK, the line group (1..xxx 12) is ignored.
Note: I can adapt it to a case where line 1 gives job information and "xxx 12" is a line giving printer IP information.

match: 1
1
a
b
xxx 10

match: 2
1
a
b
xxx 10
Answered By: Frederic Faure

file :

job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

Bash script with an embedded Python script:

#!/bin/bash

# function , $1 : ip of a printer
get_jobs_ip ()
{
cat <<EOF | python
import re

fic=open('test3.xml','r')
texte=fic.read()
fic.close()

r"""
Example of the pattern with ip="100.10.10.100"
(thank you to The fourth bird for the pattern!):
pattern=r'^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx 100\.10\.10\.100$'

^ Start of a line (with re.MULTILINE)
\d Match a single digit (or \d+ for 1 or more): the job number
\s+\d+ Match whitespace and the job id
(?: Non capture group
\n Match a newline
(?!xxx \d+\.\d+\.\d+\.\d+$) Negative lookahead to assert that the line is not xxx followed by an IP address
.* If the assertion is true, match the whole line
)* Close the group and optionally repeat it
\nxxx 100\.10\.10\.100$ Match a newline, xxx and the requested IP
"""

ip="$1"
pattern_template=r'^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx @ip@$'
# re.escape keeps the dots of the IP literal instead of "any character"
pattern=pattern_template.replace('@ip@',re.escape(ip))

result= re.findall(pattern,texte, re.MULTILINE)

i=1
for match in result:
    print("\nmatch:",i)
    i=i+1
    print(match)
EOF
}

get_jobs_ip "100.10.10.100"
get_jobs_ip "100.10.10.102"

result :

match: 1
1 10202
bla bla
bla bla bla
xxx 100.10.10.100

match: 2
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100

match: 1
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
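
For comparison, the same lookup can be sketched in plain Python without the bash wrapper. The function name jobs_for_ip and the inline sample are assumptions made for illustration; the sample copies the file shown above. Capture groups are added so that only the job number and job id come back, and re.escape is used so that the dots in the IP are matched literally.

```python
import re

# Inline copy of the sample file shown above.
sample = """job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100
"""

def jobs_for_ip(text, ip):
    """Return (job_number, job_id) pairs whose block ends with 'xxx <ip>'."""
    pattern = (
        r'^(\d+)\s+(\d+)'                        # job number and job id
        r'(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*'  # following lines that are not an IP line
        r'\nxxx ' + re.escape(ip) + r'$'         # closing line with the requested IP
    )
    return re.findall(pattern, text, re.MULTILINE)

print(jobs_for_ip(sample, "100.10.10.100"))  # [('1', '10202'), ('3', '10204')]
print(jobs_for_ip(sample, "100.10.10.102"))  # [('2', '10203')]
```

Because the pattern ends with $, an IP that is only a prefix of another one (for example "100.10.10.10") does not match the "xxx 100.10.10.100" lines.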
Answered By: Frederic Faure