Regex pattern returns null for variable as pattern

Question:

I am reading regex patterns from a .txt file, passing each one as a variable, and using it to search a very large text file. However, passing the variable to the regex search didn't work. My code snippet is:

import re

with open(r"C:\Desktop\list_pattern.txt", "r") as file1:
    for pattern in file1:
        with open(r"C:\Desktop\log.txt", "r") as my_file:
            for lines in my_file:
                k = re.search('{}'.format(pattern), lines)   # I even tried re.search(pattern, lines)
                if k is not None:
                    print("k is", k)

For example, the first line in list_pattern.txt is "Battery Low", and it gives 0 matches in log.txt. However, if I replace that line of code with k=re.search('Battery Low', lines), it gives 12 matches. Any idea what may be wrong? I am using Python 3.10.

Asked By: mrin9san


Answers:

It worked fine for me when I simulated it with other files.
The result looked like this:

    k is <re.Match object; span=(91, 93), match='in'>
    k is <re.Match object; span=(3, 5), match='in'>
    k is <re.Match object; span=(22, 24), match='in'>
    k is <re.Match object; span=(4, 6), match='in'>
    k is <re.Match object; span=(20, 22), match='in'>
    k is <re.Match object; span=(40, 42), match='in'>
    k is <re.Match object; span=(25, 27), match='in'>
    k is <re.Match object; span=(30, 32), match='in'>
    k is <re.Match object; span=(32, 34), match='in'>
    k is <re.Match object; span=(50, 52), match='in'>
    k is <re.Match object; span=(10, 12), match='in'>
    k is <re.Match object; span=(165, 167), match='in'>
    k is <re.Match object; span=(34, 36), match='in'>
    k is <re.Match object; span=(26, 28), match='in'>
    k is <re.Match object; span=(35, 37), match='in'>
    k is <re.Match object; span=(14, 16), match='in'>
    k is <re.Match object; span=(46, 48), match='in'>
    k is <re.Match object; span=(20, 22), match='in'>

Can you share the text files you are using?

Answered By: karam yakoub agha

When you iterate over a file with for pattern in file1: (or for lines in my_file:), the line break characters remain at the end of each line. The pattern read from the file is therefore "Battery Low" followed by a newline, and that newline has to match as well. You need to use pattern.rstrip() to get rid of the trailing whitespace or, if the patterns can end in meaningful whitespace, the safer .rstrip('\n'). If there is no meaningful whitespace on either end of each pattern, you can use pattern.strip().
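
To see the trailing line break for yourself, a minimal sketch along these lines may help. It assumes made-up file contents and uses io.StringIO as a stand-in for the real list_pattern.txt; the names pattern_file, log_line, raw_pattern, match_raw and match_clean are hypothetical.

```python
import io
import re

# Hypothetical stand-in for list_pattern.txt; io.StringIO iterates line by line like a file.
pattern_file = io.StringIO("Battery Low\nSignal Lost\n")
# Hypothetical log line, also ending in a line break as it would when read from a file.
log_line = "12:00 Battery Low detected\n"

# First line, exactly as the for loop over the file would yield it.
raw_pattern = next(pattern_file)
# repr() makes the otherwise invisible trailing newline visible.
print(repr(raw_pattern))

# The newline is part of the pattern, so it must match right after "Low".
match_raw = re.search(raw_pattern, log_line)
# Stripping only the line break leaves the intended pattern.
match_clean = re.search(raw_pattern.rstrip("\n"), log_line)

print("raw pattern matched:", match_raw is not None)
print("stripped pattern matched:", match_clean is not None)
```

Printing repr(pattern) inside the original loop is a quick way to check whether this is what is happening with the real files.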

There seems to be no reason to use str.format, just use the pattern variable directly.

So you need to use

k = re.search(pattern.rstrip('\n'), lines)
# or if there can be no meaningful whitespace at the end of each pattern:
k = re.search(pattern.rstrip(), lines)
# or if there can be no meaningful whitespace on both ends of each pattern:
k = re.search(pattern.strip(), lines)
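
Putting it together, one possible way to structure the whole search is sketched below. This is only an illustration under assumptions: find_matches is a hypothetical helper, the file contents are invented, and io.StringIO again stands in for the real files. It also loads the patterns once and reads the log a single time, which is a design choice of this sketch rather than something the original code does.

```python
import io
import re


def find_matches(pattern_lines, log_lines):
    """Return (pattern, line_number, match) tuples for every hit."""
    # Strip only the line break so meaningful trailing spaces survive.
    patterns = [p.rstrip("\n") for p in pattern_lines]
    # Skip blank lines: an empty pattern would match every log line.
    compiled = [re.compile(p) for p in patterns if p]
    hits = []
    for line_number, line in enumerate(log_lines, start=1):
        for regex in compiled:
            match = regex.search(line)
            if match is not None:
                hits.append((regex.pattern, line_number, match))
    return hits


# Hypothetical stand-ins for list_pattern.txt and log.txt.
pattern_file = io.StringIO("Battery Low\nSignal Lost\n\n")
log_file = io.StringIO("ok\nBattery Low at 3%\nSignal Lost\nBattery Low again\n")

hits = find_matches(pattern_file, log_file)
for pattern, line_number, match in hits:
    print(f"line {line_number}: {pattern!r} matched at {match.span()}")
```

With real files, the two io.StringIO objects would be replaced by the file objects returned by open(...), since find_matches only needs something it can iterate over line by line.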
Answered By: Wiktor Stribiżew