How to implement re.IGNORECASE method in grep of python-docx

Question

I would like to write a grep-like function with python-docx.

import glob
from docx import Document

files = glob.glob("Folders/*.docx")
fetchWord = "sample"

for file in files:
    document = Document(file)
    # number each paragraph so the output shows where the match was found
    for count, para in enumerate(document.paragraphs):
        if para.text.find(fetchWord) > -1:
            print(file + ":" + "Line" + str(count) + ":" + para.text)

With this code, I can find only "sample", but not "Sample", "sAmPle", and so on.

To match these variants too, I would like to use the re.IGNORECASE flag in the above code.
How do I do this?

Asked By: aaaaa0a

Answer 1

If you want to use re.IGNORECASE, use Python's re module. This is the way to go when you need a regular expression:

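import glob
import re
from docx import Document

files = glob.glob("Folders/*.docx")
fetchWord = "sample"

for file in files:
    document = Document(file)
    for count, para in enumerate(document.paragraphs):
        # re.search looks anywhere in the paragraph (re.match only checks the start);
        # re.escape makes the search word match literally
        if re.search(re.escape(fetchWord), para.text, re.IGNORECASE) is not None:
            print(file + ":" + "Line" + str(count) + ":" + para.text)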
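Because re.match only matches at the beginning of a string, re.search is the call that finds the word anywhere in a paragraph. The sketch below isolates that logic from python-docx so it can be tried on plain strings. `grep_paragraphs` is a hypothetical helper, not part of python-docx; with a real document its input would be something like `[para.text for para in Document(path).paragraphs]`.

```python
import re
from typing import Iterable, List, Tuple


def grep_paragraphs(texts: Iterable[str], word: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each paragraph containing `word`, ignoring case.

    Hypothetical helper: with python-docx, pass [p.text for p in document.paragraphs].
    """
    # Compile once; re.escape makes characters like "." or "+" match literally
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    hits = []
    for line_no, text in enumerate(texts):
        # search() scans the whole string, unlike match(), which only checks the start
        if pattern.search(text):
            hits.append((line_no, text))
    return hits


paragraphs = ["A sample paragraph.", "Nothing here.", "This SAMPLE is loud.", "sAmPle first"]
for line_no, text in grep_paragraphs(paragraphs, "sample"):
    print(f"Line{line_no}:{text}")
```

Compiling the pattern once avoids rebuilding it for every paragraph, which matters a little when grepping many files.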

However, if you simply want to search for text and do not need a regex, you can use the .lower() method to convert the paragraph to lowercase before searching (the search word must be lowercase too), like this:

import glob
from docx import Document

files = glob.glob("Folders/*.docx")
fetchWord = "sample"

for file in files:
    document = Document(file)
    for count, para in enumerate(document.paragraphs):
        # lowercase both sides so the comparison ignores case
        if fetchWord.lower() in para.text.lower():
            print(file + ":" + "Line" + str(count) + ":" + para.text)
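For a plain substring search, str.casefold() is a slightly more thorough alternative to .lower(): it is designed for caseless comparison and also folds characters such as the German "ß" to "ss". A minimal sketch, where `contains_ignore_case` is a hypothetical helper that could replace the `.lower()` test above:

```python
def contains_ignore_case(text: str, word: str) -> bool:
    """Case-insensitive substring test without regular expressions (hypothetical helper)."""
    # casefold() both sides; it is a stronger form of lower() meant for caseless matching
    return word.casefold() in text.casefold()


print(contains_ignore_case("This SAMPLE is loud.", "Sample"))  # True
print(contains_ignore_case("STRASSE", "Straße"))  # True: "ß" folds to "ss"
print("Straße".lower() in "STRASSE".lower())  # False: lower() keeps "ß"
```

For plain ASCII words like "sample", .lower() and .casefold() behave the same.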

Answered By: Michael M.
