remove list elements based on pattern match

Question:

I have some code that reads in many Excel files from a folder.

Sometimes a file lock on one of the files makes a lock file show up in the glob results. For example: "C:~$filename.xlsx"

The file doesn’t show up in the folder (even with ‘show hidden’ checked). I’ve tried to end Excel through Task Manager, but Excel isn’t running. The only way to get the ghost file to stop showing up is to reboot the machine.

So I thought I would just eliminate that item from the list if a similar locked file shows up again.

The following code is not producing four elements. It produces five.

My pattern is "~$" for this example.

Can someone point out the error in the regex pattern?

import re

folder = ['C:Work~$Counts.xlsx', '~$ad_;', 'dslkjf$dl;jf', '$lkajd~f', 'C:WorkCounts.xlsx']

pattern = re.compile(r'\~$')

# get rid of any list items that contain "~$"
filelist = [i for i in folder if not pattern.match(i)]

print(filelist)

Thanks for any help.

Asked By: Jeremy_Miller

Answers:

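Right now your pattern only matches the string ~ itself, not strings that contain ~$. An unescaped $ is the end-of-string anchor, and match only tests from the start of the string, so r'\~$' means "a tilde followed by the end of the string". What you need is:

pattern = re.compile(r'.*~\$')

.* matches any run of characters (including none) before the literal ~$, and \$ escapes the dollar sign so it is treated as a character rather than as an anchor.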
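
As a minimal sketch of the whole filter, assuming the goal is to drop every entry that contains the literal text ~$ anywhere in the name: the dollar sign is escaped, and re.search is used instead of re.match so the prefix does not have to be spelled out. The names filelist_plain and name are only illustrative. With the sample list from the question, two entries contain ~$, so three remain.

```python
import re

folder = ['C:Work~$Counts.xlsx', '~$ad_;', 'dslkjf$dl;jf', '$lkajd~f', 'C:WorkCounts.xlsx']

# Escape "$" so it is a literal dollar sign rather than the end-of-string anchor.
pattern = re.compile(r'~\$')

# re.search looks anywhere in the string; re.match only tests from the start.
filelist = [name for name in folder if not pattern.search(name)]

# Equivalent check without a regex, since "~$" is a fixed substring.
filelist_plain = [name for name in folder if '~$' not in name]

print(filelist)
```

Since ~$ is a fixed string rather than a real pattern, the plain substring test is probably the simpler choice here.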

Answered By: Francis Godinho