Checking if filename prefixes match parent directory prefix recursively with pathlib

Question:

I’ve written a script that uses pathlib to compare a list of files provided by the user to what is actually in a target directory. It then returns lists of files that were expected but not found, and files that were found but were not expected. It works just fine.

My issue now is that I want to verify that filename prefixes match the prefix of their parent directory, and return an error when they don’t. So a folder named abc2022_001 should contain files that start with abc2022_ and not abc2023_. This is what I have so far:

from pathlib import Path

fileList = open("fileList.txt", "r")
data = fileList.read()
fileList_reformatted = data.replace('\n', '').split(",")
print(fileList_reformatted)

p = Path('C:/Users/Common/Downloads/compare').rglob('*')
filePaths = [x for x in p if x.is_file()]
filePaths_string = [str(x) for x in filePaths]
print(filePaths_string)

differences1 = []
for element in fileList_reformatted:
    if element not in filePaths_string:
        differences1.append(element)

print("The following files from the provided list were not found:",differences1)

differences2 = []
for element in filePaths_string:
    if element not in fileList_reformatted:
        differences2.append(element)

print("The following unexpected files were found:",differences2)

wrong_location = []
for element in p:
    if element.Path.name.split("_")[0:1] != element.Path.parent.split("_")[0:1]:
        wrong_location.append(element)
    
print("Following files may be in the wrong location:",wrong_location)

The script runs, but returns no errors on a test directory. Where am I going wrong here? Thanks!

Asked By: Paul


Answers:

You could try just picking the first element from the splits in this line.

if element.Path.name.split("_")[0:1] != element.Path.parent.split("_")[0:1]:

like so

if element.Path.name.split("_")[0] != element.Path.parent.split("_")[0]:

The first version compares two lists ['abc22'] == ['abc23'] and not the actual values 'abc22' == 'abc23'. That might be the cause.
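You can check this directly by running both forms on the prefixes from the question. The folder and file names below are hypothetical examples. For an equality test, a one-element slice and the element itself give the same answer, so the slicing alone would probably not make the check pass silently.

```python
# Compare the slice form (one-element lists) with the index form (plain strings).
name = "abc2023_0002.txt"  # hypothetical file name with the wrong prefix
parent = "abc2022_001"     # hypothetical parent folder name

slice_mismatch = name.split("_")[0:1] != parent.split("_")[0:1]  # ['abc2023'] vs ['abc2022']
index_mismatch = name.split("_")[0] != parent.split("_")[0]      # 'abc2023' vs 'abc2022'

print(slice_mismatch, index_mismatch)  # True True
```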

Answered By: Raphael

The answer turned out to be:

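for element in filePaths:
    # parts[-1] is the last path component, i.e. the file name or folder name
    if element.parts[-1].split("_")[0] != element.parent.parts[-1].split("_")[0]:
        wrong_location.append(element)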
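Two details in the original loop explain why it reported nothing. First, rglob() returns a generator, and the filePaths list comprehension had already consumed it, so `for element in p` ran zero times. Second, a Path object has no `.Path` attribute, so the loop body would have raised AttributeError if it had run. Iterating over the saved list and reading the name directly avoids both problems.

The sketch below puts the whole check in one place. It assumes the prefix is everything before the first underscore, as in the question's abc2022_001 example, and it builds a throwaway directory tree with tempfile so it runs anywhere. The helper names prefix and find_misplaced are hypothetical. It uses `.name`, which for these paths is equivalent to `.parts[-1]`.

```python
import tempfile
from pathlib import Path


def prefix(name: str) -> str:
    """Return the part of a file or folder name before the first underscore."""
    return name.split("_")[0]


def find_misplaced(root: Path) -> list[Path]:
    """Return files under root whose prefix differs from their parent folder's prefix."""
    # Materialize rglob() into a list: it returns a one-shot generator.
    files = [p for p in root.rglob("*") if p.is_file()]
    return [f for f in files if prefix(f.name) != prefix(f.parent.name)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        folder = root / "abc2022_001"
        folder.mkdir()
        (folder / "abc2022_0001.txt").touch()  # matching prefix
        (folder / "abc2023_0002.txt").touch()  # mismatched prefix
        for path in find_misplaced(root):
            print("May be in the wrong location:", path.relative_to(root))
```

Run as a script, it reports only abc2023_0002.txt. Files placed directly in the root folder are compared against the root folder's own name, just as in the question's loop. You may want to skip those if the root is not meant to follow the naming scheme.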

Thanks for helping, folks.

Answered By: Paul