Pattern matching of filename between underscores and compare included date string to current datetime

Question:

Similarly to the topic "Open only files containing dates of past seven days in filename", I want to open only those files which follow a very rigid naming rule, and extract part of the filename to do a date comparison.

My filename is built like this:

{Custom-prefix}_{SupplierName}_{8_digit_date}.csv
An example:

myprefix_Shop_no24_20221009.csv

So the supplier name can contain underscores, but the parts of the filename are separated by underscores as well.

I do have the complete list of values for {SupplierName}, but it can change over time, and I would like to avoid a solution that hard-codes them. The {SupplierName} values can contain numbers, are of various lengths, and can include "_".

I tried this:

prefix = "Custom-prefix"
pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)([0-9]{8})(?=.csv)")
# I get the filenames via os.walk
matched = pattern.search(filename)

but this seems to match everything that sits between "Custom-prefix" and ".csv".

pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)(?=.csv)")

This gives me the exact same result. The way I understand it, I have to make the regex aware that it has to match the individual parts of the string and respect the underscores, so that each part of my filename:

 myprefix
_
Shop_no24
_
20221009
.csv

gets recognized. I found a solution for matching underscores in names here, but unfortunately I am not able to work out the regex myself and then use the matched groups to do the date comparison.

Thank you in advance

Asked By: NorrinRadd

||

Answers:

You can use

pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")

Note the doubled braces: literal braces in an f-string must be escaped as {{ and }}.
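
To see why the doubling matters, here is a small sketch comparing the two spellings. In an f-string, a single-braced {8} is a replacement field that evaluates to the text 8, so the quantifier silently disappears from the pattern. The variable names broken and fixed are only illustrative.

```python
import re

# Single braces: {8} is an f-string replacement field, so the quantifier is lost.
broken = fr"\d{8}"
# Doubled braces: {{8}} produces a literal {8}, which the regex engine
# reads as "exactly eight times".
fixed = fr"\d{{8}}"

print(broken)  # \d8
print(fixed)   # \d{8}

print(bool(re.fullmatch(fixed, "20221009")))   # True
print(bool(re.fullmatch(broken, "20221009")))  # False
```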

See the Python demo:

import re
filename = "Custom-prefix_Shop_no24_20221009.csv"
prefix = "Custom-prefix"
pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")
matched = pattern.search(filename)
if matched:
    supplier, year, month, day = matched.groups()
    print(f'supplier={supplier}, year={year}, month={month}, day={day}')

Output:

supplier=Shop_no24, year=2022, month=10, day=09

With the (\d{4})(\d{2})(\d{2}) part, you capture the date parts into separate groups so that you can manipulate them however you see fit.
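
For the date comparison the question asks about, one possible approach is sketched below. It is an illustration under a few assumptions: the date is captured as a single eight-digit group and parsed with datetime.strptime, the filename is the bare name without a directory part, and "recent" means within the last seven days, today included. The helper names parse_filename and is_recent are hypothetical, and the re.escape call is an added precaution in case the prefix ever contains regex metacharacters.

```python
import re
from datetime import date, datetime, timedelta

def parse_filename(filename, prefix):
    """Return (supplier, file_date), or None if the name does not follow the rule."""
    # re.escape protects against regex metacharacters in the prefix.
    pattern = re.compile(fr"{re.escape(prefix)}_(\w+)_(\d{{8}})\.csv")
    matched = pattern.fullmatch(filename)
    if not matched:
        return None
    supplier, date_string = matched.groups()
    try:
        file_date = datetime.strptime(date_string, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date, e.g. 20221340.
        return None
    return supplier, file_date

def is_recent(file_date, today=None, days=7):
    """True if file_date lies within the last `days` days, today included."""
    today = today or date.today()
    return timedelta(0) <= today - file_date <= timedelta(days=days)

result = parse_filename("Custom-prefix_Shop_no24_20221009.csv", "Custom-prefix")
print(result)  # ('Shop_no24', datetime.date(2022, 10, 9))
print(is_recent(result[1], today=date(2022, 10, 12)))  # True
print(is_recent(result[1], today=date(2022, 11, 1)))   # False
```

The today parameter is only there so the check can be tried with a fixed reference date; in real use it can be left out and defaults to the current date.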

Answered By: Wiktor Stribiżew