Pattern matching of filename between underscores and compare included date string to current datetime

Question

Similarly to the topic "Open only files containing dates of past seven days in filename", I want to open only those files which follow a very rigid naming rule, and extract part of the filename to do a date comparison.

My filenames are built like this:

{Custom-prefix}_{SupplierName}_{8-digit date}.csv
An example:

myprefix_Shop_no24_20221009.csv

So the supplier name can contain underscores, but the parts of the filename are also separated by underscores.

I do have the complete list of {SupplierName} values, but it can change over time and I would like to avoid a solution that hard-codes them. The supplier names can contain digits, vary in length and include "_".

I tried this:

import re

prefix = "Custom-prefix"
pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)([0-9]{8})(?=.csv)")
# I get the filenames via os.walk
matched = pattern.search(filename)

but this seems to match everything that sits between "Custom-prefix" and ".csv".

pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)(?=.csv)")

gives me exactly the same result. The way I understand it, I have to make the regex aware that it has to match the individual parts of the string and respect the underscores, so that each part of my filename:

myprefix
_
Shop_no24
_
20221009
.csv

is recognized. I found a solution for matching underscores in names here, but unfortunately I am not able to build the regex myself and then use the matched groups to do the date comparison.

Thank you in advance

Asked By: NorrinRadd

Answer 1

You can use

pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")

Note the doubled braces: inside an f-string, {{ and }} produce literal { and }, so the regex engine receives \d{4} and \d{2} as quantifiers.
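A quick way to see why the doubling matters (it also explains why ([0-9]{8}) in the question's attempt did not act as a quantifier) is to print both versions of the pattern. This is a minimal sketch; the sample string `text` is made up for illustration:

```python
import re

prefix = "Custom-prefix"
text = "Custom-prefix20221009"  # hypothetical sample string

# Single braces: {8} is an f-string replacement field, evaluated to the
# integer 8, so the regex quantifier is lost and "[0-9]8" remains.
single = fr"{prefix}([0-9]{8})"
# Doubled braces: {{8}} is emitted as a literal {8}, which the regex
# engine reads as "exactly 8 repetitions".
doubled = fr"{prefix}([0-9]{{8}})"

print(single)                              # Custom-prefix([0-9]8)
print(doubled)                             # Custom-prefix([0-9]{8})
print(re.search(single, text))             # None
print(re.search(doubled, text).group(1))   # 20221009
```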

See the Python demo:

import re
filename = "Custom-prefix_Shop_no24_20221009.csv"
prefix = "Custom-prefix"
pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")
matched = pattern.search(filename)
if matched:
    supplier, year, month, day = matched.groups()
    print(f'supplier={supplier}, year={year}, month={month}, day={day}')

Output:

supplier=Shop_no24, year=2022, month=10, day=09

With the (\d{4})(\d{2})(\d{2}) part, you capture each date component in a separate group, so you can manipulate them however you see fit.

Answered By: Wiktor Stribiżew
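The answer stops at the captured groups, while the question title also asks for a comparison with the current date. A minimal sketch of that last step follows. It assumes that "recent" means between `days` days ago and today, both inclusive, as in the seven-days topic mentioned in the question, and that the prefix is `Custom-prefix`. The helper names `file_date`, `is_recent` and `recent_files` are hypothetical. Compared with the answer's pattern it uses `fullmatch`, so the whole filename must follow the rule; `re.escape`, so the prefix is matched literally; and `\w+` instead of `\w*`, so an empty supplier name is rejected.

```python
import os
import re
from datetime import date, timedelta

PREFIX = "Custom-prefix"  # assumption: the fixed prefix from the question
# Greedy \w+ backtracks until "_" + 8 digits + ".csv" ends the name,
# so the supplier keeps all its inner underscores.
PATTERN = re.compile(fr"{re.escape(PREFIX)}_(\w+)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")

def file_date(filename):
    """Return (supplier, date) for a filename that follows the rule, else None."""
    m = PATTERN.fullmatch(filename)
    if not m:
        return None
    supplier, year, month, day = m.groups()
    try:
        return supplier, date(int(year), int(month), int(day))
    except ValueError:  # e.g. 20221399 has eight digits but is no valid date
        return None

def is_recent(filename, days=7, today=None):
    """True if the embedded date lies between `days` days ago and today."""
    parsed = file_date(filename)
    if parsed is None:
        return False
    if today is None:
        today = date.today()
    return today - timedelta(days=days) <= parsed[1] <= today

def recent_files(root, days=7):
    """Walk `root` with os.walk and yield paths of files with a recent date."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_recent(name, days):
                yield os.path.join(dirpath, name)

print(file_date("Custom-prefix_Shop_no24_20221009.csv"))
# ('Shop_no24', datetime.date(2022, 10, 9))
print(is_recent("Custom-prefix_Shop_no24_20221009.csv", today=date(2022, 10, 12)))
# True
```

Passing `today` explicitly keeps the check testable; in normal use `recent_files(some_root)` compares against `date.today()`.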
