Pattern-match a filename between underscores and compare the included date string to the current datetime
Question:
Similarly to the topic "Open only files containing dates of past seven days in filename", I want to open only those files which follow a very rigid naming rule, and extract part of the filename to do a date comparison.
My filename is built like this:
{Custom-prefix}_{SupplierName}_{8-digit date}.csv
An example:
myprefix_Shop_no24_20221009.csv
So the supplier name can contain underscores, but the parts of the filename are separated by underscores as well.
I do have the complete list of {SupplierName} values, but this can change over time and I would like to avoid a solution that hard-codes them. The {SupplierName} can contain digits, varies in length, and can include "_".
I tried this:
prefix = "Custom-prefix"
pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)([0-9]{8})(?=.csv)")
# I get the filenames via os.walk
matched = pattern.search(filename)
but this seems to match everything that sits between "Custom-prefix" and ".csv".
pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)(?=.csv)")
gives me the exact same result. The way I understand this, I have to make the regex aware that it has to match the individual parts of the string and respect the underscores, so that each group of my filename:
myprefix
_
Shop_no24
_
20221009
.csv
gets recognized. I found a solution for matching underscores in names here, but I am unfortunately not able to build the regex myself and use the matched groups afterwards to do the date comparison.
Thank you in advance.
Answers:
You can use
pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")
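Note the doubled braces: inside an f-string, literal braces must be written as {{ and }}. A single-braced {8} is evaluated as an expression and inserted as the text 8, so [0-9]{8} in an f-string becomes [0-9]8 instead of a quantifier.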
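To make the brace escaping concrete, here is a small sketch. The variable names single and doubled are made up for illustration; the point is only to show what text the regex engine actually receives in each case.

```python
import re

# Single braces: the f-string evaluates the expression 8 and inserts "8".
single = fr"[0-9]{8}"
# Doubled braces: the f-string emits literal braces, i.e. a regex quantifier.
doubled = fr"[0-9]{{8}}"

print(single)   # [0-9]8
print(doubled)  # [0-9]{8}

# Only the doubled form matches an 8-digit date string.
print(re.fullmatch(single, "20221009") is not None)   # False
print(re.fullmatch(doubled, "20221009") is not None)  # True
```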
See the Python demo:
import re
filename = "Custom-prefix_Shop_no24_20221009.csv"
prefix = "Custom-prefix"
# Doubled braces yield the literal {4} and {2} quantifiers inside the f-string
pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")
matched = pattern.search(filename)
if matched:
    # Groups: supplier name, then year, month and day as strings
    supplier, year, month, day = matched.groups()
    print(f'supplier={supplier}, year={year}, month={month}, day={day}')
Output:
supplier=Shop_no24, year=2022, month=10, day=09
With the (\d{4})(\d{2})(\d{2}) part, you capture all date parts into separate groups so that you can manipulate them however you see fit.
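For the date comparison the question asks about, the captured groups can be converted into a date object and compared against today's date. The following is a minimal sketch rather than a definitive implementation: the helper name is_recent, the seven-day default window (borrowed from the topic the question refers to), the optional today parameter, and the use of re.escape and fullmatch are all assumptions added for illustration.

```python
import re
from datetime import date, timedelta

def is_recent(filename, prefix, days=7, today=None):
    """Hypothetical helper: True if filename follows the naming rule and
    its 8-digit date lies within the last `days` days (inclusive)."""
    today = today or date.today()
    # re.escape guards against regex metacharacters in the prefix.
    pattern = re.compile(
        fr"{re.escape(prefix)}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv"
    )
    # fullmatch requires the whole filename to follow the rule.
    matched = pattern.fullmatch(filename)
    if not matched:
        return False
    supplier, year, month, day = matched.groups()
    try:
        file_date = date(int(year), int(month), int(day))
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return False
    return today - timedelta(days=days) <= file_date <= today

# Fixed reference dates so the result does not depend on when this runs.
name = "Custom-prefix_Shop_no24_20221009.csv"
print(is_recent(name, "Custom-prefix", today=date(2022, 10, 12)))  # True
print(is_recent(name, "Custom-prefix", today=date(2022, 11, 1)))   # False
```

When walking a directory with os.walk, a check like this could be applied to each file name before opening the file.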