Python regex to extract a file from a zip archive when the filename must contain one pattern and must not contain another

Question:

I want to extract one specific file from a zip archive that contains the 3 files below.
The filename should start with ‘kpidata_nfile’ and should not contain ‘fileheader’.

kpidata_nfile_20220919-20220925_fileheader.csv
kpidata_nfile_20220905-20220911.csv
othername_kpidata_nfile_20220905-20220911.csv

Below is the code I have tried:

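from zipfile import ZipFile
import re
import os

for x in os.listdir('.'):
    # Only process names ending in .zip (the dot is escaped so it is literal)
    if re.match(r'.*\.zip$', x):
        with ZipFile(x, 'r') as zip:
            for info in zip.infolist():
                # Matches the prefix, but also matches the 'fileheader' file
                if re.match(r'^kpidata_nfile_', info.filename):
                    zip.extract(info)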

Output required – kpidata_nfile_20220905-20220911.csv

Asked By: Anusha


Answers:

This regex does what you require:

^kpidata_nfile(?:(?!fileheader).)*$

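See this answer for more about the (?:(?!fileheader).)*$ part. In short, it consumes one character at a time, but only at positions where 'fileheader' does not begin, so the match fails if 'fileheader' appears anywhere after the prefix.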

You can see the regex working on your example filenames here.
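
As a quick local check, the pattern can be run against the three filenames from the question. This is only a minimal sketch; the names PATTERN, filenames and matches are illustrative and not part of the original answer.

```python
import re

# Pattern from the answer: the name must start with 'kpidata_nfile' and
# 'fileheader' must not begin at any later position.
PATTERN = re.compile(r'^kpidata_nfile(?:(?!fileheader).)*$')

# The three example filenames from the question.
filenames = [
    'kpidata_nfile_20220919-20220925_fileheader.csv',
    'kpidata_nfile_20220905-20220911.csv',
    'othername_kpidata_nfile_20220905-20220911.csv',
]

# Keep only the names the pattern accepts.
matches = [name for name in filenames if PATTERN.match(name)]
print(matches)  # ['kpidata_nfile_20220905-20220911.csv']
```

The first name is rejected because of 'fileheader', and the third because it does not start with the prefix.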

The regex is not particularly readable, so it might be better to use plain Python string operations instead of a regex. Something like:

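fname = info.filename
# Keep names that start with the prefix and do not mention 'fileheader'
if fname.startswith('kpidata_nfile') and 'fileheader' not in fname:
    zip.extract(info)  # 'zip' is the open ZipFile from the question's loop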
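
Put together with the loop from the question, the string-based check might look roughly like the sketch below. The helper names wanted and extract_wanted are hypothetical, and the sketch assumes the files sit at the top level of the archive (a member stored in a subfolder would have that folder in info.filename and would not pass the startswith test).

```python
import os
from zipfile import ZipFile


def wanted(filename):
    """Return True for names that start with the prefix and lack 'fileheader'."""
    return filename.startswith('kpidata_nfile') and 'fileheader' not in filename


def extract_wanted(directory='.'):
    """Extract matching members from every .zip in `directory`; return their names."""
    extracted = []
    for entry in os.listdir(directory):
        # Skip anything that is not a .zip file
        if not entry.endswith('.zip'):
            continue
        with ZipFile(os.path.join(directory, entry), 'r') as archive:
            for info in archive.infolist():
                if wanted(info.filename):
                    # Extract next to the archive rather than into the cwd
                    archive.extract(info, path=directory)
                    extracted.append(info.filename)
    return extracted
```

Calling extract_wanted('.') would then scan the current directory, as the question's code does. Naming the archive variable something other than zip also avoids shadowing the built-in zip function.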
Answered By: ljdyer