Delete multiple files matching a pattern

Question:

I have made an online gallery using Python and Django. I’ve just started to add editing functionality, starting with a rotation. I use sorl.thumbnail to auto-generate thumbnails on demand.

When I edit the original file, I need to clean up all the thumbnails so new ones are generated. There are three or four of them per image (I have different ones for different occasions).

I could hard-code the file variants, but that’s messy, and if I change the way I do things, I’ll need to revisit the code.

Ideally I’d like to do a regex-delete. In regex terms, all my originals are named like so:

^(?P<photo_id>\d+)\.jpg$

So I want to delete:

^(?P<photo_id>\d+)[^\d].*jpg$

(Where I replace photo_id with the ID I want to clean.)

Asked By: Oli

Answers:

Try something like this:

import os, re

def purge(dir, pattern):
    for f in os.listdir(dir):
        if re.search(pattern, f):
            os.remove(os.path.join(dir, f))

Then you would pass the directory containing the files and the pattern you wish to match.
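
For the naming scheme in the question, the pattern can be built from the photo ID. The following is a minimal sketch, not part of the original answer: `purge_thumbnails` is a hypothetical name, and it assumes thumbnails sit in the same directory as the original and start with the ID followed by a non-digit. Note that a pattern of the form `^<id>\D.*jpg$` also matches the original `<id>.jpg` (the dot is a non-digit), so the sketch skips that file explicitly.

```python
import os
import re


def purge_thumbnails(directory, photo_id):
    """Delete '<photo_id><non-digit>...jpg' files, keeping '<photo_id>.jpg'."""
    original = "%s.jpg" % photo_id
    # re.escape keeps an ID containing regex metacharacters literal.
    pattern = re.compile(r"^%s\D.*jpg$" % re.escape(str(photo_id)))
    removed = []
    for name in os.listdir(directory):
        # The pattern alone would also match the original, so exclude it by name.
        if name != original and pattern.match(name):
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return sorted(removed)
```

Returning the removed names is optional; it is only there to make the effect easy to log or check.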

Answered By: Andrew Hare

If you need recursion into several subdirectories, you can use this method:

import os, re, os.path
pattern = r"^(?P<photo_id>\d+)[^\d].*jpg$"
mypath = "Photos"
for root, dirs, files in os.walk(mypath):
    for file in filter(lambda x: re.match(pattern, x), files):
        os.remove(os.path.join(root, file))

You can safely remove subdirectories on the fly from dirs, which contains the list of the subdirectories to visit at each node.
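
To illustrate the pruning mentioned above, here is a small sketch. The function name `walk_skipping` and the idea of skipping directories by name are assumptions made for the example. With the default `topdown=True`, editing `dirs` in place stops `os.walk` from descending into the removed entries.

```python
import os


def walk_skipping(top, skip_names):
    """Yield file paths under top without entering directories in skip_names."""
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment mutates the very list os.walk iterates over,
        # so the pruned directories are never visited.
        dirs[:] = [d for d in dirs if d not in skip_names]
        for name in files:
            yield os.path.join(root, name)
```

Rebinding the name (`dirs = [...]`) would not work, because `os.walk` keeps its own reference to the original list.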

Note that within a single directory you can also get the files matching a simple wildcard pattern with glob.glob(pattern). In this case you would have to subtract the set of files to keep from the whole set, so the code above is more efficient.
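
As a side note to the glob remark, glob patterns support character classes, which for this particular naming scheme can avoid the subtraction step. This is a sketch under the assumption that thumbnails are named `<id><non-digit>...` with a `.jpg` extension; `thumbnails_via_glob` is a hypothetical helper.

```python
import glob
import os


def thumbnails_via_glob(directory, photo_id):
    """Return paths named '<photo_id><non-digit>*.jpg' inside directory."""
    # "[!0-9]" is glob's negated character class: exactly one non-digit.
    # It consumes the dot of '<photo_id>.jpg', so the original never matches.
    pattern = os.path.join(glob.escape(directory), "%s[!0-9]*.jpg" % photo_id)
    return sorted(glob.glob(pattern))
```

`glob.escape` is applied to the directory only, so wildcard characters in the path are not misread as part of the pattern.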

Answered By: RedGlyph

It’s not clear to me that you actually want to do any named-group matching — in the use you describe, the photoid is an input to the deletion function, and named groups’ purpose is “output”, i.e., extracting certain substrings from the matched string (and accessing them by name in the match object). So, I would recommend a simpler approach:

import re
import os

def delete_thumbnails(photoid, photodirroot):
  matcher = re.compile(r'^%s\d+\D.*jpg$' % photoid)
  numdeleted = 0
  for rootdir, subdirs, filenames in os.walk(photodirroot):
    for name in filenames:
      if not matcher.match(name):
        continue
      path = os.path.join(rootdir, name)
      os.remove(path)
      numdeleted += 1
  return "Deleted %d thumbnails for %r" % (numdeleted, photoid)

You can pass the photoid as a normal string, or as a RE pattern piece if you need to remove several matchable IDs at once (e.g., r'abc[def]' to remove abcd, abce, and abcf in a single call). That’s the reason I’m inserting it literally in the RE pattern, rather than inserting the string re.escape(photoid) as would be normal practice. Certain parts, such as counting the number of deletions and returning an informative message at the end, are obviously frills which you should remove if they give you no added value in your use case.

Others, such as the “if not … // continue” pattern, are highly recommended practice in Python (flat is better than nested: bailing out to the next leg of the loop as soon as you determine there is nothing to do on this one is better than nesting the actions to be done within an if), although of course other arrangements of the code would work too.
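
The difference between inserting the ID literally and inserting re.escape(photoid) can be seen in a short sketch. `build_matcher` and its `literal` flag are hypothetical, and the pattern is a simplified one (ID, one non-digit, anything, "jpg") chosen for the illustration rather than the exact pattern from the answer.

```python
import re


def build_matcher(photoid, literal=True):
    """Compile a thumbnail-name matcher for photoid.

    literal=True escapes photoid so it matches only itself;
    literal=False treats photoid as a regex fragment.
    """
    piece = re.escape(photoid) if literal else photoid
    # The non-capturing group keeps fragments such as "12|34" self-contained.
    return re.compile(r"^(?:%s)\D.*jpg$" % piece)
```

With `literal=False`, `build_matcher("abc[def]", literal=False)` accepts names starting with abcd, abce, or abcf; with the default, the same argument only accepts names that literally start with `abc[def]`.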

Answered By: Alex Martelli

My recommendation:

import os, re

def purge(dir, pattern, inclusive=True):
    regexObj = re.compile(pattern)
    for root, dirs, files in os.walk(dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            if bool(regexObj.search(path)) == bool(inclusive):
                os.remove(path)
        for name in dirs:
            path = os.path.join(root, name)
            if len(os.listdir(path)) == 0:
                os.rmdir(path)

This will recursively remove every file that matches the pattern by default, and every file that doesn’t if inclusive is False. It will then remove any empty folders from the directory tree.

Answered By: DRayX

I find Popen(["rm " + file_name + "*.ext"], shell=True, stdout=PIPE).communicate() to be a much simpler solution to this problem. Although this is prone to injection attacks, I don’t see any issues if your program is using this internally.
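
If the injection risk is a concern, the same prefix-plus-extension deletion can be done without a shell. This is a sketch, not the answer's method: `remove_by_prefix` is a hypothetical name, and `.ext` is kept as the placeholder extension from the command above.

```python
import glob
import os


def remove_by_prefix(directory, file_name, ext=".ext"):
    """Delete '<file_name>*<ext>' in directory without invoking a shell."""
    # glob.escape makes wildcard characters in the inputs literal, and since
    # no shell runs, a value like "x; rm -rf ~" is only ever a filename prefix.
    pattern = os.path.join(glob.escape(directory), glob.escape(file_name) + "*" + ext)
    removed = 0
    for path in glob.glob(pattern):
        os.remove(path)
        removed += 1
    return removed
```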

Answered By: Kartos

How about this?

import glob, os, multiprocessing
p = multiprocessing.Pool(4)
p.map(os.remove, glob.glob("P*.jpg"))

Mind you this does not do recursion and uses wildcards (not regex).

UPDATE
In Python 3 the built-in map() function returns a lazy iterator, not a list, so map(os.remove, ...) on its own deletes nothing until the iterator is consumed. Pool.map(), as used above, is different: it runs eagerly and returns a list.

If you use the built-in map() instead of a pool, force evaluation like this:

...
list(map(os.remove, glob.glob("P*.jpg")))

I agree it’s not the most functional way, but it’s concise and does the job.
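
If recursion is needed while staying with wildcards, glob can descend into subdirectories through the `**` pattern when `recursive=True` is passed. A minimal sketch, with `purge_recursive_glob` as a hypothetical name and without the process pool:

```python
import glob
import os


def purge_recursive_glob(root, wildcard):
    """Delete files matching wildcard in root and all of its subdirectories."""
    # With recursive=True, "**" matches zero or more directory levels,
    # so files directly inside root are included as well.
    pattern = os.path.join(glob.escape(root), "**", wildcard)
    removed = 0
    for path in glob.glob(pattern, recursive=True):
        if os.path.isfile(path):  # leave matching directory names alone
            os.remove(path)
            removed += 1
    return removed
```

For example, `purge_recursive_glob("Photos", "P*.jpg")` would cover the same files as the snippet above plus those in nested folders.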

Answered By: Valeriu Paloș

Using the glob module:

import glob, os
for f in glob.glob("P*.jpg"):
    os.remove(f)

Alternatively, using pathlib:

from pathlib import Path
for p in Path(".").glob("P*.jpg"):
    p.unlink()

Answered By: Sam Bull

import os, re

def recursive_purge(dir, pattern):
    for f in os.listdir(dir):
        if os.path.isdir(os.path.join(dir, f)):
            recursive_purge(os.path.join(dir, f), pattern)
        elif re.search(pattern, os.path.join(dir, f)):
            os.remove(os.path.join(dir, f))

Answered By: Yanay Manhaim

import os

def main():
    mypath = "<Path to Root Folder to work within>"
    for root, dirs, files in os.walk(mypath):
        for file in files:
            p = os.path.join(root, file)
            # Or any test you want in place of the extension check
            if os.path.isfile(p) and p.endswith(".jpg"):
                os.remove(p)

if __name__ == "__main__":
    main()

Answered By: Charlie