Create folders with file name and rename part of it

Question:

i have some pdfs files that i need to create folders with part of your name and move the pdfs to the folders.

cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'
for file in glob.glob("*.pdf"):
    dst = cwd + "\" + file.replace(str(padrao), '').replace('P', '')
    os.mkdir(dst)
    shutil.move(file, dst)

ex: I have the file P9883231_V1_A0V0_T07-54-369-664_S00001.pdf, P9883231_V1_A0V0_T07-54-369-664_S00002.pdf and
P1235567_V1_A0V0_T07-54-369-664_S00001.pdf.

In this example I need the script to create two folders: 9883231 and 1234567. (the part in italics must be the name of the folder)

notice that in my code I remove the unwanted parts to create the folder, the ‘P’ at the beginning and part of padrao = ‘_V1_A0V0_T07-54-369-664_S00001.pdf’

The problem is that at the end of the padrao the number can be variable, the file can end with "02.pdf" , "03.pdf"

In the example I mentioned above, the folder 9883231 should contain both files.

Asked By: Gabriel Passos

||

Answers:

Regular expressions can do the trick here:

import re
import os
import glob
import shutil


cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S000'
for file in glob.glob("*.pdf"):
    dst = os.path.join(cwd, re.findall("P(.*)" + padrao + "d{2}.pdf", file)[0])

    os.mkdir(dst)
    shutil.move(file, dst)

Notice that I remove the part of padrao that varies.
The regex matches all strings that begin ith a P, followed by the padrao string value, followed by 2 digits, followed by .pdf; and takes the first occurence (no check is made wether it found anything here …)

Also, it is better practice to use os.path.join() to avoid issues when creating path strings (when whanging os notably)

Answered By: PlainRavioli
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.