Python – Combine text data from groups of files by filename

Question:

I have a directory of text files, and I want to combine them in groups like this:

Inv030001.txt – should contain all the data from files whose names start with Inv030001

Inv030002.txt – should contain all the data from files whose names start with Inv030002


I tried the code below, but it's not working:

filenames = glob(textfile_dir+'*.txt')
for fname in filenames:
    filename = fname.split('\\')[-1]
    current_invoice_number =  (filename.split('_')[0]).split('.')[0]
    prev_invoice_number = current_invoice_number
    with open(textfile_dir + current_invoice_number+'.txt', 'w') as outfile:
        for eachfile in fnmatch.filter(os.listdir(textfile_dir), '*[!'+current_invoice_number+'].txt'):
            current_invoice_number = (eachfile.split('_')[0]).split('.')[0]
            if(current_invoice_number == prev_invoice_number):
                with open(textfile_dir+eachfile) as infile:
                    for line in infile:
                        outfile.write(line)
                prev_invoice_number = current_invoice_number
            else:
                with open(textfile_dir+eachfile) as infile:
                    for line in infile:
                        outfile.write(line)
                prev_invoice_number = current_invoice_number
                #break;
Asked By: ZKS


Answers:

Your code may be more complicated than it needs to be. The idea is simple: for every file in the directory, append its contents to the matching invoice file.

from glob import glob
import os

textfile_dir = "invs" + os.sep  # I changed this to os.sep since I'm on a Mac; it should work on Windows, too
filenames = glob(textfile_dir + '*.txt')
for fname in filenames:
    filename = os.path.basename(fname)
    current_invoice_number = filename.split('_')[0].split('.')[0]
    # append mode: every matching file adds to the same invoice file
    with open(textfile_dir + current_invoice_number + '.txt', 'a') as outfile:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

Some room for improvement:

  1. If you created your accumulation files in a different directory, there would be less chance of picking them up as input when you run this program again (we open the output files in append mode, 'a', so a second run would add to them).
  2. glob does not guarantee the order of the files it returns. Sort the list if you need deterministic results.
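
Both suggestions could be combined along the following lines. This is only a minimal sketch, not a tested drop-in replacement: the function name combine_by_invoice and its two parameters are made up for illustration, and it assumes out_dir is a different directory from src_dir.

```python
from collections import defaultdict
from pathlib import Path


def combine_by_invoice(src_dir, out_dir):
    """Concatenate files that share an invoice prefix into out_dir/<prefix>.txt.

    Assumption: out_dir is not the same directory as src_dir.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group source files by the part of the name before the first underscore.
    groups = defaultdict(list)
    for path in sorted(src.glob("*.txt")):  # sorted() makes the order deterministic
        groups[path.stem.split("_")[0]].append(path)

    # Write each group once, so a second run rebuilds the same output.
    for prefix, paths in groups.items():
        with open(out / f"{prefix}.txt", "w") as outfile:
            for path in paths:
                outfile.write(path.read_text())
    return sorted(groups)
```

Because the output goes to its own directory and each combined file is rebuilt from scratch, running the function twice should give the same result instead of appending the data a second time.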
Answered By: Mark

Does this answer your question? My version appends the data from "like" invoice numbers to a .txt file named with just the invoice number. In other words, anything that starts with "Inv030001" will have its contents appended to "Inv030001.txt". The idea is that you likely don't want to overwrite files and possibly destroy them if your write logic has a mistake.

I actually recreated your files to test this, and I did exactly what I suggested you do: I treated every part as a separate task and built up to the script below. In doing that, the script became far less verbose and convoluted. I labeled all of my comments with "task" to drive home that this is just a series of very simple things.

I also renamed your variables to say what they actually hold. For instance, the entries in filenames aren't filenames at all; they are entire paths.
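
To make the path-versus-filename distinction concrete, here is a small standalone sketch. The helper name invoice_number and the example file names are hypothetical; they are not taken from the actual files in the question.

```python
import os


def invoice_number(path):
    """Return the invoice prefix of a path such as 'texts/Inv030001_2.txt'."""
    filename = os.path.basename(path)     # drop the directory part
    stem = os.path.splitext(filename)[0]  # drop the '.txt' extension
    return stem.split("_")[0]             # keep the text before the first '_'


print(invoice_number(os.path.join("texts", "Inv030001_2.txt")))  # Inv030001
print(invoice_number("Inv030002.txt"))                           # Inv030002
```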

import os
from glob import glob

#you'll have to change this path to yours
root  = os.path.join(os.getcwd(), 'texts/')

#sort so files are appended in a predictable order (glob's order is arbitrary)
paths = sorted(glob(root+'*.txt'))

for path in paths:
    #task: get filename
    filename = os.path.basename(path)
    #task: get invoice number
    invnum   = filename.split('_')[0]
    #task: open in and out files
    with open(f'{root}{invnum}.txt', 'a') as out_, open(path, 'r') as in_:
        #task: append in contents to out
        out_.write(in_.read())
Answered By: OneMadGypsy

Below is the working code, in case someone is looking for the same solution:

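import fileinput
import os
from collections import defaultdict
from glob import glob

# textfile_dir is the same directory path used in the question (ending with a path separator)
filenames = glob(textfile_dir+'*.txt')

# group the file paths by invoice number (the part of the name before the first '_')
dd = defaultdict(list)
for filename in filenames:
    name, ext = os.path.splitext(filename)
    name = os.path.basename(name).split('_')[0]
    dd[name].append(filename)

# write each group to <invoice number>.txt
for key, fnames in dd.items():
    with open(textfile_dir+key+'.txt', "w") as newfile:
        for line in fileinput.FileInput(fnames):
            newfile.write(line)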
Answered By: ZKS