Python unzip multiple .gz files

Question:

I have compressed a file into several chunks using 7zip:

HAVE:

foo.txt.gz.001
foo.txt.gz.002
foo.txt.gz.003
foo.txt.gz.004
foo.txt.gz.005

WANT:

foo.txt

How do I unzip and combine these chunks to get a single file using python?

Asked By: r0f1


Answers:

First you must extract all the zip files sequentially:

import zipfile

paths = ["path_to_1", "path_to_2"]
extract_paths = ["path_to_extract1", "path_to_extract2"]

# Extract each archive into its own target directory.
for path, extract_path in zip(paths, extract_paths):
    with zipfile.ZipFile(path, 'r') as zip_ref:
        zip_ref.extractall(extract_path)

Next, go to each extracted location, open the extracted files, and read() their contents. Concatenate the contents in order and save the result to foo.txt.
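The concatenation step could look roughly like the sketch below. It assumes every archive has already been extracted and that each one yields exactly one piece of the original file. The helper name concat_files and the example paths are hypothetical. It copies bytes in binary mode instead of building strings, so it should also work for non-text data.

```python
import shutil
from pathlib import Path


def concat_files(parts, target):
    """Write the contents of each file in `parts`, in the given order, to `target`."""
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the bytes across instead of loading whole files into memory.
                shutil.copyfileobj(src, out)
    return Path(target)
```

For example, concat_files(["path_to_extract1/piece", "path_to_extract2/piece"], "foo.txt"), where the piece names are placeholders for whatever the archives actually contain.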

Answered By: newkid

First, get the list of all files.

files = ['/path/to/foo.txt.gz.001', '/path/to/foo.txt.gz.002', '/path/to/foo.txt.gz.003']

Then iterate over each file and append to a result file.

with open('./result.gz', 'ab') as result:  # append in binary mode
    for f in files:
        with open(f, 'rb') as tmpf:        # open in binary mode also
            result.write(tmpf.read())

Then extract it using the gzip library, since the joined file is a gzip stream rather than a zip archive. You could use tempfile to avoid handling the temporary file yourself.
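Putting the two steps together, a minimal sketch might look like this. It assumes the .001, .002, ... files are plain byte-wise pieces of a single gzip file, which is how 7-Zip split volumes are generally understood to work, and that the chunk suffixes are three zero-padded digits. The function name join_and_gunzip and its parameters are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def join_and_gunzip(chunk_dir, stem="foo.txt.gz", out_name="foo.txt"):
    """Join stem.001, stem.002, ... into one gzip file, then decompress it."""
    chunk_dir = Path(chunk_dir)
    # Zero-padded numeric suffixes sort correctly as plain strings.
    chunks = sorted(chunk_dir.glob(stem + ".[0-9][0-9][0-9]"))
    if not chunks:
        raise FileNotFoundError(f"no chunks matching {stem}.NNN in {chunk_dir}")

    joined = chunk_dir / stem
    # "wb" rather than "ab", so running this twice does not append the data twice.
    with open(joined, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as part:
                shutil.copyfileobj(part, out)

    target = chunk_dir / out_name
    # Decompress the rejoined gzip stream into the final file.
    with gzip.open(joined, "rb") as f_in, open(target, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return target
```

Calling join_and_gunzip("/path/to") would then leave foo.txt next to the chunks, along with the intermediate foo.txt.gz.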

Answered By: Mauro Baraldi

import gzip
import os
import shutil

dir_name = '/Users/username/Desktop/data'

def gz_extract(directory):
    extension = ".gz"
    for item in os.listdir(directory):  # loop through items in dir
        if item.endswith(extension):  # check for ".gz" extension
            gz_name = os.path.join(directory, item)  # full path of the compressed file
            file_name = gz_name[:-len(extension)]  # output path: same name without ".gz"
            with gzip.open(gz_name, "rb") as f_in, open(file_name, "wb") as f_out:
                shutil.copyfileobj(f_in, f_out)
            os.remove(gz_name)  # delete the compressed file

gz_extract(dir_name)

Answered By: joy google