Using csvreader against a gzipped file in Python

Question:

I have a bunch of gzipped CSV files that I’d like to open for inspection using Python’s built-in CSV reader. I’d like to do this without first having to manually unzip them to disk. I guess I want to somehow get a stream to the uncompressed data and pass it into the CSV reader. Is this possible in Python?

Asked By: Mike Chamberlain

Answers:

Use the gzip module:

import csv, gzip

# 'rt' opens the file in text mode, which csv.reader needs on Python 3
with gzip.open(filename, mode='rt') as f:
    reader = csv.reader(f)
    #...
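
As a minimal, self-contained sketch of this approach (the file name sample.csv.gz and its contents are made up for illustration, and Python 3 is assumed), the following writes a small gzipped CSV to a temporary directory and reads it back without unzipping it to disk first:

```python
import csv
import gzip
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")

# Write a small gzipped CSV in text mode.
with gzip.open(path, mode="wt", newline="") as f:
    csv.writer(f).writerows([["name", "qty"], ["apple", "3"]])

# Read it back: 'rt' makes gzip decode the bytes to str for csv.reader.
with gzip.open(path, mode="rt", newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```

Note that csv.reader returns every field as a string, so numeric columns need converting by the caller.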
Answered By: tzaman

A more complete solution:

import csv, gzip

class GZipCSVReader:
    def __init__(self, filename):
        # Text mode ('rt') so csv gets str rather than bytes on Python 3
        self.gzfile = gzip.open(filename, 'rt', newline='')
        self.reader = csv.DictReader(self.gzfile)
    def __next__(self):
        return next(self.reader)
    def close(self):
        self.gzfile.close()
    def __iter__(self):
        return iter(self.reader)

Now you can use it like this:

r = GZipCSVReader('my.csv')
for row in r:
    # Each row is a dict; iterate over its key/value pairs
    for k, v in row.items():
        print(k, v)
r.close()

EDIT: following the comment below, how about a simpler approach:

def gzipped_csv(filename):
    # Text mode ('rt') so DictReader gets str rather than bytes on Python 3
    with gzip.open(filename, 'rt', newline='') as f:
        r = csv.DictReader(f)
        for row in r:
            yield row

which lets you then do:

for row in gzipped_csv(filename):
    for k, v in row.items():
        print(k, v)
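
Because the generator yields rows lazily, it can also be used to peek at just the start of a large file. A rough sketch, assuming Python 3; the file big.csv.gz, its columns, and the choice of three rows are hypothetical and exist only for the demonstration:

```python
import csv
import gzip
import itertools
import os
import tempfile


def gzipped_csv(filename):
    # Same idea as the generator above, opened explicitly in text mode
    # ('rt'), which Python 3's csv module needs.
    with gzip.open(filename, "rt", newline="") as f:
        for row in csv.DictReader(f):
            yield row


# Hypothetical sample data, written only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "big.csv.gz")
with gzip.open(path, "wt", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    writer.writerows([i, i * i] for i in range(1000))

# Take only the first three rows; the remaining rows are never parsed.
first_rows = list(itertools.islice(gzipped_csv(path), 3))
for row in first_rows:
    print(dict(row))
```

The file stays open until the generator is exhausted or closed, so when stopping early it may be worth keeping a reference to the generator and calling its close() method explicitly.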
Answered By: yoavram

I tried the version above for writing and reading, and it didn’t work in Python 3.3 due to a “bytes” error. However, after some trial and error I got the following to work. Maybe it helps others too:

import csv
import gzip
import io


with gzip.open("test.gz", "w") as file:
    writer = csv.writer(io.TextIOWrapper(file, newline="", write_through=True))
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

with gzip.open("test.gz", "r") as file:
    reader = csv.reader(io.TextIOWrapper(file, newline=""))
    print(list(reader))

As amohr suggests, the following works as well:

import gzip, csv

with gzip.open("test.gz", "wt", newline="") as file:
    writer = csv.writer(file)
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

with gzip.open("test.gz", "rt", newline="") as file:
    reader = csv.reader(file)
    print(list(reader))
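
The “bytes” error mentioned above comes from the default mode of gzip.open, which is binary. A small sketch of the difference, assuming Python 3 (the temporary file is a throwaway example, not part of the original answer):

```python
import csv
import gzip
import os
import tempfile

# Throwaway file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "test.gz")
with gzip.open(path, "wt", newline="") as f:
    csv.writer(f).writerow([1, 2, 3])

# Default mode is binary ('rb'): lines come back as bytes.
with gzip.open(path) as f:
    binary_line = f.readline()

# csv.reader expects str, so feeding it the binary stream fails.
try:
    with gzip.open(path) as f:
        next(csv.reader(f))
    error = None
except csv.Error as exc:
    error = exc

# Text mode ('rt') decodes to str, which csv.reader accepts.
with gzip.open(path, "rt", newline="") as f:
    text_rows = list(csv.reader(f))

print(type(binary_line), error, text_rows)
```

Either wrapping the binary stream in io.TextIOWrapper or passing "rt"/"wt" to gzip.open gives the csv module the text stream it needs.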
Answered By: Gerenuk