Can I read and write files in a tar.gz without decompressing it?
Question:
PROBLEM OUTLINE: Can we read and write files in a tar.gz without decompressing it?
I have many tar.gz files named like GF1_PMS1_E72.0_N33.6_20160507_L1A0001568810.tar.gz. Each tar.gz file contains files like the following:
GF1_PMS1_E72.6_N33.6_20160511_L1A0001576267-MSS1.tiff
GF1_PMS1_E72.6_N33.6_20160511_L1A0001576267-MSS1.xml
GF1_PMS1_E72.6_N33.6_20160511_L1A0001576267-MSS1.rpb
GF1_PMS1_E72.6_N33.6_20160511_L1A0001576267-MSS1.jpg
I want to read the TIFF into a NumPy array without decompressing the archive, so I need the full path of the TIFF, but I failed using the tarfile package. Below is the code I tried:
```python
import os
import tarfile

inpath = 'H:\\alongKKH IMAGES1\\'  # backslashes escaped; a literal cannot end in a lone '\'


def ReadTars(inpath):
    tar_files = os.listdir(inpath)
    for tarname in tar_files:
        if tarname.split('_')[1] == 'PMS1':
            print(tarname)
            tar = tarfile.open(os.path.join(inpath, tarname), "r:gz")
            for file_name in tar.getnames():
                if file_name.endswith('.tiff'):
                    print(file_name)
                    # Joins the archive name and the member name into one path string
                    rasterpath = os.path.join(inpath, tarname, file_name)
                    array = raster2array(rasterpath)
                    break
        else:
            tar = tarfile.open(os.path.join(inpath, tarname), "r:gz")
            for file_name in tar.getnames():
                if file_name.endswith('.tiff'):
                    # array = raster2array(os.path.join(inpath, tarname, file_name))
                    break
```
raster2array is a function that reads an image into a NumPy array:
```python
from osgeo import gdal


def raster2array(rasterfn):
    raster = gdal.Open(rasterfn)  # returns None if GDAL cannot open the path
    array = raster.ReadAsArray()
    return array
```
Then it throws the error below:
```
ERROR 4: `H:\alongKKH IMAGES1\GF1_PMS1_E72.0_N33.6_20160507_L1A0001568810.tar.gz\GF1_PMS1_E72.0_N33.6_20160507_L1A0001568810-MSS1.tiff' does not exist in the file system, and is not recognized as a supported dataset name.
```
I would be grateful if anyone could help me with this. I am using Python on Windows.
Answers:
The path you build from inpath, tarname and file_name is just a string; it does not refer to a real file. Does raster2array support reading from inside a tar? If it cannot, that is why you get "does not exist in the file system".
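A small illustration of that point, assuming Python 3 and a throwaway archive with a hypothetical member name: the archive itself is a real file, but joining its path with a member name produces a string that points at nothing on disk, so a function that opens ordinary file paths cannot find it.

```python
import os
import tarfile
import tempfile

# Build a throwaway archive with one empty member (file names are hypothetical).
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "demo.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.addfile(tarfile.TarInfo("demo-MSS1.tiff"))  # zero-length member, no data needed

with tarfile.open(archive, "r:gz") as tar:
    member = tar.getnames()[0]

# The same kind of string the question's code passes to GDAL.
joined = os.path.join(archive, member)

print(os.path.isfile(archive))  # True: the archive is a real file on disk
print(os.path.exists(joined))   # False: nothing exists at "archive/member"
```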
TarFile has no read() method, but ZipFile does, so with a zip archive you could do:
```python
import zipfile

with zipfile.ZipFile(inpath + 'GF1_PMS1_E72.zip', "r") as archive:
    for name in archive.namelist():
        data = archive.read(name)  # member contents as bytes, nothing extracted to disk
        print(name, len(data), repr(data[:10]))
```
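For a tar archive, look for tarfile's equivalent of read(): TarFile.extractfile(member) returns a file object whose read() gives the member's bytes, so you can do the same as above.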
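A minimal sketch of the tarfile version of the zip example, assuming Python 3. TarFile.extractfile() returns a file-like object for a regular member, and read() on it gives the member's bytes without writing anything to disk. The helper name read_tiff_members is hypothetical. The gzip stream is still decompressed in memory while reading; what this avoids is extracting files to disk.

```python
import tarfile


def read_tiff_members(archive_path):
    """Return {member name: bytes} for every .tiff member of a .tar.gz archive.

    Hypothetical helper. Nothing is written to disk; the gzip stream is
    decompressed in memory while the members are read.
    """
    contents = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Only regular files whose names end in .tiff are read.
            if member.isfile() and member.name.endswith(".tiff"):
                f = tar.extractfile(member)  # file-like object for this member
                contents[member.name] = f.read()
    return contents
```

For example, read_tiff_members(os.path.join(inpath, tarname)) would return the raw bytes of each TIFF. These are bytes, not a file path, so they cannot be passed straight to raster2array().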
Your rasterfn is not a physical file, which is why the error occurs.
With GDALOpen(), for drivers that support the VSI virtual file API, it is possible to open a file inside a .tar/.tar.gz/.tgz archive (see VSIInstallTarFileHandler()). The file name syntax is /vsitar/path/to/the/archive.tar.gz/path/inside/the/archive.
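A minimal sketch of that approach from Python, assuming GDAL's Python bindings (osgeo.gdal) and a GDAL build whose /vsitar/ handler reads .tar.gz archives. The helper names vsitar_path and raster2array_from_tar are hypothetical, and converting backslashes to forward slashes for Windows paths is an assumption rather than something the answer states.

```python
def vsitar_path(archive_path, member_name):
    """Build a GDAL /vsitar/ path for a member inside a .tar/.tar.gz/.tgz archive.

    Hypothetical helper. Backslashes are converted to forward slashes, which is
    assumed here to be the safer form for GDAL virtual paths on Windows.
    """
    archive = archive_path.replace("\\", "/")
    member = member_name.replace("\\", "/").lstrip("/")
    return "/vsitar/" + archive + "/" + member


def raster2array_from_tar(archive_path, member_name):
    """Open a raster stored inside a tar.gz with GDAL and return it as a NumPy array."""
    from osgeo import gdal  # imported here so vsitar_path() works without GDAL installed

    raster = gdal.Open(vsitar_path(archive_path, member_name))
    if raster is None:  # gdal.Open returns None on failure unless exceptions are enabled
        raise IOError("GDAL could not open %s in %s" % (member_name, archive_path))
    return raster.ReadAsArray()
```

In the question's loop this would replace the raster2array(rasterpath) call, e.g. raster2array_from_tar(os.path.join(inpath, tarname), file_name).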