Getting "[Errno 95] Operation not supported" while writing a zip file in Databricks

Question:

I am trying to zip some files and write the archive to a folder on a mount point, using the code below in Databricks.

# List all files which need to be compressed
import os
import zipfile

modelPath = '/dbfs/mnt/temp/zip/'
filenames = [os.path.join(root, name) for root, dirs, files in os.walk(top=modelPath, topdown=False) for name in files]
print(filenames)

# Write every file into a new archive on the mount point
zipPath = '/dbfs/mnt/temp/compressed/demo.zip'
with zipfile.ZipFile(zipPath, 'w') as myzip:
  for filename in filenames:
    print(filename)
    print(myzip)
    myzip.write(filename)

But I am getting the error [Errno 95] Operation not supported.

Error Details

OSError                                   Traceback (most recent call last)
<command-2086761864237851> in <module>
     15     print(myzip)
---> 16     myzip.write(filename)

/usr/lib/python3.8/zipfile.py in write(self, filename, arcname, compress_type, compresslevel)
   1775             with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
-> 1776                 shutil.copyfileobj(src, dest, 1024*8)
   1777 

/usr/lib/python3.8/zipfile.py in close(self)
   1181                 self._fileobj.write(self._zinfo.FileHeader(self._zip64))
-> 1182                 self._fileobj.seek(self._zipfile.start_dir)
   1183 

OSError: [Errno 95] Operation not supported

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/usr/lib/python3.8/zipfile.py in close(self)
   1837                     if self._seekable:
-> 1838                         self.fp.seek(self.start_dir)
   1839                     self._write_end_record()

OSError: [Errno 95] Operation not supported

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
OSError: [Errno 95] Operation not supported

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<command-2086761864237851> in <module>
     14     print(filename)
     15     print(myzip)
---> 16     myzip.write(filename)

/usr/lib/python3.8/zipfile.py in __exit__(self, type, value, traceback)
   1310 
   1311     def __exit__(self, type, value, traceback):
-> 1312         self.close()
   1313 
   1314     def __repr__(self):

/usr/lib/python3.8/zipfile.py in close(self)
   1841             fp = self.fp
   1842             self.fp = None
-> 1843             self._fpclose(fp)
   1844 
   1845     def _write_end_record(self):

/usr/lib/python3.8/zipfile.py in _fpclose(self, fp)
   1951         self._fileRefCnt -= 1
   1952         if not self._fileRefCnt and not self._filePassed:
-> 1953             fp.close()
   1954 
   1955 

Could anyone help me resolve this issue?

Note: I can zip the files using shutil, but I want to avoid the driver, so I am using the approach above.

Asked By: Sharma


Answers:

You didn’t provide details of your mount. It’s probably Blob Storage or ADLS Gen2, and apparently it doesn’t support seeking in a file that is open for writing.

Check out this simple snippet:

%python

path = '/dbfs/mnt/temp/testfile'

with open(path, "w") as f:
    f.write("test")
    f.seek(1)
    f.write("x")

with open(path, "r") as f:
    print(f.read()) 

It will throw "Operation not supported" at f.seek(1).

Repeat the same with path = '/tmp/testfile' and you’ll get the correct result ("txst").
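The same check can be wrapped in a small helper so a notebook can decide where to build the archive. This is only a sketch: the function name supports_seek_on_write and the probe file name are made up for illustration, and it assumes the directory exists and is writable.

```python
import os


def supports_seek_on_write(directory):
    """Return True if a file opened for writing in `directory` can be
    seeked and overwritten in place (hypothetical helper)."""
    probe = os.path.join(directory, "seek_probe.tmp")
    try:
        with open(probe, "w") as f:
            f.write("test")
            f.seek(1)      # the call that fails on the mount in the snippet above
            f.write("x")
        return True
    except OSError:
        return False
    finally:
        # Clean up the probe file whether or not the check succeeded
        if os.path.exists(probe):
            os.remove(probe)
```

For example, supports_seek_on_write('/tmp') should return True on the driver, while the mounted directory from the question would be expected to return False.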

The weird thing is that the seek in zipfile.py should not be reached at all. It looks like self._seekable was set incorrectly; I’m not sure whether that’s a problem in the library or in Azure.
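For context, zipfile decides whether the target is seekable by calling tell() and seek() once when the archive is opened. If the file object has no tell(), it switches to a streaming mode that never seeks. The sketch below forces that mode by hiding everything except write() and flush(). The names SequentialWriter and zip_without_seeking are hypothetical, and this has not been tested against a DBFS mount; it only guarantees that zipfile itself issues no seek calls.

```python
import zipfile


class SequentialWriter:
    """Hypothetical wrapper that exposes only write() and flush(), so
    zipfile treats the target as an unseekable stream."""

    def __init__(self, raw):
        self._raw = raw

    def write(self, data):
        return self._raw.write(data)

    def flush(self):
        self._raw.flush()


def zip_without_seeking(filenames, zip_path):
    # The outer `with` closes the real file; zipfile does not close
    # file objects that were passed in by the caller.
    with open(zip_path, "wb") as raw:
        with zipfile.ZipFile(SequentialWriter(raw), "w") as archive:
            for filename in filenames:
                archive.write(filename)
```

In this mode the sizes and CRC of each entry are written after the data rather than patched into the header, which is why no seek is needed.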

Anyway, just create the archive in a local directory and move it to the mount afterwards.

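import shutil
import zipfile

tempPath = '/tmp/demo.zip'  # local disk on the driver, where seek works
zipPath = '/dbfs/mnt/temp/compressed/demo.zip'

# filenames is the list built in the question
with zipfile.ZipFile(tempPath, 'w') as myzip:
  for filename in filenames:
    print(filename)
    print(myzip)
    myzip.write(filename)

# os.rename cannot move a file across filesystems (local disk to the mount),
# so use shutil.move, which falls back to copy-and-delete
shutil.move(tempPath, zipPath)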
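If a temporary file on local disk is not wanted, a variation on the same idea is to build the archive in memory and write the bytes to the mount in one sequential pass. This is a minimal sketch, assuming the whole archive fits in driver memory; the helper names zip_to_bytes and write_zip_sequentially are made up for illustration.

```python
import io
import zipfile


def zip_to_bytes(filenames):
    """Build a zip archive in memory and return its raw bytes."""
    buffer = io.BytesIO()  # in-memory buffer, fully seekable
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename in filenames:
            archive.write(filename)
    return buffer.getvalue()


def write_zip_sequentially(filenames, zip_path):
    """Write the finished archive to zip_path without seeking."""
    data = zip_to_bytes(filenames)
    with open(zip_path, "wb") as out:
        out.write(data)
```

It would be called as write_zip_sequentially(filenames, zipPath). For large inputs the temporary-file approach is likely the safer choice, since it does not hold the archive in memory.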
Answered By: Kombajn zbożowy