How to import a subset of a zip file into colab?

Question:

I have a very big zip file in my Google Drive that contains several subfolders. I'd like to extract only a few of those subfolders into Colab, not the whole archive. Is there any way to do this?

For instance, suppose the zip file is named "MyBigFile.zip" and contains "folder1", "folder2", "folder3", "folder4", and "folder5". I only want to import and extract "folder1" and "folder4" into Google Colab (and, ideally, only 200 images from them). How can I do this? Any suggestions?

In case it matters: each of folder1 to folder5 contains around 50,000 .png files.

Asked By: Rainbow

||

Answers:

After some searching I found a solution: you can use Python's built-in zipfile module in Google Colab too.

from zipfile import ZipFile
from google.colab import drive

drive.mount('/content/drive')

zip_file = ZipFile("Zip File Path")  # path to MyBigFile.zip

def extract(folder_name, number_of_files):
    # Collect file entries (skipping directory entries) under folder_name, up to number_of_files
    files = [name for name in zip_file.namelist()
             if name.startswith(folder_name) and not name.endswith('/')][:number_of_files]
    for file in files:
        zip_file.extract(file, 'Output Folder Path')  # e.g. extractedFolder

extract("folder1/", 200)
extract("folder4/", 200)
zip_file.close()
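A minimal sketch of the same idea wrapped in a reusable function, assuming the folders sit at the root of the archive (e.g. "folder1/img.png", not "MyBigFile/folder1/img.png"). It opens the zip with a `with` block so it is always closed, reads the member list once, skips directory entries via `ZipInfo.is_dir()` (Python 3.6+), and keeps only `.png` files. The function name `extract_subset` is hypothetical.

```python
from zipfile import ZipFile

def extract_subset(zip_path, folders, max_files, out_dir):
    """Extract up to max_files .png files from each top-level folder in folders.

    Returns the list of archive member names that were extracted.
    """
    extracted = []
    with ZipFile(zip_path) as zf:
        # Read the central directory once instead of once per folder
        infos = zf.infolist()
        for folder in folders:
            # Normalise "folder1" / "folder1/" so "folder1" never matches "folder10/"
            prefix = folder.rstrip('/') + '/'
            # Keep regular .png files under this folder, skipping directory entries
            members = [info for info in infos
                       if info.filename.startswith(prefix)
                       and not info.is_dir()
                       and info.filename.lower().endswith('.png')][:max_files]
            for info in members:
                zf.extract(info, out_dir)
                extracted.append(info.filename)
    return extracted
```

Usage would look like `extract_subset('/content/drive/MyDrive/MyBigFile.zip', ['folder1', 'folder4'], 200, 'extractedFolder')`; the Drive path there is an assumption, so adjust it to wherever the file actually lives.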

If you mount Drive manually by clicking the Mount Drive button in Colab's Files panel, you don't need the google.colab code.

After that, you can comment out (or delete) these two lines of code:

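from zipfile import ZipFile
# from google.colab import drive

# drive.mount('/content/drive')

zip_file = ZipFile("MyBigFile.zip")  # replace with the full path to MyBigFile.zip

def extract(folder_name, number_of_files):
    # Collect file entries (skipping directory entries) under folder_name, up to number_of_files
    files = [name for name in zip_file.namelist()
             if name.startswith(folder_name) and not name.endswith('/')][:number_of_files]
    for file in files:
        zip_file.extract(file, 'extractedFolder')

extract("folder1/", 200)
extract("folder4/", 200)
zip_file.close()

Answered By: codester_09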
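If the goal is only to load about 200 images and not to keep them on disk, a minimal sketch is to decompress them straight into memory with `ZipFile.read`. This assumes the images fit comfortably in RAM; the function name `read_images` is hypothetical.

```python
from zipfile import ZipFile

def read_images(zip_path, folder, max_files):
    """Return {member_name: bytes} for up to max_files .png files under folder.

    Nothing is written to disk; each selected file is decompressed into memory.
    """
    prefix = folder.rstrip('/') + '/'
    images = {}
    with ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if len(images) >= max_files:
                break  # stop scanning once enough images have been read
            # The .png check also excludes directory entries such as "folder1/"
            if info.filename.startswith(prefix) and info.filename.lower().endswith('.png'):
                images[info.filename] = zf.read(info)
    return images
```

Each value is raw PNG bytes; if Pillow is available, `PIL.Image.open(io.BytesIO(data))` turns one into an image object.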

You need to mount your Google Drive through Colab first:

from google.colab import drive
drive.mount('/content/drive')

Then unzip only the folders you need to wherever you want them (unzip accepts several quoted patterns):

!unzip /path_to/MyBigFile.zip 'folder1/*' 'folder4/*' -d /path_to_unzip
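For comparison, a rough pure-Python counterpart of the unzip command, sketched with `fnmatch` and `ZipFile.extractall(path, members=...)`. The function name `unzip_patterns` is hypothetical, and its matching only approximates unzip's wildcard rules: here `*` also matches `/`, so nested subfolders are included.

```python
import fnmatch
from zipfile import ZipFile

def unzip_patterns(zip_path, patterns, dest):
    """Extract members matching any shell-style pattern, e.g. 'folder1/*'.

    Returns the sorted list of extracted member names.
    """
    with ZipFile(zip_path) as zf:
        # Case-sensitive match of each archive member against every pattern
        members = [name for name in zf.namelist()
                   if any(fnmatch.fnmatchcase(name, p) for p in patterns)]
        zf.extractall(dest, members=members)
    return sorted(members)
```

Called as `unzip_patterns('/path_to/MyBigFile.zip', ['folder1/*', 'folder4/*'], '/path_to_unzip')`, it mirrors the `!unzip` line, without requiring the unzip binary.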
Answered By: Incredi Blame