How to import a subset of a zip file into colab?
Question:
I have a very large zip file in my Google Drive that contains several subfolders. I'd like to extract only a few of those subfolders into Colab, not all of them. Is there a way to do this?
For instance, suppose the zip file is named "MyBigFile.zip" and contains "folder1", "folder2", "folder3", "folder4", and "folder5". I only want to import and extract "folder1" and "folder4" into Google Colab (and, ideally, only 200 images from them). How can I do this? Any suggestions?
*In case it is relevant: each of folders 1-5 contains around 50,000 .png files.
Answers:
After some searching, I found a way: you can use Python's zipfile module in Google Colab too.
from zipfile import ZipFile
from google.colab import drive

drive.mount('/content/drive')
zipfile = ZipFile("Zip File Path")  # e.g. the path to MyBigFile.zip

def extract(folderName, numberOfFiles):
    # Keep the first numberOfFiles archive entries whose path starts with folderName
    files = [name for name in zipfile.namelist() if name.startswith(folderName)][:numberOfFiles]
    for file in files:
        zipfile.extract(file, 'Output Folder Path')  # e.g. extractedFolder

extract("folder1/", 200)
zipfile.close()
You can drop the google.colab code if you mount Drive manually by clicking the Mount Drive button in Colab's Files sidebar.
After that, you can remove these two lines of code:
from zipfile import ZipFile
# from google.colab import drive
# drive.mount('/content/drive')

zipfile = ZipFile("MyBigFile.zip")

def extract(folderName, numberOfFiles):
    # Keep the first numberOfFiles archive entries whose path starts with folderName
    files = [name for name in zipfile.namelist() if name.startswith(folderName)][:numberOfFiles]
    for file in files:
        zipfile.extract(file, 'extractedFolder')

extract("folder1/", 200)
zipfile.close()
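Since the question asks for both "folder1" and "folder4", the same idea could be generalized to several folders at once. The following is only a sketch under a few assumptions: the function name extract_subset and its parameters are hypothetical, the archive is assumed to store the folders at its top level, and directory entries are skipped so that they do not count toward the 200-file limit.

```python
import itertools
from zipfile import ZipFile


def extract_subset(zip_path, folders, limit, out_dir):
    """Extract at most `limit` files from each folder in `folders`.

    Returns a dict mapping each folder name to the number of files extracted.
    (Hypothetical helper, not part of the zipfile module.)
    """
    extracted = {}
    with ZipFile(zip_path) as archive:
        infos = archive.infolist()
        for folder in folders:
            prefix = folder.rstrip("/") + "/"
            # Lazily select file entries under this folder, skipping
            # directory entries such as "folder1/" itself.
            members = (
                info for info in infos
                if info.filename.startswith(prefix) and not info.is_dir()
            )
            count = 0
            for info in itertools.islice(members, limit):
                archive.extract(info, out_dir)
                count += 1
            extracted[folder] = count
    return extracted


# Example call (paths are placeholders, adjust them to your Drive layout):
# extract_subset("MyBigFile.zip", ["folder1", "folder4"], 200, "extractedFolder")
```

Opening the archive in a with block closes it even if extraction fails partway, and only the selected members are decompressed, so the other folders in the archive are never written to disk.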
You need to mount your Google Drive through Colab first:
from google.colab import drive
drive.mount('/content/drive')
Now unzip only the folders you need into the directory you want:
!unzip /path_to/MyBigFile.zip 'folder1/*' -d /path_to_unzip