Load .zip file from GitHub in Google Colab
Question:
I have a zip file in my GitHub repo and I want to load it into my Google Colab files. I have its URL, from which it can be downloaded, like https://raw.githubusercontent.com/rehmatsg/../master/...zip
I used this method to download the file into Google Colab:
from google.colab import files
url = 'https://raw.githubusercontent.com/user/.../master/...zip'
files.download(url)
But I get this error
FileNotFoundError Traceback (most recent call last)
<ipython-input-5-c974a89c0412> in <module>()
3 from google.colab import files
4
----> 5 files.download(url)
/usr/local/lib/python3.7/dist-packages/google/colab/files.py in download(filename)
141 raise OSError(msg)
142 else:
--> 143 raise FileNotFoundError(msg) # pylint: disable=undefined-variable
144
145 comm_manager = _IPython.get_ipython().kernel.comm_manager
FileNotFoundError: Cannot find file: https://raw.githubusercontent.com/user/.../master/...zip
Files in Google Colab are temporary, so I would have to upload the file again in every session. This is why I want to host the file in my project’s GitHub repo.
What would be the correct method to download the file into Google Colab?
Answers:
You could do this to clone the entire repository:
!git clone https://personalaccesstoken@github.com/username/reponame.git
This creates a folder called reponame and is convenient if you have many files to download. The personalaccesstoken allows private repositories to be accessed.
I do not use Google Colab, but I looked at its description. Note that google.colab.files.download is for downloading a file from the Colab runtime to your own machine. It expects the path of a local file, not a URL, which is why it raises FileNotFoundError here. If the file is public, you can use other libraries to retrieve it. For example, you can use urllib:
from urllib.request import urlretrieve
# 'data.zip' is a placeholder local file name; without it, urlretrieve saves to a temporary file
urlretrieve(url, 'data.zip')
If you later need more files from the repository, consider the other answer about git clone.
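As a rough sketch of the urllib approach carried through to extraction, the following downloads a zip and unpacks it with the standard zipfile module. The function name download_and_extract and the dest_dir parameter are made up for illustration, and the local file name is simply taken from the last segment of the URL path.

```python
import zipfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_and_extract(url, dest_dir="."):
    """Download the zip at `url` into `dest_dir` and extract it there.

    Hypothetical helper: returns the list of member names in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Use the last segment of the URL path as the local file name.
    zip_name = PurePosixPath(urlparse(url).path).name
    zip_path = dest / zip_name

    # Passing a file name makes urlretrieve save there instead of a temp file.
    urlretrieve(url, str(zip_path))

    # Extract everything next to the downloaded archive.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

In a Colab cell this could be called as download_and_extract(url, "/content/data"), where url is the raw link from the question and /content/data is an arbitrary target folder.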
Use the wget shell command. Open the GitHub project, find the "Download ZIP" option (at the top right) and copy its URL. Then pass that URL to wget:
!wget https://github.com/nytimes/covid-19-data/archive/refs/heads/master.zip
!unzip /content/master.zip
Good luck.
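The "Download ZIP" link in this answer follows a regular pattern, so it can also be built in code instead of copied from the page. This is a small sketch that assumes the pattern seen in the example URL above holds for other public repositories; branch_archive_url is a hypothetical helper name.

```python
def branch_archive_url(owner, repo, branch="master"):
    """Build the "Download ZIP" URL for one branch of a public GitHub repo.

    Assumes the URL pattern seen in the nytimes/covid-19-data example.
    """
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


# Reproduces the URL used with wget in the answer above.
archive_url = branch_archive_url("nytimes", "covid-19-data")
```

The resulting string can be passed to wget or to urlretrieve in the same way as a copied link.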
Suppose, for example, that the GitHub repo https://github.com/lukyfox/Datafiles contains a folder digits with two zip files, digits.zip and digits_small.zip. To download and unzip one particular zip file (only digits.zip, not the whole repo or folder) into Google Colab session storage:
- Go to the zip file you want to download (e.g. https://github.com/lukyfox/Datafiles/blob/master/digits/digits.zip).
- Locate the Download button and copy its address (right-click, then Copy link address). For the example above, the copied address is https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
- Go to your Google Colab notebook and use the !wget command with the copied address to download the file, then !unzip to extract it into session storage:
!wget https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
!unzip /content/digits.zip
You can also rename the file on download or specify a folder name for the unzipped data (for example with wget -O and unzip -d).
You may notice that the downloadable address differs only slightly from the address of the zip file's page. In fact, it should be enough to replace blob with raw to get the right address for any zip file.
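The blob-to-raw replacement described above can be written as a small helper. This is only a sketch of that string substitution; blob_to_raw is a hypothetical name, and it deliberately replaces only the first /blob/ segment so that a file or folder that happens to be called blob is left alone.

```python
def blob_to_raw(url):
    """Turn a GitHub file page URL (.../blob/...) into its download URL (.../raw/...)."""
    marker = "/blob/"
    if marker not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Replace only the first occurrence, which is the one GitHub inserts
    # between the repository name and the branch.
    return url.replace(marker, "/raw/", 1)


page_url = "https://github.com/lukyfox/Datafiles/blob/master/digits/digits.zip"
download_url = blob_to_raw(page_url)
```

For the example repository, download_url is the same address that was copied from the Download button.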