Load .zip file from GitHub in Google Colab

Question:

I have a zip file in my GitHub repo and want to load it into my Google Colab files. I have its URL, from which it can be downloaded, e.g. https://raw.githubusercontent.com/rehmatsg/../master/...zip

I used this method to download the file into Google Colab:

from google.colab import files

url = 'https://raw.githubusercontent.com/user/.../master/...zip'
files.download(url)

But I get this error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-c974a89c0412> in <module>()
      3 from google.colab import files
      4 
----> 5 files.download(url)

/usr/local/lib/python3.7/dist-packages/google/colab/files.py in download(filename)
    141       raise OSError(msg)
    142     else:
--> 143       raise FileNotFoundError(msg)  # pylint: disable=undefined-variable
    144 
    145   comm_manager = _IPython.get_ipython().kernel.comm_manager

FileNotFoundError: Cannot find file: https://raw.githubusercontent.com/user/.../master/...zip

Files in Google Colab are temporary, so I cannot upload it each time. This is the reason I wanted to host the file in my project’s GitHub repo.
What would be the correct method to download the file into Google Colab?

Asked By: Rehmat Singh Gill

Answers:

You could do this to clone the entire repository.

!git clone https://personalaccesstoken@github.com/username/reponame.git

This creates a folder called reponame and is convenient if you have many files to download. The personalaccesstoken allows private repositories to be accessed.
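
If you would rather not type the token straight into the command, the clone URL can be assembled in Python first. This is only a minimal sketch and not part of the original answer: build_clone_url is a hypothetical helper, and the token, username and reponame values are placeholders.

```python
from urllib.parse import quote


def build_clone_url(token, username, reponame):
    """Return an HTTPS clone URL that embeds a personal access token."""
    # Percent-encode the token so special characters cannot break the URL.
    safe_token = quote(token, safe="")
    return f"https://{safe_token}@github.com/{username}/{reponame}.git"


# Placeholder values; in Colab the token could be read with getpass.getpass()
# instead of being hard-coded in the notebook.
clone_url = build_clone_url("personalaccesstoken", "username", "reponame")
print(clone_url)
```

In a Colab cell the result can then be handed to git with !git clone {clone_url}, since IPython expands Python variables inside ! commands.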

Answered By: Andrew Chisholm

I do not use Google Colab, but I looked at the description of files.download. Note that google.colab.files.download is for downloading files from the Colab runtime to your local machine; it does not fetch a file from an arbitrary URL. If the file is public, you can use other libraries to retrieve it. For example, you can use urllib:

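from urllib.request import urlretrieve
# 'data.zip' is a placeholder local filename; without one, urlretrieve saves to a temporary path
urlretrieve(url, 'data.zip')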

If you later need more files from the repository, consider the other answer about git clone.
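
The urllib approach can be extended to unpack the archive as well. The following is a minimal sketch, assuming the zip file is publicly reachable; download_and_extract and the default names data.zip and data are hypothetical and not taken from the question.

```python
import zipfile
from urllib.request import urlretrieve


def download_and_extract(url, zip_path="data.zip", extract_dir="data"):
    """Download a zip archive from url and unpack it into extract_dir."""
    # urlretrieve returns (local_filename, headers); only the filename is needed.
    local_file, _ = urlretrieve(url, zip_path)
    with zipfile.ZipFile(local_file) as archive:
        archive.extractall(extract_dir)
        # Return the member names so the caller can see what was unpacked.
        return archive.namelist()


# Example call in Colab, where url is the raw link to your own zip file:
# download_and_extract(url, "/content/data.zip", "/content/data")
```

Because the download happens inside the notebook, the cell can simply be re-run whenever the Colab session storage is reset.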

Answered By: astrochun

Use the wget shell command. Open the GitHub project, go to the download-as-zip option (at top right), copy its URL, and pass it to wget.

!wget https://github.com/nytimes/covid-19-data/archive/refs/heads/master.zip

!unzip /content/master.zip
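
The archive link above follows a regular pattern, so it can also be built rather than copied by hand. This is a small sketch based only on the shape of that one URL; branch_archive_url is a hypothetical helper, and the default branch name master is an assumption that will not hold for every repo.

```python
def branch_archive_url(owner, repo, branch="master"):
    """Build the download-as-zip URL for one branch of a public GitHub repo."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


print(branch_archive_url("nytimes", "covid-19-data"))
# prints https://github.com/nytimes/covid-19-data/archive/refs/heads/master.zip
```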

Good luck.

Answered By: Mohsen Fazaeli

Suppose, for example, that the GitHub repo https://github.com/lukyfox/Datafiles contains a folder digits with two zip files, digits.zip and digits_small.zip. To download and unzip one specific zip file (only digits.zip, not the whole repo or folder) into Google Colab session storage:

  1. Go to the zip file you want to download (e.g. https://github.com/lukyfox/Datafiles/blob/master/digits/digits.zip).
  2. Locate the Download button and copy its address (right-click -> Copy link address). For the example above, the copied address is https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
  3. Go to your Google Colab notebook and use the !wget command with the copied address to download the file, then !unzip to extract it into session storage:

!wget https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
!unzip /content/digits.zip

You can also rename the file after download or specify a folder name for the unzipped data.

You may notice that the downloadable address differs only slightly from the address of the zip file's page. In fact, it should be enough to replace blob with raw to get the right address for any zip file.
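
The blob-to-raw substitution can be written as a one-line helper. This is just a sketch of the rule described above; blob_to_raw is a hypothetical name, and it assumes the page URL contains a single /blob/ segment right after the repo name.

```python
def blob_to_raw(url):
    """Turn a GitHub file-page URL (.../blob/...) into its download URL (.../raw/...)."""
    # Replace only the first "/blob/" so later path segments are left untouched.
    return url.replace("/blob/", "/raw/", 1)


page_url = "https://github.com/lukyfox/Datafiles/blob/master/digits/digits.zip"
print(blob_to_raw(page_url))
# prints https://github.com/lukyfox/Datafiles/raw/master/digits/digits.zip
```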

Answered By: Lukas