How to resume download using MediaIoBaseDownload with Google Drive and Python?

Question:

With large files I get various errors that stop the download, so I want to resume from where it stopped by properly appending to the file on disk.

I saw that the FileIO has to be opened in ‘ab’ mode:

fh = io.FileIO(fname, mode='ab')

but I couldn’t find a way to tell MediaIoBaseDownload where to continue from.

Any idea on how to implement this?

Asked By: Joan Venge


Answers:

I cannot see your code, so I’ll give you some general information on options that may help you solve the issue. You can download the file in chunks using MediaIoBaseDownload; see the MediaIoBaseDownload documentation for details.

Example:

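  import io
  from googleapiclient.http import MediaIoBaseDownload

  # `farms` is the placeholder service from the library docs; for Drive use
  # service.files().get_media(fileId=file_id) instead.
  request = farms.animals().get_media(id='cow')
  fh = io.FileIO('cow.png', mode='wb')
  downloader = MediaIoBaseDownload(fh, request, chunksize=1024*1024)

  done = False
  while not done:
    status, done = downloader.next_chunk()
    if status:
      print("Download %d%%." % int(status.progress() * 100))
  fh.close()
  print("Download Complete!")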

From the next_chunk() documentation:

Get the next chunk of the download.

Args:
  num_retries: Integer, number of times to retry with randomized exponential backoff. If all retries fail, the raised HttpError represents the last request. If zero (default), we attempt the request only once.

Returns:
  (status, done): (MediaDownloadProgress, boolean). The value of ‘done’ will be True when the media has been fully downloaded or the total size of the media is unknown.

Raises:
  googleapiclient.errors.HttpError if the response was not a 2xx.
  httplib2.HttpLib2Error if a transport error has occurred.
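Since next_chunk() accepts num_retries, one option is to let the client retry each chunk before giving up. This does not resume a broken download, but it may stop transient errors from aborting it in the first place. A minimal sketch, assuming `downloader` behaves like MediaIoBaseDownload as documented above; `run_download` and its `report` parameter are hypothetical names:

```python
def run_download(downloader, num_retries=5, report=print):
    """Call next_chunk() until done, letting the client retry each chunk.

    `downloader` is assumed to behave like MediaIoBaseDownload: next_chunk()
    accepts num_retries and returns (status, done), where status may be None.
    """
    done = False
    while not done:
        # Per the docs, a failing chunk request is retried with randomized
        # exponential backoff; the error only propagates after num_retries.
        status, done = downloader.next_chunk(num_retries=num_retries)
        if status:
            report("Download %d%%." % int(status.progress() * 100))
    return done
```

With a real request this would be called as `run_download(MediaIoBaseDownload(fh, request), num_retries=5)`.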

I also found this example in the Google Drive API documentation:

from __future__ import print_function

import io

import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload


def download_file(real_file_id):
    """Downloads a file
    Args:
        real_file_id: ID of the file to download
    Returns: the downloaded file content as bytes.

    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()

    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)

        file_id = real_file_id

        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}%.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        # Return early: calling getvalue() on None would raise AttributeError.
        return None

    return file.getvalue()


if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')

Lastly, you can find more examples of using MediaIoBaseDownload with chunks on these two pages:

  1. Python googleapiclient.http.MediaIoBaseDownload() Examples
  2. googleapiclient.http.MediaIoBaseDownload

Update

Partial download functionality is provided by many client libraries via a Media Download service. You can refer to the client library documentation for details, although it is not very clear.

The API client library for Java has more information and states that:

"The resumable media download protocol is similar to the resumable media upload protocol, which is described in the Google Drive API documentation."

The Google Drive API documentation has Python examples of resumable uploads. You can use the documentation of the Python google-resumable-media library, the Java resumable media download, and the resumable upload as a basis for code that restarts the download when it fails.
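To make the resumable idea concrete: a chunked download is a series of ranged GET requests, and resuming means starting that series at the number of bytes already on disk. The sketch below only computes the Range header values; `iter_byte_ranges` is a hypothetical helper, not part of any client library:

```python
def iter_byte_ranges(start, total_size, chunk_size):
    """Yield Range header values covering bytes [start, total_size) chunk by chunk.

    HTTP byte ranges are inclusive, so "bytes=0-999" is the first 1000 bytes.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pos = start
    while pos < total_size:
        end = min(pos + chunk_size, total_size) - 1  # last byte of this chunk
        yield f"bytes={pos}-{end}"
        pos = end + 1
```

For example, resuming a 2,500-byte file after 1,500 bytes were saved, with 1,000-byte chunks, yields the single range `bytes=1500-2499`.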

Answered By: Giselle Valladares

When I saw your question, I thought that a related thread (Ref) might be useful, so I have posted my answer to that thread here as well.

To download part of a file from Google Drive, the request must include a Range header such as Range: bytes=500-999. Unfortunately, MediaIoBaseDownload currently offers no way to set the starting byte, so it always downloads the file from the beginning.
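For reference, Range: bytes=500-999 is inclusive on both ends, so it asks for 500 bytes. A server that honors the header answers 206 Partial Content with a Content-Range header such as bytes 500-999/1234; a plain 200 response carries the whole file, and appending that to a partial file would corrupt it. A minimal standard-library sketch of checking this before appending (`parse_content_range` and `safe_to_append` are hypothetical names):

```python
import re

# Content-Range format for a satisfied range: "bytes first-last/total".
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Parse 'bytes 500-999/1234' into (500, 999, 1234); total is None for '*'."""
    m = _CONTENT_RANGE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = m.groups()
    return int(first), int(last), (None if total == "*" else int(total))


def safe_to_append(status_code, content_range, expected_start):
    """True only for a 206 response that starts at the byte we asked for."""
    if status_code != 206 or content_range is None:
        return False
    first, _, _ = parse_content_range(content_range)
    return first == expected_start
```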

So a workaround is required. I propose the following flow:

  1. Retrieve the filename and file size of the file on Google Drive that you want to download.
  2. Check whether a local file with that name already exists.
    • When there is no existing file, the file is downloaded as a new file.
    • When there is an existing file, the download is resumed.
  3. Download the file content with requests, using a Range header that starts at the size of the existing file.
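Step 2 of this flow is a small decision that can be tested without touching Drive. A sketch, where `plan_download` is a hypothetical name and `remote_size` is the size value retrieved in step 1:

```python
import os


def plan_download(file_path, remote_size):
    """Decide how to open the local file for the flow above.

    Returns (start_byte, mode), or None when the local copy is already complete.
    """
    if not os.path.exists(file_path):
        return 0, "wb"  # no local copy: download as a new file
    start = os.path.getsize(file_path)
    if start == remote_size:
        return None  # download already finished
    if start > remote_size:
        # The local file cannot be a prefix of the remote one.
        raise ValueError(f"{file_path} is larger than the remote file")
    return start, "ab"  # resume: append after the bytes already saved
```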

A sample Python script implementing this flow is as follows.

Sample script:

import os.path
import sys

import requests

service = build("drive", "v3", credentials=creds) # Here, please use your client.
file_id = "###" # Please set the file ID of the file you want to download.

# Get the filename and file size.
obj = service.files().get(fileId=file_id, fields="name,size").execute()
filename = obj.get("name", "sampleName")
size = obj.get("size", None)
if not size:
    sys.exit("No file size.")
else:
    size = int(size)

# The access token comes from the creds used to build the service. Reading it
# after the request above ensures it has been refreshed if it was not yet valid.
access_token = creds.token

# Check existing file.
file_path = os.path.join("./", filename) # Please set your path.
o = {}
if os.path.exists(file_path):
    o["start_byte"] = os.path.getsize(file_path)
    o["mode"] = "ab"
    o["download"] = "As resume"
else:
    o["start_byte"] = 0
    o["mode"] = "wb"
    o["download"] = "As a new file"
if o["start_byte"] == size:
    sys.exit("The download of this file has already been finished.")
if o["start_byte"] > size:
    sys.exit("The local file is larger than the file on Google Drive.")

# Download process
print(o["download"])
headers = {
    "Authorization": f"Bearer {access_token}",
    "Range": f'bytes={o["start_byte"]}-',
}
url = f"https://www.googleapis.com/drive/v3/files/{file_id}?alt=media"
with requests.get(url, headers=headers, stream=True) as r:
    r.raise_for_status()
    # When resuming, only append a partial (206) response; a 200 response
    # would contain the whole file again and corrupt the local copy.
    if o["start_byte"] > 0 and r.status_code != 206:
        sys.exit("The server did not honor the Range header.")
    with open(file_path, o["mode"]) as f:
        for chunk in r.iter_content(chunk_size=10240):
            f.write(chunk)

  • When this script is run, the file with file_id is downloaded. If the download stops partway through, running the script again resumes it, and the remaining content is appended to the existing file. I think this is the behavior you are looking for.

  • This script needs the following modules, in addition to the modules required to create service = build("drive", "v3", credentials=creds).

    import os.path
    import requests
    import sys
    

Note:

  • This script assumes that the file to download is not a Google Docs file (Document, Spreadsheet, Slides, and so on). Those files cannot be downloaded with alt=media; they have to be exported instead.

  • This script assumes that your client service = build("drive", "v3", credentials=creds) is authorized to download the file from Google Drive.
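To check the resume logic without Drive or network access, the sketch below replaces the requests stream with a hypothetical in-memory generator, `ranged_chunks`, that can fail partway through. It assumes, as the script does, that the server honors the Range header and returns exactly the bytes after the start offset:

```python
import os


def ranged_chunks(remote, start, chunk_size=4, fail_after=None):
    """Hypothetical stand-in for a streamed response to Range: bytes={start}-.

    Yields remote[start:] in chunks; if fail_after is set, raises after that
    many bytes to simulate a dropped connection.
    """
    sent = 0
    for pos in range(start, len(remote), chunk_size):
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated network failure")
        chunk = remote[pos:pos + chunk_size]
        sent += len(chunk)
        yield chunk


def download_with_resume(path, remote, fail_after=None):
    """Append to `path` from where it stopped, mirroring the script's flow."""
    start = os.path.getsize(path) if os.path.exists(path) else 0
    mode = "ab" if start else "wb"
    with open(path, mode) as f:  # bytes written before a failure are kept
        for chunk in ranged_chunks(remote, start, fail_after=fail_after):
            f.write(chunk)
```

Running `download_with_resume` once with `fail_after=20` leaves a 20-byte partial file; running it again without a failure appends the rest, and the result matches the remote bytes.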

Answered By: Tanaike