How to resume download using MediaIoBaseDownload with Google Drive and Python?
Question:
With large files I get various errors that stop the download, so I want to resume from where it stopped by appending to the file on disk properly.
I saw that the FileIO has to be using ‘ab’ mode:
fh = io.FileIO(fname, mode='ab')
but I couldn’t find how to specify where to continue from using MediaIoBaseDownload.
Any idea on how to implement this?
Answers:
I cannot see your code, so I'll give you some general information on options that can help you solve the issue. You can download the file in chunks using MediaIoBaseDownload; you can see some documentation about this here.
Example:
import io
from googleapiclient.http import MediaIoBaseDownload

# 'farms' is the placeholder service object used in the library documentation.
request = farms.animals().get_media(id='cow')
fh = io.FileIO('cow.png', mode='wb')
downloader = MediaIoBaseDownload(fh, request, chunksize=1024*1024)
done = False
while done is False:
    status, done = downloader.next_chunk()
    if status:
        print("Download %d%%." % int(status.progress() * 100))
print("Download Complete!")
The documentation for next_chunk() reads:
Get the next chunk of the download.
Args:
    num_retries: Integer, number of times to retry with randomized
        exponential backoff. If all retries fail, the raised HttpError
        represents the last request. If zero (default), we attempt the
        request only once.
Returns:
    (status, done): (MediaDownloadProgress, boolean)
        The value of 'done' will be True when the media has been fully
        downloaded or the total size of the media is unknown.
Raises:
    googleapiclient.errors.HttpError if the response was not a 2xx.
    httplib2.HttpLib2Error if a transport error has occurred.
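As a minimal sketch of how the num_retries argument described above could be used, the loop below passes it to every next_chunk() call. The function name run_download and the on_progress callback are illustrative and not part of the library; the only assumption about the downloader is that it behaves like MediaIoBaseDownload. Note that this only helps within a single run of the program; it does not resume a file left on disk by an earlier run.

```python
def run_download(downloader, num_retries=5, on_progress=None):
    """Call next_chunk() until the download reports done.

    `downloader` is expected to behave like MediaIoBaseDownload:
    next_chunk(num_retries=...) returns (status, done).
    Returns the number of chunks that were fetched.
    """
    done = False
    chunks = 0
    while not done:
        # num_retries asks the library to retry a failing chunk with
        # randomized exponential backoff before raising an error.
        status, done = downloader.next_chunk(num_retries=num_retries)
        chunks += 1
        if status and on_progress:
            on_progress(int(status.progress() * 100))
    return chunks
```

With a real downloader this would be called as run_download(MediaIoBaseDownload(fh, request, chunksize=1024*1024), on_progress=print).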
I also found this example in the Google documentation here.
from __future__ import print_function

import io

import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload


def download_file(real_file_id):
    """Downloads a file
    Args:
        real_file_id: ID of the file to download
    Returns : the file content as bytes, or None if the request failed.

    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()

    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)

        file_id = real_file_id

        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}%.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        # Return here: calling getvalue() on None would raise AttributeError.
        return None

    return file.getvalue()


if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')
Lastly, you can review several examples of how to use MediaIoBaseDownload with chunks in these 2 blogs.
Update
Partial download functionality is provided by many client libraries via a Media Download service. You can refer to the client library documentation for details here and here. However, the documentation is not very clear.
The API client library for Java has more information and states that:
"The resumable media download protocol is similar to the resumable media upload protocol, which is described in the Google Drive API documentation."
In the Google Drive API documentation you will find some Python examples for resumable upload. You can use the documentation of the Python google-resumable-media library, the Java resumable media download, and the resumable upload as a basis for code that restarts the download once it fails.
When I saw your question, I thought that this thread (Ref) might be useful. I have posted my answer to that thread.
To download only part of a file from Google Drive, a header such as Range: bytes=500-999 must be included in the request. Unfortunately, at the current stage, MediaIoBaseDownload offers no option for setting the starting byte yourself. When MediaIoBaseDownload is used, the download always starts from the beginning of the file, so all data is downloaded.
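To make the header format concrete, here is a small sketch that builds the Range header for a given offset. The helper name byte_range_header is made up for illustration; only the bytes=start-end syntax comes from the HTTP specification. Leaving the end open ("bytes=500-") asks for everything from that offset to the end of the file, which is the form a resumed download needs.

```python
def byte_range_header(start, end=None):
    """Return a Range header dict for bytes start..end (both inclusive).

    With end=None the range is open-ended: from `start` to the end of
    the file.
    """
    if start < 0:
        raise ValueError("start must be >= 0")
    if end is None:
        return {"Range": f"bytes={start}-"}
    if end < start:
        raise ValueError("end must be >= start")
    return {"Range": f"bytes={start}-{end}"}
```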
So, to achieve your goal, a workaround is required. For this workaround, I propose the following flow.
- Retrieve the filename and file size of the file on Google Drive that you want to download.
- Check the existing file by filename.
- When there is no existing file, the file is downloaded as a new file.
- When there is an existing file, the file is downloaded as a resumable download.
- Download the file content with requests.
When this flow is reflected in a sample Python script, it becomes as follows.
Sample script:
service = build("drive", "v3", credentials=creds)  # Here, please use your client.
file_id = "###"  # Please set the file ID of the file you want to download.
access_token = creds.token  # Access token is retrieved from creds of service = build("drive", "v3", credentials=creds)

# Get the filename and file size.
obj = service.files().get(fileId=file_id, fields="name,size").execute()
filename = obj.get("name", "sampleName")
size = obj.get("size", None)
if not size:
    sys.exit("No file size.")
else:
    size = int(size)

# Check existing file.
file_path = os.path.join("./", filename)  # Please set your path.
o = {}
if os.path.exists(file_path):
    o["start_byte"] = os.path.getsize(file_path)
    o["mode"] = "ab"
    o["download"] = "As resume"
else:
    o["start_byte"] = 0
    o["mode"] = "wb"
    o["download"] = "As a new file"
if o["start_byte"] == size:
    sys.exit("The download of this file has already been finished.")

# Download process
print(o["download"])
headers = {
    "Authorization": f"Bearer {access_token}",
    "Range": f'bytes={o["start_byte"]}-',
}
url = f"https://www.googleapis.com/drive/v3/files/{file_id}?alt=media"
with requests.get(url, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(file_path, o["mode"]) as f:
        for chunk in r.iter_content(chunk_size=10240):
            f.write(chunk)
- When this script is run, the file with the ID file_id is downloaded. If the download stops partway and you run the script again, the download resumes, and the remaining content is appended to the existing file. I thought that this might be your expected situation.
- In this script, please load the following modules. Please also load the modules required for creating service = build("drive", "v3", credentials=creds).
import os.path
import requests
import sys
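One thing the script above does not check is whether the server actually honored the Range header. In HTTP, a ranged response comes back as 206 Partial Content, while a server that ignores the header answers 200 with the whole file, and appending that to a partial file would corrupt it. The following is a hedged sketch of such a check; the function name write_mode_for_response is an assumption for illustration, and whether Drive ever answers 200 to a ranged request is not something the original answer states.

```python
def write_mode_for_response(status_code, start_byte):
    """Pick the file mode that matches what the server actually sent."""
    if status_code == 206:
        # Partial Content: the body starts at start_byte, so append.
        return "ab"
    if status_code == 200:
        # The Range header was ignored and the body is the whole file.
        # Appending would corrupt the local copy, so start over.
        return "wb"
    raise ValueError(
        f"unexpected status {status_code} for a request at offset {start_byte}"
    )
```

In the script, this could replace o["mode"] when opening the file, for example open(file_path, write_mode_for_response(r.status_code, o["start_byte"])).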
Note:
- This assumes that the file to download is not a Google Docs file (Document, Spreadsheet, Slides, and so on). Please be careful about this.
- This script assumes that your client service = build("drive", "v3", credentials=creds) can be used for downloading the file from Google Drive. Please be careful about this.
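As a rough way to detect the Google Docs case from the first note before starting a download, the mimeType of the file can be inspected: Google-native files use MIME types beginning with application/vnd.google-apps. and have to be exported rather than downloaded with alt=media. The helper name needs_export below is hypothetical, and the sketch assumes you add mimeType to the fields requested from files().get.

```python
GOOGLE_APPS_PREFIX = "application/vnd.google-apps."


def needs_export(mime_type):
    """Return True for Google-native files (Docs, Sheets, Slides, ...).

    These have no binary content to fetch with alt=media, so a
    Range-based resume as in the script above does not apply to them.
    """
    return bool(mime_type) and mime_type.startswith(GOOGLE_APPS_PREFIX)
```

For example, after obj = service.files().get(fileId=file_id, fields="name,size,mimeType").execute(), the script could stop early when needs_export(obj.get("mimeType")) is true.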
References:
- Related thread.
- Partial download