How to download a Google Docs excel sheet with a Gspread and access data locally (A1 notation)?
Question:
I need to download an excel sheet from Google Docs via Gspread and then multiple times I’ll need to read the values of different cells in ‘A1’ notation. Thus, I can’t just get the spreadsheet and then call val = worksheet.acell('B1').value
, because the script will freeze out of too many API calls. My solution for now:
def download_hd_sheet():
worksheet = gc.values().get(spreadsheetId=excel_id, range='variables', valueRenderOption='FORMULA').execute()['values']
df = pd.DataFrame(worksheet)
writer = pd.ExcelWriter("Variables.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False, header=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
writer.save()
book = openpyxl.load_workbook('Variables.xlsx', data_only=False)
global hd_sheet
hd_sheet = book.active
So far what I’m doing is:
- I download the values from the worksheet.
- Transform it (list of lists) into a pandas dataframe.
- Then I write the df to a .xlsx file.
- I read the .xlsx file to a global variable
It seems to me that I am doing so many things just to achieve something that can be done in two lines. Please, let me know what would be more effective than the above.
Answers:
I believe your goal as follows.
- You want to download the Google Spreadsheet as the XLSX data.
- You want to use the downloaded XLSX data without saving as the file.
- You have already been able to get and put values for Google Spreadsheet using gspread.
- You want to achieve this using python.
In order to achieve your goal, I would like to propose the following flow.
- Download the Google Spreadsheet as the XLSX data using the method of Files: export in Drive API.
- Open the XLSX data using the downloaded binary data with
openpyxl.load_workbook()
.
Sample script:
In this sample script, from your situation, the access token is used from the authorization for gspread.
spreadsheetId = "###" # Please set the Spreadsheet ID.
client = gspread.authorize(credentials)
access_token = client.auth.token
url = "https://www.googleapis.com/drive/v3/files/" + spreadsheetId + "/export?mimeType=application%2Fvnd.openxmlformats-officedocument.spreadsheetml.sheet"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
book = openpyxl.load_workbook(filename=BytesIO(res.content), data_only=False)
hd_sheet = book.active
-
By above script, the XLSX data is directly downloaded from Google Spreadsheet and openpyxl.load_workbook
-
In this case, the following libraries in addition to gspread are used.
import openpyxl
import requests
from io import BytesIO
Note:
- In this case, please include the scope of
https://www.googleapis.com/auth/drive
or https://www.googleapis.com/auth/drive.readonly
. When you modified the scopes, please reauthorize the scopes. By this, the new scopes are reflected to the access token. So please be careful this.
References:
- Files: export
- Using openpyxl to read file from memory
- I think that this thread might be useful for your situation.
I need to download an excel sheet from Google Docs via Gspread and then multiple times I’ll need to read the values of different cells in ‘A1’ notation. Thus, I can’t just get the spreadsheet and then call val = worksheet.acell('B1').value
, because the script will freeze out of too many API calls. My solution for now:
def download_hd_sheet():
worksheet = gc.values().get(spreadsheetId=excel_id, range='variables', valueRenderOption='FORMULA').execute()['values']
df = pd.DataFrame(worksheet)
writer = pd.ExcelWriter("Variables.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False, header=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
writer.save()
book = openpyxl.load_workbook('Variables.xlsx', data_only=False)
global hd_sheet
hd_sheet = book.active
So far what I’m doing is:
- I download the values from the worksheet.
- Transform it (list of lists) into a pandas dataframe.
- Then I write the df to a .xlsx file.
- I read the .xlsx file to a global variable
It seems to me that I am doing so many things just to achieve something that can be done in two lines. Please, let me know what would be more effective than the above.
I believe your goal as follows.
- You want to download the Google Spreadsheet as the XLSX data.
- You want to use the downloaded XLSX data without saving as the file.
- You have already been able to get and put values for Google Spreadsheet using gspread.
- You want to achieve this using python.
In order to achieve your goal, I would like to propose the following flow.
- Download the Google Spreadsheet as the XLSX data using the method of Files: export in Drive API.
- Open the XLSX data using the downloaded binary data with
openpyxl.load_workbook()
.
Sample script:
In this sample script, from your situation, the access token is used from the authorization for gspread.
spreadsheetId = "###" # Please set the Spreadsheet ID.
client = gspread.authorize(credentials)
access_token = client.auth.token
url = "https://www.googleapis.com/drive/v3/files/" + spreadsheetId + "/export?mimeType=application%2Fvnd.openxmlformats-officedocument.spreadsheetml.sheet"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
book = openpyxl.load_workbook(filename=BytesIO(res.content), data_only=False)
hd_sheet = book.active
-
By above script, the XLSX data is directly downloaded from Google Spreadsheet and
openpyxl.load_workbook
-
In this case, the following libraries in addition to gspread are used.
import openpyxl import requests from io import BytesIO
Note:
- In this case, please include the scope of
https://www.googleapis.com/auth/drive
orhttps://www.googleapis.com/auth/drive.readonly
. When you modified the scopes, please reauthorize the scopes. By this, the new scopes are reflected to the access token. So please be careful this.
References:
- Files: export
- Using openpyxl to read file from memory
- I think that this thread might be useful for your situation.