Google BigQuery query in Python works when using result(), but Permission issue when using to_dataframe()

Question:

I’ve run into a problem after upgrading my pip packages: my BigQuery connector that returns query results suddenly stopped working, with the following error message:

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'path/to/file',
    scopes=[
        'https://www.googleapis.com/auth/cloud-platform',
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/bigquery',
    ],
)

client = bigquery.Client(credentials=credentials)
data = client.query('select * from dataset.table').to_dataframe()

PermissionDenied: 403 request failed: the user does not have
'bigquery.readsessions.create' permission

But! If you switch the code to

data = client.query('select * from dataset.table').result()

(to_dataframe -> result), you receive the data as a RowIterator and can read it properly.

The same script using to_dataframe with the same credentials was still working on the server, so I pinned my local bigquery package to the same version, 2.28.0, which still did not help.

I could not find any advice on this error anywhere, so I just want to share it in case any of you face the same thing.

Asked By: Michaela


Answers:

Resolution

Along with the google-cloud-bigquery package, I also had the google-cloud-bigquery-storage package installed. Once I uninstalled it using

pip uninstall google-cloud-bigquery-storage

everything started working again! Unfortunately, the error message was not very straightforward, so it took some time to figure out 🙂
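
If you want to verify which code path the client will take, a quick check (a minimal sketch) is whether the storage module can be imported:

# Sketch: check which path to_dataframe() will take.
try:
    from google.cloud import bigquery_storage  # shipped with google-cloud-bigquery-storage
    print('storage package installed: to_dataframe() will try the Storage Read API')
except ImportError:
    print('storage package not installed: to_dataframe() falls back to tabledata.list')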

Answered By: Michaela

There are different ways of receiving data from BigQuery. Using the BQ Storage API is considered more efficient for larger result sets than the other options:

The BigQuery Storage Read API provides a third option that represents an improvement over prior options. When you use the Storage Read API, structured data is sent over the wire in a binary serialization format. This allows for additional parallelism among multiple consumers for a set of results.

The Python BQ library internally determines whether it can use the BQ Storage API or not. For the result method, it uses the traditional tabledata.list method internally, whereas the to_dataframe method uses the BQ Storage API if the corresponding package is installed.
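
To make the two code paths concrete, here is a minimal sketch (the table name is the placeholder from the question):

from google.cloud import bigquery

client = bigquery.Client()
job = client.query('select * from dataset.table')

# result() pages rows through tabledata.list and only needs basic
# read permissions on the dataset.
rows = job.result()

# to_dataframe() tries the BQ Storage Read API first when the
# google-cloud-bigquery-storage package is installed, which additionally
# requires the bigquery.readsessions.create permission.
df = job.to_dataframe()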

However, using the BQ Storage API requires you to have the BigQuery Read Session User role (roles/bigquery.readSessionUser), i.e. the bigquery.readsessions.create permission, which in your case seems to be lacking.
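
Instead of the workaround below, the cleaner fix would arguably be to grant that role to the service account, for example (PROJECT_ID and SA_NAME are placeholders):

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.readSessionUser"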

By uninstalling google-cloud-bigquery-storage, you made the google-cloud-bigquery package fall back to the tabledata.list method. Hence, removing this package worked around the missing permission.

See the BQ Python Library Documentation for details.

Answered By: Thomas W.

Just set create_bqstorage_client=False:

from google.cloud import bigquery

client = bigquery.Client()
query = 'select * from dataset.table'  # your query here
query_job = client.query(query)
df = query_job.result().to_dataframe(create_bqstorage_client=False)
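
Passing create_bqstorage_client=False tells to_dataframe() to skip the Storage Read API entirely and page the results through tabledata.list, so the extra permission is never needed and google-cloud-bigquery-storage can stay installed.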
Answered By: Sayantan Banerjee