Fetching data from BigQuery taking very long

Question:

I am trying to fetch data from BigQuery. Everything works fine when I fetch a small amount of data, but when I try to fetch a large amount it takes forever. Is there a more efficient way?

So far I am using this:

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'cred.json'
import google.auth
from google.cloud import bigquery

%load_ext google.cloud.bigquery

import google.datalab.bigquery as bq
from google.cloud.bigquery import Client

client = bigquery.Client()

Here is my SQL command:

sql = """
SELECT bla, bla1, bla2
FROM table
"""
df = client.query(sql)
df.to_dataframe()
Asked By: s_khan92

||

Answers:

You can get BigQuery data into a dataframe more than an order of magnitude faster by changing the download method.

Check how these options are reflected in the chart:

  • A: to_dataframe() – Uses BigQuery tabledata.list API.
  • B: to_dataframe(bqstorage_client=bqstorage_client), package version 1.16.0 – Uses BigQuery Storage API with Avro data format.
  • C: to_dataframe(bqstorage_client=bqstorage_client), package version 1.17.0 – Uses BigQuery Storage API with Arrow data format.
  • D: to_arrow(bqstorage_client=bqstorage_client).to_pandas(), package version 1.17.0 – Uses BigQuery Storage API with Arrow data format.

[Chart: download time for options A to D]

Note how you can go from >500 seconds to ~20 seconds by using to_arrow(bqstorage_client=bqstorage_client).to_pandas().
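
As a rough sketch of option D applied to the query from the question: the helper names fetch_dataframe and make_clients below are hypothetical, and the code assumes that pyarrow and a recent google-cloud-bigquery-storage are installed (older releases exposed the storage client under a different name, so adjust the import to your version).

```python
def fetch_dataframe(client, bqstorage_client, sql):
    """Run `sql` and download the result via the BigQuery Storage API (option D)."""
    # client.query() starts the query job and returns immediately.
    query_job = client.query(sql)
    # to_arrow() waits for the job to finish and downloads the rows as an
    # Arrow table through the storage client instead of tabledata.list.
    arrow_table = query_job.to_arrow(bqstorage_client=bqstorage_client)
    # Converting an Arrow table to pandas is cheap compared with the download.
    return arrow_table.to_pandas()


def make_clients():
    """Build both clients (hypothetical helper; needs valid credentials)."""
    # Imported lazily so the module can be loaded without the Google packages.
    # BigQueryReadClient is the class name in recent google-cloud-bigquery-storage
    # releases; this is an assumption about the installed version.
    from google.cloud import bigquery, bigquery_storage

    return bigquery.Client(), bigquery_storage.BigQueryReadClient()
```

With credentials configured as in the question, usage would look like client, bqstorage_client = make_clients() followed by df = fetch_dataframe(client, bqstorage_client, sql). The Storage API may be billed separately from the query itself, so check the pricing for your project before using it on very large tables.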

See https://medium.com/google-cloud/announcing-google-cloud-bigquery-version-1-17-0-1fc428512171

Answered By: Felipe Hoffa

Try the following method in a Jupyter notebook; it works like magic:

%%bigquery

SELECT * FROM table.name

For a more detailed explanation, see https://cloud.google.com/bigquery/docs/visualize-jupyter

Answered By: AshwinSG