Fetching data from BigQuery is taking very long
Question:
I am trying to fetch data from BigQuery. Everything works fine when I fetch a small amount of data, but when I try to fetch a large result set it takes forever. Is there a more efficient way?
So far I am using this:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'cred.json'  # path to the service account key

from google.cloud import bigquery
%load_ext google.cloud.bigquery  # load the %%bigquery cell magic

client = bigquery.Client()
Here is my SQL command:
sql = """
SELECT bla, bla1, bla2
FROM table
"""
df = client.query(sql)
df.to_dataframe()
Answers:
You can get BigQuery data into a dataframe orders of magnitude faster by changing the method you use to download the results.
Here is how the options compare (the post linked below includes a chart with their timings):
- A: to_dataframe() – uses the BigQuery tabledata.list API.
- B: to_dataframe(bqstorage_client=bqstorage_client), package version 1.16.0 – uses the BigQuery Storage API with the Avro data format.
- C: to_dataframe(bqstorage_client=bqstorage_client), package version 1.17.0 – uses the BigQuery Storage API with the Arrow data format.
- D: to_arrow(bqstorage_client=bqstorage_client).to_pandas(), package version 1.17.0 – uses the BigQuery Storage API with the Arrow data format.
Note how you can go from >500 seconds to ~20 seconds by using to_arrow(bqstorage_client=bqstorage_client).to_pandas() (see the sketch below).
See https://medium.com/google-cloud/announcing-google-cloud-bigquery-version-1-17-0-1fc428512171
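A minimal sketch of option D, based on the 1.17.0-era API described in the linked post; it assumes the google-cloud-bigquery-storage package is installed (newer library versions expose the same client as google.cloud.bigquery_storage, so adjust the import to your installed version), and reuses the placeholder query from the question:
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

client = bigquery.Client()
# The BigQuery Storage API client streams result rows much faster than tabledata.list.
bqstorage_client = bigquery_storage_v1beta1.BigQueryStorageClient()

sql = """
SELECT bla, bla1, bla2
FROM table
"""

# Run the query, then download the rows via the Storage API in Arrow format
# and convert them to a pandas DataFrame.
rows = client.query(sql).result()
df = rows.to_arrow(bqstorage_client=bqstorage_client).to_pandas()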
Try the following method; it works like magic:
%%bigquery
SELECT * FROM table.name
For a more detailed explanation, see https://cloud.google.com/bigquery/docs/visualize-jupyter
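A small usage sketch, assuming a Jupyter or Colab notebook with google-cloud-bigquery installed; df and project.dataset.table are placeholder names. First load the magic once per session (in its own cell):
%load_ext google.cloud.bigquery
Then, in a new cell, pass a variable name after %%bigquery to store the result as a pandas DataFrame:
%%bigquery df
SELECT bla, bla1, bla2
FROM `project.dataset.table`
Afterwards df is available as a regular DataFrame in later cells, e.g. df.head().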