Export BigQuery Data to CSV without using Google Cloud Storage

Question:

I am currently writing software to export large amounts of BigQuery data and store the queried results locally as CSV files. I am using Python 3 and the client provided by Google. I did the configuration and authentication, but the problem is that I can't store the data locally. Every time I execute the code, I get the following error message:

googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/round-office-769/jobs?alt=json returned “Invalid extract destination URI ‘response/file-name-*.csv’. Must be a valid Google Storage path.”>

This is my Job Configuration:

def export_table(service, cloud_storage_path,
                 projectId, datasetId, tableId, sqlQuery,
                 export_format="CSV",
                 num_retries=5):

    # Generate a unique job_id so retries
    # don't accidentally duplicate export
    job_data = {
        'jobReference': {
            'projectId': projectId,
            'jobId': str(uuid.uuid4())
        },
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': projectId,
                    'datasetId': datasetId,
                    'tableId': tableId,
                },
                'destinationUris': ['response/file-name-*.csv'],
                'destinationFormat': export_format
            },
            'query': {
                'query': sqlQuery,
            }
        }
    }
    return service.jobs().insert(
        projectId=projectId,
        body=job_data).execute(num_retries=num_retries)

I hoped I could just use a local path instead of a Cloud Storage URI to store the data, but I was wrong.

So my question is:

Can I download the queried data locally (or to a local database), or do I have to use Google Cloud Storage?

Asked By: mkjoerg


Answers:

You need to use Google Cloud Storage for your export job. Exporting data from BigQuery is explained here; check also the variants for the different path syntaxes.

Then you can download the files from GCS to your local storage.

The gsutil tool can help you download the files from GCS to your local machine.

You cannot download it locally in one move; you first need to export it to GCS and then transfer it to your local machine.
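
For completeness, here is a rough sketch of that two-step flow using the google-cloud-bigquery and google-cloud-storage Python clients; the dataset, table, and bucket names are placeholders, and the bucket must already exist:

from google.cloud import bigquery, storage

project = "round-office-769"
bucket_name = "my-export-bucket"                     # hypothetical bucket
gcs_uri = "gs://{}/file-name-*.csv".format(bucket_name)

# Step 1: export the table to GCS (the destination must be a gs:// URI)
bq_client = bigquery.Client(project=project)
extract_job = bq_client.extract_table(
    "round-office-769.my_dataset.my_table",          # placeholder table
    gcs_uri)
extract_job.result()                                 # wait for the export to finish

# Step 2: download the exported shards to local disk
storage_client = storage.Client(project=project)
for blob in storage_client.list_blobs(bucket_name, prefix="file-name-"):
    blob.download_to_filename(blob.name)

Alternatively, the download step can be done from the command line with gsutil, e.g. gsutil cp "gs://my-export-bucket/file-name-*.csv" .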

Answered By: Pentium10

You can run a tabledata.list() operation on that table and set “alt=csv” which will return the beginning of the table as CSV.
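
I'm not sure the Python client library exposes the alt=csv parameter directly; a closely related approach is to page through the table with list_rows() (which uses tabledata.list under the hood) and write the rows to CSV yourself. A minimal sketch with placeholder names:

import csv
from google.cloud import bigquery

client = bigquery.Client(project="round-office-769")               # placeholder project
table = client.get_table("round-office-769.my_dataset.my_table")   # placeholder table

with open("file-name.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([field.name for field in table.schema])  # header row
    # list_rows() handles the page tokens for you
    for row in client.list_rows(table, page_size=10000):
        writer.writerow(list(row.values()))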

Answered By: yiru

You can download all the data directly (without routing it through Google Cloud Storage) using the paging mechanism. Basically, you need to fetch a page token for each page, download the data in that page, and iterate until all the data has been downloaded, i.e. no more tokens are available. Here is example code in Java, which hopefully clarifies the idea:

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.client.util.Data;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.*;

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

/* your class starts here */

private String projectId = ""; /* fill in the project id here */
private String query = ""; /* enter your query here */
private Bigquery bigQuery;
private Job insert;
private TableDataList tableDataList;
private Iterator<TableRow> rowsIterator;
private List<TableRow> rows;
private long maxResults = 100000L; /* max number of rows in a page */

/* run query */
public void open() throws Exception {
    HttpTransport transport = GoogleNetHttpTransport.newTrustedTransport();
    JsonFactory jsonFactory = new JacksonFactory();
    GoogleCredential credential = GoogleCredential.getApplicationDefault(transport, jsonFactory);
    if (credential.createScopedRequired())
        credential = credential.createScoped(BigqueryScopes.all());
    bigQuery = new Bigquery.Builder(transport, jsonFactory, credential).setApplicationName("my app").build();

    JobConfigurationQuery queryConfig = new JobConfigurationQuery().setQuery(query);
    JobConfiguration jobConfig = new JobConfiguration().setQuery(queryConfig);
    Job job = new Job().setConfiguration(jobConfig);
    insert = bigQuery.jobs().insert(projectId, job).execute();
    JobReference jobReference = insert.getJobReference();

    while (true) {
        Job poll = bigQuery.jobs().get(projectId, jobReference.getJobId()).execute();
        String state = poll.getStatus().getState();
        if ("DONE".equals(state)) {
            ErrorProto errorResult = poll.getStatus().getErrorResult();
            if (errorResult != null)
                throw new Exception("Error running job: " + poll.getStatus().getErrors().get(0));
            break;
        }
        Thread.sleep(10000);
    }

    tableDataList = getPage();
    rows = tableDataList.getRows();
    rowsIterator = rows != null ? rows.iterator() : null;
}

/* read data row by row */
public /* your data object here */ read() throws Exception {
    if (rowsIterator == null) return null;

    if (!rowsIterator.hasNext()) {
        String pageToken = tableDataList.getPageToken();
        if (pageToken == null) return null;
        tableDataList = getPage(pageToken);
        rows = tableDataList.getRows();
        if (rows == null) return null;
        rowsIterator = rows.iterator();
    }

    TableRow row = rowsIterator.next();
    for (TableCell cell : row.getF()) {
        Object value = cell.getV();
        /* extract the data here */
    }

    /* return the data */
}

private TableDataList getPage() throws IOException {
    return getPage(null);
}

private TableDataList getPage(String pageToken) throws IOException {
    TableReference sourceTable = insert
            .getConfiguration()
            .getQuery()
            .getDestinationTable();
    if (sourceTable == null)
        throw new IllegalArgumentException("Source table not available. Please check the query syntax.");
    return bigQuery.tabledata()
            .list(projectId, sourceTable.getDatasetId(), sourceTable.getTableId())
            .setPageToken(pageToken)
            .setMaxResults(maxResults)
            .execute();
}
Answered By: Jussi Jousimo

Another way to do this is from the UI: once the query results have returned, you can select the “Download as CSV” button.

Answered By: Victor Mota

If you install the Google BigQuery API client and pandas (with its pandas.io.gbq support), you can run Python inside a Jupyter notebook, query the BQ table, and load the data into a local dataframe. From there, you can write it out to CSV.
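
A minimal sketch of that approach, assuming the pandas-gbq package is installed; the project id and query are placeholders:

import pandas as pd

# read_gbq (from the pandas-gbq package) runs the query and returns a DataFrame
df = pd.read_gbq(
    "SELECT * FROM `my_dataset.my_table`",   # placeholder query
    project_id="round-office-769",           # placeholder project
    dialect="standard")
df.to_csv("file-name.csv", index=False)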

Answered By: Mark Ginsburg

As Mikhail Berlyant said,

BigQuery does not provide ability to directly export/download query
result to GCS or Local File.

You can still export it using the Web UI in just three steps:

  1. Configure query to save the results in a BigQuery table and run it.
  2. Export the table to a bucket in GCS.
  3. Download from the bucket.

To keep costs low, delete the table once you have exported its contents to GCS, and delete the contents of the bucket (and the bucket itself) once you have downloaded the file(s) to your machine.

Step 1

When in the BigQuery screen, before running the query, go to More > Query Settings.

Configure Query

This opens the following

Query Settings

Here you want to have

  • Destination: Set a destination table for query results
  • Project name: select the project.
  • Dataset name: select a dataset. If you don’t have one, create it and come back.
  • Table name: give whatever name you want (must contain only letters, numbers, or underscores).
  • Result size: Allow large results (no size limit).

Then save it; the query is now configured to write its results to the specified table. You can now run the query.
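
If you prefer to do this step in code rather than through the UI, the Python client can, I believe, achieve the same thing by setting a destination table on the query job. A rough sketch with placeholder names:

from google.cloud import bigquery

client = bigquery.Client(project="round-office-769")   # placeholder project

# Save the query results into a specific table, allowing large results
destination = bigquery.TableReference.from_string(
    "round-office-769.my_dataset.query_results")        # placeholder table
job_config = bigquery.QueryJobConfig(
    destination=destination,
    allow_large_results=True)

query_job = client.query("SELECT * FROM `my_dataset.my_table`", job_config=job_config)
query_job.result()   # wait for the query to finish writing the table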

Step 2

To export it to GCS, you have to go to the table and click EXPORT > Export to GCS.

BigQuery export table

This opens the following screen

Export to GCS

In Select GCS location you define the bucket, the folder and the file.

For instance, suppose you have a bucket named daria_bucket (use only lowercase letters, numbers, hyphens (-), and underscores (_); dots (.) may be used to form a valid domain name) and want to save the file(s) in the root of the bucket with the name test. Then you write (in Select GCS location)

daria_bucket/test.csv

If the file is too big (more than 1 GB), you'll get an error. To fix it, you'll have to split it across multiple files using a wildcard. So you'll need to add *, like this:

daria_bucket/test*.csv

Wildcard export to GCS

This will store, inside the bucket daria_bucket, all the data extracted from the table, split across multiple files named test000000000000, test000000000001, test000000000002, … testX.

Step 3

Then go to Storage and you’ll see the bucket.

GCS bucket

Go inside it and you'll find the file(s), which you can then download from there.
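
Steps 2 and 3 can also be scripted. A sketch with the Python clients, reusing the placeholder project, table, and bucket names from above:

from google.cloud import bigquery, storage

client = bigquery.Client(project="round-office-769")   # placeholder project

# Step 2: export the saved results table to GCS, using a wildcard
# so tables larger than 1 GB are split across multiple files
extract_job = client.extract_table(
    "round-office-769.my_dataset.query_results",        # placeholder table
    "gs://daria_bucket/test*.csv")
extract_job.result()

# Step 3: download every exported shard from the bucket
storage_client = storage.Client(project="round-office-769")
for blob in storage_client.list_blobs("daria_bucket", prefix="test"):
    blob.download_to_filename(blob.name)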

Answered By: Tiago Martins Peres

Exporting data from a BigQuery table to a CSV file using Python pandas:

import pandas as pd
from google.cloud import bigquery

selectQuery = """SELECT * FROM `dataset-name.table-name`"""
bigqueryClient = bigquery.Client()
df = bigqueryClient.query(selectQuery).to_dataframe()
df.to_csv("file-name.csv", index=False)
Answered By: Soumendra Mishra

Maybe you can use the Simba ODBC driver provided by Google with any tool that offers an ODBC connection to create the CSV. It could even be Microsoft SSIS, and you wouldn't need to write any code.
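
If you do end up scripting it anyway, an ODBC connection set up with the Simba driver can be used from Python via pyodbc. A minimal sketch, assuming you have already configured a DSN in your ODBC manager; the DSN name and query are placeholders:

import pandas as pd
import pyodbc

# "BigQueryDSN" is a hypothetical DSN configured in your ODBC manager
# to point at the Simba ODBC Driver for Google BigQuery
conn = pyodbc.connect("DSN=BigQueryDSN", autocommit=True)
df = pd.read_sql("SELECT * FROM `my_dataset.my_table`", conn)
df.to_csv("file-name.csv", index=False)
conn.close()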

Answered By: Cristian Avendaño