How to import logs from a GCP project?

Question:

I have read some official and unofficial documentation online, and I am currently unable to import the BigQuery logs (resource type "bigquery_resource", to capture all the INSERT, UPDATE, MERGE, ... statements processed on my GCP project) from a GCP project where I am owner, using Python on my local machine.

Mandatory prerequisites:

  • Only use scripts to read and collect the logs with a filter, without creating a Cloud Function, data in a bucket, or requiring any manual action from a user on the GCP project, etc.
  • Use a service account in the process.
  • Import the BigQuery logs from GCP to my local machine when I execute my Python script.

Here is the code where I try to get the logs:

from google.cloud.bigquery_logging_v1 import AuditData  # not used yet, kept from my attempts
import google.cloud.logging
from datetime import datetime, timedelta, timezone
import os

# Raw strings keep backslashes in Windows paths from being read as escape sequences
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:\mypath\credentials.json"

project_id = os.environ["GOOGLE_CLOUD_PROJECT"] = "project1"

# Look back two days
yesterday = datetime.now(timezone.utc) - timedelta(days=2)
time_format = "%Y-%m-%dT%H:%M:%S.%f%z"

filter_str = (
    f'logName="projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"'
    f' AND resource.type="bigquery_resource"'
    f' AND timestamp>="{yesterday.strftime(time_format)}"'
)

client = google.cloud.logging.Client(project="project1")

for entry in client.list_entries(filter_=filter_str):
    decoded_entry = entry.to_api_repr()
    # print(decoded_entry)
    print(entry)  # the same output as print(decoded_entry)

# Opening with mode "w" already truncates the file
with open(r"C:\mypath\logs.txt", "w") as f:
    for entry in client.list_entries(filter_=filter_str):
        f.write(str(entry.to_api_repr()) + "\n")  # entry is a LogEntry object, not a str
   

Unfortunately, it doesn't work (and my code is messy): I get a ProtobufEntry in the entry variable, as shown below, and I don't know how to get my data from my GCP project in a proper way.

My output

All help is welcome! (Please don't answer with a deprecated answer from OpenAI ChatGPT.)

Asked By: Cass


Answers:

One way to achieve this is as follows:

Create a dedicated logging sink for BigQuery logs:

gcloud logging sinks create my-example-sink bigquery.googleapis.com/projects/my-project-id/datasets/auditlog_dataset \
    --log-filter='protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'

The above command creates a logging sink into a dataset named auditlog_dataset that only includes BigQueryAuditMetadata messages. Refer to BigQueryAuditMetadata for all the events captured as part of GCP audit data.
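If you prefer to stay in Python, a minimal sketch of creating an equivalent sink with the google-cloud-logging client could look like the following (the sink name, project ID, and dataset are placeholders):

import google.cloud.logging

client = google.cloud.logging.Client(project="my-project-id")

# Destination and filter mirror the gcloud command above
destination = "bigquery.googleapis.com/projects/my-project-id/datasets/auditlog_dataset"
log_filter = (
    'protoPayload.metadata."@type"='
    '"type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"'
)

sink = client.sink("my-example-sink", filter_=log_filter, destination=destination)
if not sink.exists():
    sink.create()

Note that once the sink exists, its writer identity normally still has to be granted write access (for example BigQuery Data Editor) on the destination dataset before entries start flowing.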

Create a service account and give it access to the dataset created above.

For creating a service account, refer here, and for granting access to a dataset, refer here.
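As a rough illustration of the access grant (the service account email, project, and dataset ID below are placeholders), dataset-level access can also be added with the BigQuery Python client:

from google.cloud import bigquery

client = bigquery.Client(project="my-project-id")
dataset = client.get_dataset("my-project-id.auditlog_dataset")

# Append a READER entry for the service account to the dataset's existing ACL
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="my-sa@my-project-id.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])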

Use this service account to authenticate from your local environment and query the dataset created above with the BigQuery Python client to get the filtered BigQuery data.

from google.cloud import bigquery

client = bigquery.Client()

# Select rows from log dataset
QUERY = (
    'SELECT name FROM `MYPROJECTID.MYDATASETID.cloudaudit_googleapis_com_activity` '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish

for row in rows:
    print(row.name)

Also, you can query the audit tables from the console directly.

Reference: BigQuery audit logging.

Another option is to use a Python script to query the log events directly. One more option is to use Cloud Pub/Sub to route the logs to clients outside of GCP.
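For the Pub/Sub option, a minimal sketch of pulling routed log entries from a subscription might look like this (it assumes a logging sink already routes the filtered entries to a topic, and that "logs-sub" is a subscription on that topic; both names are placeholders):

from google.cloud import pubsub_v1

project_id = "my-project-id"
subscription_id = "logs-sub"

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

# Pull a small batch of messages; each message body is a JSON-serialized LogEntry
response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 10}
)

for received in response.received_messages:
    print(received.message.data.decode("utf-8"))

if response.received_messages:
    subscriber.acknowledge(
        request={
            "subscription": subscription_path,
            "ack_ids": [m.ack_id for m in response.received_messages],
        }
    )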

I mostly prefer to keep the filtered logs in a dedicated Log Analytics bucket, query them as needed, and create custom log-based metrics with Cloud Monitoring. Moving logs out of GCP may incur network egress charges; refer to the documentation if you are querying a large volume of data.
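As an illustration of the log-based metric idea (the metric name and filter are placeholders, not something created elsewhere in this answer), the google-cloud-logging client can create one roughly like this:

import google.cloud.logging

client = google.cloud.logging.Client(project="my-project-id")

# Counts BigQuery audit activity entries; Cloud Monitoring can chart or alert on it
metric = client.metric(
    "bigquery-audit-activity-count",
    filter_='resource.type="bigquery_resource" AND '
            'logName="projects/my-project-id/logs/cloudaudit.googleapis.com%2Factivity"',
    description="Number of BigQuery audit activity log entries",
)
if not metric.exists():
    metric.create()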

Answered By: Rathish Kumar B

Here is how I export my logs without creating a bucket, sink, Pub/Sub topic, Cloud Function, BigQuery table, etc.

=> Only one service account with rights on my project and one .py script on my local machine, plus an option in the Python script to scan only the BigQuery resource during the last hour.

I add the full path of gcloud because I had some problems with the PATH environment variable on my local machine when using Popen; you may not need to do this.

from subprocess import Popen, PIPE
import os

# Raw strings avoid invalid escape sequences such as "\U" in Windows paths
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:\Users\USERAAAA\Documents\Python Scripts\credentials.json"

gcloud_path = r"C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin\gcloud.cmd"
process = Popen(
    [
        gcloud_path,
        "logging",
        "read",
        "resource.type=bigquery_resource AND "
        "logName=projects/PROJECTGCP1/logs/cloudaudit.googleapis.com%2Fdata_access",
        "--freshness=1h",
    ],
    stdout=PIPE,
    stderr=PIPE,
)
stdout, stderr = process.communicate()
output_str = stdout.decode()

# Write the output string to a file
with open(r"C:\Users\USERAAAA\Documents\Python_Scripts\testes.txt", "w") as f:
    f.write(output_str)
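For completeness, roughly the same result can be obtained without shelling out to gcloud, using the Logging client directly; this is only a sketch under the same project and filter assumptions as the script above:

import json
from datetime import datetime, timedelta, timezone

import google.cloud.logging

client = google.cloud.logging.Client(project="PROJECTGCP1")

# Same filter as the gcloud command, with --freshness=1h expressed as a timestamp bound
since = datetime.now(timezone.utc) - timedelta(hours=1)
filter_str = (
    'resource.type="bigquery_resource" AND '
    'logName="projects/PROJECTGCP1/logs/cloudaudit.googleapis.com%2Fdata_access" AND '
    f'timestamp>="{since.isoformat()}"'
)

with open(r"C:\Users\USERAAAA\Documents\Python_Scripts\testes.txt", "w") as f:
    for entry in client.list_entries(filter_=filter_str):
        # to_api_repr() returns a JSON-serializable dict of the log entry
        f.write(json.dumps(entry.to_api_repr()) + "\n")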
Answered By: Cass