GCP Dataflow – NoneType error during WriteToBigQuery()
Question:
I’m trying to transfer data in a csv file from GCS to BQ using beam but I get a NoneType error when I call WriteToBigQuery. The error message:
AttributeError: 'NoneType' object has no attribute 'items' [while running 'Write to BQ/_StreamToBigQuery/StreamInsertRows/ParDo(BigQueryWriteFn)']
My pipeline code:
import apache_beam as beam
from apache_beam.pipeline import PipelineOptions
from apache_beam.io.textio import ReadFromText
options = {
    'project': project,
    'region': region,
    'temp_location': bucket,
    'staging_location': bucket,
    'setup_file': './setup.py'
}
class Split(beam.DoFn):
    def process(self, element):
        n, cc = element.split(",")
        return [{
            'n': int(n.strip('"')),
            'connection_country': str(cc.strip()),
        }]
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)

with beam.Pipeline(options=pipeline_options) as pipeline:
    (pipeline
     | 'Read from GCS' >> ReadFromText('file_path*', skip_header_lines=1)
     | 'parse input' >> beam.ParDo(Split())
     | 'print' >> beam.Map(print)
     | 'Write to BQ' >> beam.io.WriteToBigQuery(
         'from_gcs', 'demo', schema='n:INTEGER, connection_country:STRING',
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
    )
My csv looks like this:
And the beam excerpt at the print() stage looks like this:
Appreciate any help!
Answers:
You are getting that error because print returns None, so beam.Map(print) emits a PCollection of None values, and WriteToBigQuery then receives None instead of your row dicts. You can fix it with:
def print_fn(element):
    print(element)
    return element

{..}
| 'print' >> beam.Map(print_fn)  # note that we now reference a function that returns the element
| 'Write to BQ' >> beam.io.WriteToBigQuery(
{..}
Also, if you run this on Dataflow, print output will not appear in the job logs; use logging.info() instead.
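As a plain-Python sketch of why the return value matters (no Beam required; log_and_passthrough is a hypothetical helper name, not part of the original pipeline): beam.Map(print) behaves like mapping print over the elements, so every output is None, while a function that logs and returns its input keeps the elements flowing.

```python
import logging

logging.basicConfig(level=logging.INFO)

def log_and_passthrough(element):
    # Log the element (logging.info is visible in Dataflow worker logs,
    # unlike print) and return it so downstream transforms still receive it.
    logging.info("element: %s", element)
    return element

rows = [{'n': 1, 'connection_country': 'US'}]

# Mapping print produces only None values, which is what WriteToBigQuery chokes on.
broken = list(map(print, rows))

# Mapping a pass-through function preserves the elements.
fixed = list(map(log_and_passthrough, rows))
```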
You can also filter out None elements with

def filter_none_messages(msg):
    print(f"Element seen by filter: {msg}")
    return msg  # falsy elements (None) are dropped by beam.Filter

and add | "FilterNoneMessages" >> beam.Filter(filter_none_messages) in your pipeline, before the write step.
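beam.Filter keeps only the elements for which the callable returns a truthy value, the same semantics as Python's built-in filter. A minimal plain-Python sketch of that behavior:

```python
def filter_none_messages(msg):
    # Returning the element itself means None (falsy) elements are dropped,
    # while real row dicts (truthy) pass through.
    return msg

elements = [{'n': 1}, None, {'n': 2}]
kept = list(filter(filter_none_messages, elements))
```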