How to read the BYTES type from BigQuery in Java?

Question:

We have a legacy Dataflow job in Scala that reads from BigQuery and dumps the data into Postgres.
In Scala we read from BigQuery, map each row onto a case class, and then write it to Postgres. This works perfectly for BigQuery's BYTES type as well.
The schema we read into has an Array[Byte] field in Scala, and we use the .setBytes function to write it to the relevant bytea column of the Postgres table.

Now we are migrating that job to Java. We are not using typed case classes this time, and the read from BigQuery returns a com.google.api.services.bigquery.model.TableRow object. All the other field types work as expected, but I am having issues with the BYTES type.

When I do

insetQuery.setBytes(3, row.get("bytes_type_column"));

it complains that setBytes expects bytes (byte[]), while row.get("bytes_type_column") is an Object. If I do row.get("bytes_type_column").toString().getBytes() instead, it runs fine, but the content of the original bytes column appears to be changed, and I cannot use it after reading it back from Postgres.
It seems to me that .toString() turns the value into a Java string, and converting that string to bytes does not give back the original bytes.

The other approach I tried was

insetQuery.setBytes(3, (byte[]) row.get("bytes_type_column"));

which also seems to have changed the content of the column.
I had the same issue when I tried this answer.

I have almost no experience with Java. Can someone guide me on how to write the BigQuery bytes column value to Postgres exactly as I read it, without changing anything in it?
Thanks.

In case it is helpful: the BigQuery bytes column actually holds a pickled Python object, which I want to store in Postgres and then unpickle after reading it in a Python application. If it cannot be unpickled, that means it was not stored exactly as it was.

Asked By: saadi


Answers:

After searching the internet and digging through the official repositories and examples, I finally found the solution here.

Basically, the TableRow holds the BYTES value as a base64-encoded String, so you first have to decode it:

byte[] bytes_type_column = Base64.getDecoder().decode((String) row.get("bytes_type_column"));

and then pass the decoded bytes to your query:

insetQuery.setBytes(3, bytes_type_column);
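
To see why .toString().getBytes() does not work while decoding does, here is a minimal, self-contained sketch. It makes a few assumptions: a plain Map<String, Object> stands in for the TableRow (TableRow is itself a Map<String, Object>), the helper name decodeBytesColumn is hypothetical, and the payload bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class BytesColumnDemo {

    // Hypothetical helper: decodes a base64-encoded BYTES value from a row.
    // A plain Map is used here as a stand-in for TableRow.
    static byte[] decodeBytesColumn(Map<String, Object> row, String column) {
        Object value = row.get(column);
        if (value == null) {
            // Missing or NULL column: let the caller decide (e.g. setNull on the statement).
            return null;
        }
        return Base64.getDecoder().decode((String) value);
    }

    public static void main(String[] args) {
        // Made-up binary payload standing in for the pickled object.
        byte[] original = {(byte) 0x80, 0x04, (byte) 0x95, 0x00, (byte) 0xFF};

        // Simulate what the row holds: the base64 text of the payload.
        Map<String, Object> row = new HashMap<>();
        row.put("bytes_type_column", Base64.getEncoder().encodeToString(original));

        // Not what we want: these are the bytes of the base64 text itself.
        byte[] viaToString =
            row.get("bytes_type_column").toString().getBytes(StandardCharsets.UTF_8);

        // What we want: decoding the base64 text recovers the payload.
        byte[] decoded = decodeBytesColumn(row, "bytes_type_column");

        System.out.println(Arrays.equals(original, viaToString)); // prints false
        System.out.println(Arrays.equals(original, decoded));     // prints true
    }
}
```

In the real job, the result of the helper would be what gets passed to insetQuery.setBytes(3, ...). The null check is an addition of this sketch, since Base64.getDecoder().decode(...) throws on a null input.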
Answered By: saadi