Details
- Type: Bug
- Status: Triage Needed
- Priority: P2
- Resolution: Fixed
- Fix Version: 2.29.0
- Component: Beam Java SDK
Description
UnsupportedOperationExceptions are thrown in
org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#toBeamValue(FieldType, Object)
when reading from BigQuery tables with
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO#readTableRowsWithSchema() and converting the returned TableRows to Beam Rows.
Example:

PCollection<Row> rows =
    pipeline
        .apply(
            "Read from BigQuery table",
            BigQueryIO.readTableRowsWithSchema()
                .from(String.format("%s:%s.%s", project, dataset, table)))
        .apply(Convert.toRows());
The UnsupportedOperationException messages I have encountered are of the form:
Converting BigQuery type "java.lang.Boolean" to "BOOLEAN" is not supported
Converting BigQuery type "java.lang.Double" to "DOUBLE" is not supported
The conversion of these Java types should be straightforward. The failure occurs because the method BigQueryUtils#toBeamValue(FieldType, Object) expects only String objects or Collections of Strings.
I had to upgrade com.google.cloud:google-cloud-bigquery from 1.108.0 to 1.132.0 in my project. My guess is that this newer version now maps BigQuery (standard SQL) types to Java types instead of raw Strings, in particular BOOL to Boolean, INT64 to Long and FLOAT64 to Double.
In my understanding, though, there should be no need to handle the Java types Byte, Short, Integer and Float on the BigQuery-to-Beam path, since BigQuery's standard SQL numeric types are limited to INT64 and FLOAT64, which encompass them all (the BigQuery NUMERIC type, on the other hand, maps to Java BigDecimal).
I propose a pull request to also support Number and Boolean objects in BigQueryUtils#toBeamValue(FieldType, Object). This is purely additive behavior; the updated method remains compatible with the current String-based handling.
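To illustrate the intended shape of the fix, here is a minimal, self-contained sketch (not the actual Beam implementation; the FieldKind enum and method body are simplified stand-ins for Beam's FieldType and toBeamValue): typed Boolean and Number inputs are accepted directly, while the existing String-parsing path is kept as a fallback.

```java
public class ToBeamValueSketch {

    // Simplified stand-in for Beam's Schema.FieldType; only the cases discussed here.
    enum FieldKind { BOOLEAN, INT64, DOUBLE }

    static Object toBeamValue(FieldKind type, Object jsonBQValue) {
        switch (type) {
            case BOOLEAN:
                if (jsonBQValue instanceof Boolean) {
                    return jsonBQValue; // added: pass a typed Boolean through directly
                }
                if (jsonBQValue instanceof String) {
                    return Boolean.valueOf((String) jsonBQValue); // existing String path
                }
                break;
            case INT64:
                if (jsonBQValue instanceof Number) {
                    return ((Number) jsonBQValue).longValue(); // added: widen any Number to long
                }
                if (jsonBQValue instanceof String) {
                    return Long.valueOf((String) jsonBQValue);
                }
                break;
            case DOUBLE:
                if (jsonBQValue instanceof Number) {
                    return ((Number) jsonBQValue).doubleValue(); // added: widen any Number to double
                }
                if (jsonBQValue instanceof String) {
                    return Double.valueOf((String) jsonBQValue);
                }
                break;
        }
        throw new UnsupportedOperationException(
            "Converting BigQuery type \"" + jsonBQValue.getClass().getName()
                + "\" to \"" + type + "\" is not supported");
    }

    public static void main(String[] args) {
        // Typed values (newer google-cloud-bigquery) and Strings (older) both convert.
        System.out.println(toBeamValue(FieldKind.BOOLEAN, Boolean.TRUE));
        System.out.println(toBeamValue(FieldKind.DOUBLE, 3.14));
        System.out.println(toBeamValue(FieldKind.INT64, "42"));
    }
}
```

The unchanged String branches are what make the extension backward compatible: any caller that still supplies raw Strings takes exactly the same path as before.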