Beam / BEAM-12479

UnsupportedOperationException when reading from BigQuery tables and converting TableRows to Beam Rows

    Details

      Description

      UnsupportedOperationExceptions are thrown in org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#toBeamValue(FieldType, Object) when reading from BigQuery tables with org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO#readTableRowsWithSchema() and converting the returned TableRows to Beam Rows.

      Example:

      PCollection<Row> rows =
          pipeline
              .apply(
                  "Read from BigQuery table",
                  BigQueryIO.readTableRowsWithSchema()
                      .from(String.format("%s:%s.%s", project, dataset, table)))
              .apply(Convert.toRows());

       

      The UnsupportedOperationException messages I have encountered look like this:

      Converting BigQuery type "java.lang.Boolean" to "BOOLEAN" is not supported

      Converting BigQuery type "java.lang.Double" to "DOUBLE" is not supported

      ...even though converting these Java types should be straightforward.

      Indeed, the method BigQueryUtils#toBeamValue(FieldType, Object) expects only String objects or Collections of Strings.
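
      A minimal sketch of this failure mode, assuming a simplified converter that only parses String values. It is not the Beam source: the class name StringOnlyConversionSketch, the method toValue, and the use of plain type-name strings are hypothetical, and collections and nulls are left out for brevity.

```java
class StringOnlyConversionSketch {
    // Hypothetical, simplified model: only String inputs are parsed.
    static Object toValue(String typeName, Object bqValue) {
        if (bqValue instanceof String) {
            String s = (String) bqValue;
            switch (typeName) {
                case "BOOLEAN":
                    return Boolean.valueOf(s);
                case "INT64":
                    return Long.valueOf(s);
                case "DOUBLE":
                    return Double.valueOf(s);
                default:
                    return s;
            }
        }
        // An already-typed value (Boolean, Double, ...) ends up here.
        throw new UnsupportedOperationException(
            "Converting BigQuery type \"" + bqValue.getClass().getName()
                + "\" to \"" + typeName + "\" is not supported");
    }
}
```

      In this model, toValue("BOOLEAN", "true") parses fine, while toValue("BOOLEAN", Boolean.TRUE) throws with the same kind of message as the one reported above.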

      I had to upgrade com.google.cloud:google-cloud-bigquery from 1.108.0 to 1.132.0 in my project. My guess is that this newer version maps BigQuery (SQL) types to Java types instead of raw Strings, in particular BOOL to Boolean, INT64 to Long, and FLOAT64 to Double.

      As I understand it, though, going from BigQuery to Beam there is no need to handle Java Byte, Short, Integer, and Float, since BigQuery is "limited" to the standard SQL types INT64 and FLOAT64, which encompass them all (the BigQuery NUMERIC type, on the other hand, is mapped to Java BigDecimal).

      I propose a pull request that adds support for Number and Boolean objects in BigQueryUtils#toBeamValue(FieldType, Object). It only adds behavior, so the updated method remains compatible with the current behavior.
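
      The idea can be sketched as follows. This is an illustration under stated assumptions, not the code of the pull request: the class ToBeamValueSketch, the enum TypeName (a stand-in for Beam's schema type names), and the method toValue are all hypothetical, and only the scalar types discussed above are covered.

```java
import java.math.BigDecimal;

class ToBeamValueSketch {
    // Hypothetical stand-in for the target schema types discussed in this issue.
    enum TypeName { BOOLEAN, INT64, DOUBLE, DECIMAL, STRING }

    static Object toValue(TypeName type, Object bqValue) {
        // Existing behavior in this sketch: parse raw String values.
        if (bqValue instanceof String) {
            String s = (String) bqValue;
            switch (type) {
                case BOOLEAN:
                    return Boolean.valueOf(s);
                case INT64:
                    return Long.valueOf(s);
                case DOUBLE:
                    return Double.valueOf(s);
                case DECIMAL:
                    return new BigDecimal(s);
                default:
                    return s;
            }
        }
        // Added behavior: pass through values that are already typed.
        if (bqValue instanceof Boolean && type == TypeName.BOOLEAN) {
            return bqValue;
        }
        if (bqValue instanceof Number) {
            Number n = (Number) bqValue;
            switch (type) {
                case INT64:
                    return n.longValue();
                case DOUBLE:
                    return n.doubleValue();
                case DECIMAL:
                    return n instanceof BigDecimal ? n : new BigDecimal(n.toString());
                default:
                    break; // type mismatch, reported below
            }
        }
        throw new UnsupportedOperationException(
            "Converting BigQuery type \"" + bqValue.getClass().getName()
                + "\" to \"" + type + "\" is not supported");
    }
}
```

      String inputs take the same path as before, which is what keeps the change additive. A mismatched pair, such as a Long value for a BOOLEAN field, still raises UnsupportedOperationException.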

       

       

              People

              • Assignee: Unassigned
              • Reporter: pgillet Pascal GILLET

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m