Beam / BEAM-12479

UnsupportedOperationException when reading from BigQuery tables and converting TableRows to Beam Rows

Details

    Description

      UnsupportedOperationExceptions are thrown in org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils#toBeamValue(FieldType, Object) when reading from BigQuery tables with org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO#readTableRowsWithSchema() and converting the returned TableRows to Beam Rows.

      Example:

      PCollection<Row> rows =
          pipeline
              .apply(
                  "Read from BigQuery table",
                  BigQueryIO.readTableRowsWithSchema()
                      .from(String.format("%s:%s.%s", project, dataset, table)))
              .apply(Convert.toRows());

       

      The UnsupportedOperationException messages I have encountered look like this:

      Converting BigQuery type "java.lang.Boolean" to "BOOLEAN" is not supported

      Converting BigQuery type "java.lang.Double" to "DOUBLE" is not supported

      Yet converting these Java types should be straightforward.

      Indeed, the method BigQueryUtils#toBeamValue(FieldType, Object) expects only String objects or Collections of Strings.

      I had to upgrade com.google.cloud:google-cloud-bigquery from 1.108.0 to 1.132.0 in my project, so my guess is that the newer version maps BigQuery (SQL) types to Java types instead of raw Strings: in particular BOOL to Boolean, INT64 to Long and FLOAT64 to Double.

      As I understand it, there is no need to handle Java Byte, Short, Integer or Float when going from BigQuery to Beam, since the BigQuery standard SQL types INT64 and FLOAT64 already cover them. The BigQuery NUMERIC type, on the other hand, is mapped to Java BigDecimal.

      I propose a pull request that adds support for Number and Boolean objects in BigQueryUtils#toBeamValue(FieldType, Object). It only adds behavior: the updated method remains compatible with the existing String-based conversion.
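
      To make the proposal concrete, here is a minimal, standalone sketch of the intended behavior. It is not the actual patch and does not use Beam classes: ToBeamValueSketch, Kind and convert are hypothetical names, and Kind only stands in for the few Beam field types mentioned in this issue. String input is still parsed as before, while Boolean and Number input is accepted in addition.

```java
import java.math.BigDecimal;

/** Standalone sketch: hypothetical names, no Beam dependency. */
final class ToBeamValueSketch {

  /** Hypothetical stand-in for the Beam field types discussed in this issue. */
  enum Kind { BOOLEAN, INT64, DOUBLE, DECIMAL, STRING }

  /** Converts a value read from BigQuery to the Java value expected for the given kind. */
  static Object convert(Kind kind, Object value) {
    if (value instanceof String) {
      // Existing behavior: values arrive as raw Strings and are parsed.
      return fromString(kind, (String) value);
    }
    if (value instanceof Boolean && kind == Kind.BOOLEAN) {
      // Added behavior: already-typed booleans pass through unchanged.
      return value;
    }
    if (value instanceof Number) {
      // Added behavior: already-typed numbers are adapted to the target type.
      return fromNumber(kind, (Number) value);
    }
    throw unsupported(kind, value);
  }

  private static Object fromString(Kind kind, String s) {
    switch (kind) {
      case BOOLEAN: return Boolean.valueOf(s);
      case INT64:   return Long.valueOf(s);
      case DOUBLE:  return Double.valueOf(s);
      case DECIMAL: return new BigDecimal(s);
      case STRING:  return s;
      default:      throw unsupported(kind, s);
    }
  }

  private static Object fromNumber(Kind kind, Number n) {
    switch (kind) {
      case INT64:   return n.longValue();
      case DOUBLE:  return n.doubleValue();
      case DECIMAL: return (n instanceof BigDecimal) ? n : new BigDecimal(n.toString());
      default:      throw unsupported(kind, n);
    }
  }

  // Mirrors the wording of the error messages quoted in this issue.
  private static UnsupportedOperationException unsupported(Kind kind, Object value) {
    String type = (value == null) ? "null" : value.getClass().getName();
    return new UnsupportedOperationException(
        "Converting BigQuery type \"" + type + "\" to \"" + kind + "\" is not supported");
  }
}
```

      With this shape, ToBeamValueSketch.convert(Kind.INT64, "42") and ToBeamValueSketch.convert(Kind.INT64, 42L) both yield a Long, so callers that still receive Strings are unaffected. Combinations that make no sense, such as a Boolean for a STRING field, keep raising UnsupportedOperationException.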

       

       

            People

              Assignee: Unassigned
              Reporter: Pascal GILLET (pgillet)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m