Beam / BEAM-9010

BigQuery TableRow's size is toString().length()?

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: runner-dataflow
    • Labels: None

    Description

      The following tests failed when I tried to upgrade google-http-client from 1.28.0 to 1.34.0:

      org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
      org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
      org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
      

      https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink

      Reason for the test failures

      org.apache.beam.sdk.io.gcp.testing.TableContainer and org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl rely on TableRow.toString().length() to calculate the size. Example:

                dataSize += row.toString().length();
                if (dataSize >= maxRowBatchSize
                    || rows.size() >= maxRowsPerBatch
                    || i == rowsToPublish.size() - 1) {
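As a rough illustration of why this batching condition is sensitive to the toString() format, the sketch below wraps the same condition around a pluggable size function. RowBatcher, batch, and sizeOf are hypothetical names, rows are modeled as plain strings rather than TableRow, and the limits are made-up values. This is not Beam's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical stand-in for the batching loop; not Beam's actual code.
class RowBatcher {

  // Splits rows into batches. A batch is closed once its estimated size or
  // its row count reaches a limit, or when the last row has been added.
  static <T> List<List<T>> batch(
      List<T> rowsToPublish,
      long maxRowBatchSize,
      int maxRowsPerBatch,
      ToLongFunction<T> sizeOf) {
    List<List<T>> batches = new ArrayList<>();
    List<T> rows = new ArrayList<>();
    long dataSize = 0;
    for (int i = 0; i < rowsToPublish.size(); i++) {
      T row = rowsToPublish.get(i);
      rows.add(row);
      dataSize += sizeOf.applyAsLong(row);
      if (dataSize >= maxRowBatchSize
          || rows.size() >= maxRowsPerBatch
          || i == rowsToPublish.size() - 1) {
        batches.add(rows);
        rows = new ArrayList<>();
        dataSize = 0;
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    // The same logical row, rendered in the two toString() formats from this issue.
    String oldFormat = "{f=[{v=foo}, {v=1234}]}";
    String newFormat =
        "GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, "
            + "GenericData{classInfo=[v], {v=1234}}]}}";

    List<String> oldRows = Collections.nCopies(4, oldFormat);
    List<String> newRows = Collections.nCopies(4, newFormat);

    // Illustrative limits only: 100 characters per batch, 500 rows per batch.
    System.out.println("batches (old format): "
        + batch(oldRows, 100, 500, String::length).size());
    System.out.println("batches (new format): "
        + batch(newRows, 100, 500, String::length).size());
  }
}
```

With these illustrative limits, four rows in the 1.28.0 format fit into a single batch, while the same four rows in the 1.29.0+ format are each flushed as a separate batch, even though the data is identical.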
      

      However, with google-http-client's PR#589, the output of GenericData.toString() changed in v1.29.0.

      In the older google-http-client 1.28.0, an example row's toString() returned:

      {f=[{v=foo}, {v=1234}]}
      

      In google-http-client 1.29.0 and later, the same row's toString() returns:

      GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, GenericData{classInfo=[v], {v=1234}}]}}
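To make the effect on a length-based size estimate concrete, the following minimal sketch compares the lengths of the two example strings quoted above. The class and field names are hypothetical, and the strings are copied verbatim from this issue rather than produced by google-http-client.

```java
// Compares the length of the two toString() outputs quoted in this issue.
class ToStringLengthComparison {
  // Output reported for google-http-client 1.28.0.
  static final String OLD_FORMAT = "{f=[{v=foo}, {v=1234}]}";

  // Output reported for google-http-client 1.29.0 and later.
  static final String NEW_FORMAT =
      "GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, "
          + "GenericData{classInfo=[v], {v=1234}}]}}";

  // Ratio by which a toString().length()-based size estimate grows.
  static double inflation() {
    return (double) NEW_FORMAT.length() / OLD_FORMAT.length();
  }

  public static void main(String[] args) {
    System.out.println("old length: " + OLD_FORMAT.length());
    System.out.println("new length: " + NEW_FORMAT.length());
    System.out.println("inflation:  " + inflation());
  }
}
```

The row data is unchanged, yet the estimated size grows several times over, which is enough to shift any threshold that compares against the accumulated length.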
      

      Question:

      Is it right to rely on toString().length() in the BigQuery classes?
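One possible alternative, shown only as a sketch and not as the fix adopted for this issue, is to estimate size from the row's contents instead of its string rendering. The example assumes a row can be walked as nested Map and Iterable values (TableRow is map-like, but plain java.util collections are used here as a stand-in). RowSizeEstimator and estimateBytes are hypothetical names, and the estimate ignores any JSON or wire-format overhead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;

// Hypothetical content-based size estimate; independent of toString() formatting.
class RowSizeEstimator {

  // Sums the UTF-8 byte length of every key and leaf value in a nested structure.
  static long estimateBytes(Object value) {
    if (value == null) {
      return 0;
    }
    if (value instanceof Map) {
      long total = 0;
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        total += estimateBytes(entry.getKey()) + estimateBytes(entry.getValue());
      }
      return total;
    }
    if (value instanceof Iterable) {
      long total = 0;
      for (Object item : (Iterable<?>) value) {
        total += estimateBytes(item);
      }
      return total;
    }
    // Leaf value: count the bytes of its textual form.
    return value.toString().getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    // Stand-in for the example row {f=[{v=foo}, {v=1234}]}.
    Map<String, Object> row =
        Collections.singletonMap(
            "f",
            Arrays.asList(
                Collections.singletonMap("v", "foo"),
                Collections.singletonMap("v", "1234")));
    System.out.println("estimated bytes: " + estimateBytes(row));
  }
}
```

Because only keys and leaf values are counted, the result does not depend on how the container class chooses to render itself, so a change like the one in GenericData.toString() would not affect it.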

            People

              Assignee: suztomo Tomo Suzuki
              Reporter: suztomo Tomo Suzuki
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 50m