Beam / BEAM-9010

BigQuery TableRow's size is toString().length()?


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: runner-dataflow
    • Labels:
      None

      Description

      The following tests failed when I tried to upgrade google-http-client from 1.28.0 to 1.34.0:

      org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
      org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
      org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
      

      https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink

      Reason for the test failures

      org.apache.beam.sdk.io.gcp.testing.TableContainer and org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl rely on TableRow.toString().length() to estimate the size of a row. Example:

                dataSize += row.toString().length();
                if (dataSize >= maxRowBatchSize
                    || rows.size() >= maxRowsPerBatch
                    || i == rowsToPublish.size() - 1) {
                  // ... submit the accumulated rows as one batch (elided) ...
                }
      
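      To make the effect of the size estimate concrete, here is a simplified, self-contained Java re-creation of the batching loop quoted above. It is only a sketch under stated assumptions: RowBatcher and batchRows are hypothetical names, each row is represented by its toString() text, and the limits (50 characters, 500 rows) are illustrative values, not Beam's defaults.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.ToLongFunction;

public class RowBatcher {
  // Hypothetical, simplified re-creation of the batching loop quoted above.
  // The per-row size comes from a pluggable estimator so that different
  // estimators (for example toString().length()) can be compared.
  public static <T> List<List<T>> batchRows(
      List<T> rowsToPublish, ToLongFunction<T> sizeOf, long maxRowBatchSize, int maxRowsPerBatch) {
    List<List<T>> batches = new ArrayList<>();
    List<T> rows = new ArrayList<>();
    long dataSize = 0;
    for (int i = 0; i < rowsToPublish.size(); i++) {
      T row = rowsToPublish.get(i);
      rows.add(row);
      dataSize += sizeOf.applyAsLong(row);
      // Flush when the estimated size or the row count reaches its limit,
      // or when the last row has been added.
      if (dataSize >= maxRowBatchSize
          || rows.size() >= maxRowsPerBatch
          || i == rowsToPublish.size() - 1) {
        batches.add(rows);
        rows = new ArrayList<>();
        dataSize = 0;
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    // String stand-ins for one row's toString() output under each library version.
    String oldToString = "{f=[{v=foo}, {v=1234}]}";
    String newToString =
        "GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, "
            + "GenericData{classInfo=[v], {v=1234}}]}}";
    // Same four rows and same (illustrative) limits; only the toString format differs.
    List<String> oldRows = Collections.nCopies(4, oldToString);
    List<String> newRows = Collections.nCopies(4, newToString);
    System.out.println(batchRows(oldRows, String::length, 50, 500).size());
    System.out.println(batchRows(newRows, String::length, 50, 500).size());
  }
}
```

      With the 1.28.0 format (23 characters per row) the four rows fit into 2 batches; with the 1.29.0+ format (107 characters per row) every row reaches the limit on its own, giving 4 batches, although the row data is identical.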

      However, google-http-client PR #589 changed the output of GenericData.toString, starting with v1.29.0.

      In google-http-client 1.28.0, toString on an example row returned:

      {f=[{v=foo}, {v=1234}]}
      

      In new google-http-client 1.29.0 and higher, the same row's toString returns:

      GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, GenericData{classInfo=[v], {v=1234}}]}}
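      The size difference can be checked directly. The Java sketch below assumes Java 9+ (for Map.of and List.of), and its class and method names are illustrative only. It rebuilds the example row as plain nested java.util.Map/List values, whose toString() has the same shape as the 1.28.0 output, and compares its length with the 1.29.0+ string copied from above.

```java
import java.util.List;
import java.util.Map;

public class ToStringLengthDemo {
  // The 1.28.0 output quoted above has the same shape as plain
  // java.util.Map/List toString(), so build an equivalent nested structure.
  public static String oldStyle() {
    Map<String, Object> row = Map.of("f", List.of(Map.of("v", "foo"), Map.of("v", 1234)));
    return row.toString();
  }

  // The 1.29.0+ output, copied verbatim from the issue description.
  public static final String NEW_STYLE =
      "GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, "
          + "GenericData{classInfo=[v], {v=1234}}]}}";

  public static void main(String[] args) {
    String old = oldStyle();
    System.out.println(old + " -> " + old.length());
    System.out.println(NEW_STYLE + " -> " + NEW_STYLE.length());
  }
}
```

      It reports lengths of 23 and 107, so the same row counts as roughly 4.7 times larger after the upgrade, which shifts any estimate built on toString().length().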
      

      Question:

      Is it right to rely on toString().length() in the BigQuery classes?


              People

              • Assignee:
                suztomo Tomo Suzuki
                Reporter:
                suztomo Tomo Suzuki
              • Votes:
                0
                Watchers:
                2


                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  2h 50m