Beam / BEAM-3088

BigQuery source should consider streaming buffer when determining estimated sizes of tables

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: io-java-gcp
    • Labels: None

    Description

      Currently, the BigQuery table source determines a table's estimated size using the table.numBytes property:
      https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTableSource.java#L100

      If a BigQuery table has data in its streaming buffer, the size of that data is not reflected in table.numBytes. To estimate the table's size more accurately, the data in the streaming buffer must be considered as well. The size of that data is available from the streamingBuffer.estimatedBytes property, as described in the Tables resource reference:
      https://cloud.google.com/bigquery/docs/reference/rest/v2/tables
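      The requested change amounts to adding the streaming buffer's estimate to numBytes. The following is a minimal Java sketch of that idea, not Beam's actual BigQueryTableSource code. TableInfo and StreamingBufferInfo are hypothetical stand-ins that mirror the REST fields table.numBytes (int64) and table.streamingBuffer.estimatedBytes (uint64). The sketch does not use the google-api-services-bigquery model classes, so it compiles without dependencies. Treating a missing numBytes as 0 and clamping the sum at Long.MAX_VALUE are assumptions of this sketch.

```java
import java.math.BigInteger;

/**
 * Sketch: estimate a table's size as committed bytes plus streaming-buffer bytes.
 * TableInfo and StreamingBufferInfo are hypothetical stand-ins mirroring the REST
 * fields table.numBytes and table.streamingBuffer.estimatedBytes.
 */
class TableSizeEstimator {

  /** Hypothetical stand-in for the streamingBuffer sub-resource (uint64 field). */
  static final class StreamingBufferInfo {
    final BigInteger estimatedBytes; // may be null if the value is not reported

    StreamingBufferInfo(BigInteger estimatedBytes) {
      this.estimatedBytes = estimatedBytes;
    }
  }

  /** Hypothetical stand-in for the Table resource. */
  static final class TableInfo {
    final Long numBytes;                       // excludes data in the streaming buffer
    final StreamingBufferInfo streamingBuffer; // null when no streaming buffer is present

    TableInfo(Long numBytes, StreamingBufferInfo streamingBuffer) {
      this.numBytes = numBytes;
      this.streamingBuffer = streamingBuffer;
    }
  }

  /** Returns numBytes plus streamingBuffer.estimatedBytes, treating missing values as 0. */
  static long estimatedSizeBytes(TableInfo table) {
    long size = table.numBytes == null ? 0L : table.numBytes;
    if (table.streamingBuffer != null && table.streamingBuffer.estimatedBytes != null) {
      // estimatedBytes is unsigned 64-bit, so add in BigInteger and clamp instead of overflowing.
      BigInteger total = BigInteger.valueOf(size).add(table.streamingBuffer.estimatedBytes);
      size = total.min(BigInteger.valueOf(Long.MAX_VALUE)).longValue();
    }
    return size;
  }

  public static void main(String[] args) {
    TableInfo committedOnly = new TableInfo(1_000L, null);
    TableInfo withBuffer =
        new TableInfo(1_000L, new StreamingBufferInfo(BigInteger.valueOf(250L)));
    System.out.println(estimatedSizeBytes(committedOnly)); // 1000
    System.out.println(estimatedSizeBytes(withBuffer));    // 1250
  }
}
```

      The Tables reference describes estimatedBytes as a lower-bound estimate, so even the combined value remains an approximation of the table's size.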

            People

              Assignee: Chamikara Madhusanka Jayalath (chamikara)
              Reporter: Chamikara Madhusanka Jayalath (chamikara)
              Votes: 0
              Watchers: 2