  1. Beam
  2. BEAM-5352

BigQuery IO Source is not Exporting to GCS as written in documentation

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: sdk-py-core

    Description

      I checked the Beam code and found that Dataflow queries BigQuery and retrieves the result using pagination [1]. As we understand it, this means there is no parallelism when reading a BigQuery table, which contradicts what the documentation says [2].
       
      Is this a work in progress? I'm filing this as a bug because the documentation says the source uses GCS, while the code actually uses a NativeSourceReader that yields data row by row as an iterator.
       
      [1] https://github.com/apache/beam/blob/520b3a24e49306c30940ceab09100d775a04d28e/sdks/python/apache_beam/io/gcp/bigquery.py#L1083
      [2] https://github.com/apache/beam/blob/520b3a24e49306c30940ceab09100d775a04d28e/sdks/python/apache_beam/io/gcp/bigquery.py#L60
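
      To illustrate the concern, here is a minimal sketch in plain Python. It does not use Beam or BigQuery; `TABLE`, `fetch_page`, `export_shards`, and the shard count are all hypothetical stand-ins. It only contrasts a paginated read, where each page depends on the previous page token, with a read over independently exported shards, which is what an export to GCS would make possible.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a table; not real BigQuery data.
TABLE = [{"id": i} for i in range(10)]


def fetch_page(page_token, page_size=3):
    """Hypothetical paginated API: returns (rows, next_token)."""
    start = page_token or 0
    rows = TABLE[start:start + page_size]
    end = start + page_size
    next_token = end if end < len(TABLE) else None
    return rows, next_token


def read_paginated():
    """Yield rows one at a time. Each request needs the token from the
    previous one, so a single reader walks the pages in order."""
    token = None
    while True:
        rows, token = fetch_page(token)
        for row in rows:
            yield row
        if token is None:
            break


def export_shards(num_shards=3):
    """Hypothetical export step: split the table into independent shards,
    standing in for files written to GCS."""
    return [TABLE[i::num_shards] for i in range(num_shards)]


def read_sharded(num_shards=3):
    """Read every shard independently. Shards have no ordering dependency,
    so each one can be handed to a separate worker."""
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        parts = list(pool.map(list, export_shards(num_shards)))
    return [row for part in parts for row in part]


paginated_rows = list(read_paginated())
sharded_rows = read_sharded()
```

      Both paths return the same rows; the difference is that only the sharded path can be split across workers, which is the behaviour the documentation seems to describe.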

          People

            Assignee: Unassigned
            Reporter: Rendy Bambang Junior (rendybjunior)
            Votes: 0
            Watchers: 6
