Beam / BEAM-8528

BigQuery bounded source does not work on DirectRunner

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: sdk-py-core
    • Labels: None

    Description

      Refer to https://github.com/apache/beam/pull/9772 for more information and the context of this ticket.

      The following exception is raised when the ReadFromBigQuery PTransform is used on the DirectRunner in the Python SDK:

        File "/home/Kamil/projects/beam/sdks/python/apache_beam/io/gcp/bigquery.py", line 639, in get_range_tracker
          raise NotImplementedError('BigQuery source must be split before being read')
      NotImplementedError: BigQuery source must be split before being read
      

      The direct cause is that the get_range_tracker and read methods are not implemented in _BigQuerySource. This is intentional: the runner is expected to call split instead. The Java implementation works the same way: link

      It seems that the DataflowRunner and the Flink runner somehow handle this case, while the DirectRunner does not.
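The split-before-read contract described above can be sketched in plain Python. Everything below is hypothetical and is not Beam's API: `SplitOnlySource`, `ListSource`, `read_without_split` and `read_with_split` only borrow the method names of Beam's `BoundedSource` (`split`, `get_range_tracker`, `read`), the local `SourceBundle` tuple is modeled loosely on Beam's, and a plain `(start, stop)` tuple stands in for a real range tracker. The sketch shows why a runner that reads the unsplit source directly hits the `NotImplementedError`, while a runner that splits first only ever reads sub-sources and never calls the unimplemented methods.

```python
from collections import namedtuple

# Hypothetical local stand-in, loosely modeled on Beam's SourceBundle.
SourceBundle = namedtuple(
    'SourceBundle', ['weight', 'source', 'start_position', 'stop_position'])


class ListSource:
    """Hypothetical readable sub-source over an in-memory list of rows."""

    def __init__(self, rows):
        self.rows = rows

    def get_range_tracker(self, start_position, stop_position):
        # Simplification: a (start, stop) tuple replaces a real RangeTracker.
        start = 0 if start_position is None else start_position
        stop = len(self.rows) if stop_position is None else stop_position
        return (start, stop)

    def read(self, range_tracker):
        start, stop = range_tracker
        for row in self.rows[start:stop]:
            yield row


class SplitOnlySource:
    """Hypothetical source that, like _BigQuerySource, can be split but not read."""

    def __init__(self, rows):
        self.rows = rows

    def split(self, desired_bundle_size, start_position=None, stop_position=None):
        # Hand out readable sub-sources. Here we just chunk a list; the real
        # BigQuery source's split logic is not reproduced.
        for i in range(0, len(self.rows), desired_bundle_size):
            chunk = self.rows[i:i + desired_bundle_size]
            yield SourceBundle(len(chunk), ListSource(chunk), None, None)

    def get_range_tracker(self, start_position, stop_position):
        raise NotImplementedError('source must be split before being read')

    def read(self, range_tracker):
        raise NotImplementedError('source must be split before being read')


def read_without_split(source):
    """Hypothetical runner step that reads the top-level source directly."""
    tracker = source.get_range_tracker(None, None)  # raises for SplitOnlySource
    return list(source.read(tracker))


def read_with_split(source, bundle_size=2):
    """Hypothetical runner step that splits first, then reads each sub-source."""
    rows = []
    for bundle in source.split(bundle_size):
        tracker = bundle.source.get_range_tracker(
            bundle.start_position, bundle.stop_position)
        rows.extend(bundle.source.read(tracker))
    return rows


if __name__ == '__main__':
    source = SplitOnlySource(['a', 'b', 'c', 'd', 'e'])
    print(read_with_split(source))  # ['a', 'b', 'c', 'd', 'e']
    try:
        read_without_split(source)
    except NotImplementedError as e:
        print(e)  # source must be split before being read
```

Keeping get_range_tracker and read unimplemented on the top-level source makes the failure loud: a runner that skips splitting gets an explicit error instead of silently reading nothing.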


            People

              Assignee: Kamil Wasilewski (kamilwu)
              Reporter: Kamil Wasilewski (kamilwu)
              Votes: 0
              Watchers: 1
