Details
- Type: Bug
- Status: Resolved
- Priority: P2
- Resolution: Fixed
Description
Refer to https://github.com/apache/beam/pull/9772 for more information and the context of this ticket.
The following exception is raised when the ReadFromBigQuery PTransform is used with DirectRunner in the Python SDK:
  File "/home/Kamil/projects/beam/sdks/python/apache_beam/io/gcp/bigquery.py", line 639, in get_range_tracker
    raise NotImplementedError('BigQuery source must be split before being read')
NotImplementedError: BigQuery source must be split before being read
The direct cause is that the get_range_tracker and read methods are not implemented in _BigQuerySource. This is intentional: the runner is expected to call split first and then read from the sources that split returns. The Java implementation works the same way (link).
It appears that DataflowRunner and the Flink runner cope with these exceptions somehow (the exact mechanism is unclear), while DirectRunner does not.
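To make the failure mode concrete, here is a minimal sketch of the pattern in plain Python, without importing Beam. All class and function names (SplitOnlySource, ReadableShard, read_without_split, read_with_split) are hypothetical and are not the actual _BigQuerySource or DirectRunner code. As a simplification, split yields sub-sources directly rather than wrapping them in bundle objects.

```python
class ReadableShard:
    """Hypothetical sub-source produced by split(); it can be read directly."""

    def __init__(self, records):
        self.records = records

    def get_range_tracker(self, start_position=None, stop_position=None):
        # This sketch does no real position tracking.
        return None

    def read(self, range_tracker):
        return iter(self.records)


class SplitOnlySource:
    """Hypothetical source that, like the one in this ticket, must be split first."""

    def __init__(self, shards):
        self.shards = shards

    def split(self, desired_bundle_size=None):
        # Each shard becomes an independently readable sub-source.
        for shard in self.shards:
            yield ReadableShard(shard)

    def get_range_tracker(self, start_position=None, stop_position=None):
        raise NotImplementedError('source must be split before being read')

    def read(self, range_tracker):
        raise NotImplementedError('source must be split before being read')


def read_without_split(source):
    # Reads the top-level source directly; fails for a split-only source.
    tracker = source.get_range_tracker(None, None)
    return list(source.read(tracker))


def read_with_split(source):
    # Splits first, then reads each sub-source, as the source expects.
    records = []
    for sub_source in source.split(desired_bundle_size=None):
        tracker = sub_source.get_range_tracker(None, None)
        records.extend(sub_source.read(tracker))
    return records
```

With source = SplitOnlySource([[1, 2], [3]]), read_with_split(source) collects the records from every shard, while read_without_split(source) raises NotImplementedError, which mirrors the traceback above.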
Issue Links
- is caused by BEAM-1440 Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK (Triage Needed)