[BEAM-8528] BigQuery bounded source does not work on DirectRunner - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: P2
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Not applicable
Component/s: sdk-py-core
Labels: None

Description

Refer to https://github.com/apache/beam/pull/9772 for more information and the context of this ticket.

The following exception is raised when the ReadFromBigQuery PTransform is used on DirectRunner in the Python SDK:

  File "/home/Kamil/projects/beam/sdks/python/apache_beam/io/gcp/bigquery.py", line 639, in get_range_tracker
    raise NotImplementedError('BigQuery source must be split before being read')
NotImplementedError: BigQuery source must be split before being read

The direct cause is that the get_range_tracker and read methods are not implemented in _BigQuerySource. This is intentional: the runner is expected to call split instead. The Java implementation works the same way: link

It seems that DataflowRunner and Flink are able to catch these exceptions somehow, while DirectRunner is not.
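
The contract described above can be illustrated with a minimal, Beam-free sketch. Everything in it is hypothetical and exists only for illustration: SplitOnlySource, _ReadableChunk, read_without_splitting and read_after_splitting are not Beam classes or runner code, and the real _BigQuerySource and the runners' internals may differ. It only shows why a runner that reads the source directly hits NotImplementedError, while a runner that calls split first does not.

```python
class SplitOnlySource:
    """Hypothetical stand-in for a bounded source that must be split first."""

    def __init__(self, rows, bundle_size=2):
        self._rows = list(rows)
        self._bundle_size = bundle_size

    def split(self):
        # Yield readable sub-sources, each covering one chunk of the rows.
        for start in range(0, len(self._rows), self._bundle_size):
            yield _ReadableChunk(self._rows[start:start + self._bundle_size])

    def get_range_tracker(self):
        # Deliberately unimplemented: the runner is expected to call split().
        raise NotImplementedError('source must be split before being read')

    def read(self, range_tracker):
        raise NotImplementedError('source must be split before being read')


class _ReadableChunk:
    """Hypothetical sub-source produced by split(); this one can be read."""

    def __init__(self, rows):
        self._rows = rows

    def split(self):
        yield self

    def get_range_tracker(self):
        # Sketch only: no real position tracking is performed.
        return None

    def read(self, range_tracker):
        return iter(self._rows)


def read_without_splitting(source):
    # Assumed behaviour of a runner that reads the top-level source directly.
    tracker = source.get_range_tracker()
    return list(source.read(tracker))


def read_after_splitting(source):
    # Assumed behaviour of a runner that splits first, then reads each part.
    rows = []
    for sub_source in source.split():
        tracker = sub_source.get_range_tracker()
        rows.extend(sub_source.read(tracker))
    return rows
```

With this sketch, read_after_splitting(SplitOnlySource(['a', 'b', 'c'])) collects all three rows, whereas read_without_splitting on the same source raises NotImplementedError, mirroring the traceback in this ticket.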

Issue Links

is caused by

BEAM-1440 Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

Triage Needed

People

Assignee: Kamil Wasilewski

Reporter: Kamil Wasilewski

Votes: 0

Watchers: 1

Dates

Created: 31/Oct/19 10:35

Updated: 16/May/20 14:15

Resolved: 02/Jan/20 09:00