Details
Type: Bug
Status: Open
Priority: P3
Resolution: Unresolved
Description
It seems that the DirectRunner keeps scheduling tasks when an exception occurs while reading BigQuery results (and possibly in other cases).
I verified that the rescheduling is not coming from BigQuery. AFAICT, a _MonitorTask that is added at the following location is not removed properly when an exception is thrown:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/direct/executor.py#L361
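The suspected mechanism can be illustrated with a toy, single-threaded model. Everything in it (ToyExecutor, _monitor, _read_rows, stop_monitor_on_error) is hypothetical and does not reproduce the actual _MonitorTask logic in executor.py. It only shows why a task that reschedules itself, and is never deactivated on the error path, keeps the scheduler busy indefinitely.

```python
from collections import deque


class ToyExecutor:
    """Hypothetical, single-threaded model of a self-rescheduling monitor.

    This is not Beam's executor; it only illustrates the suspected leak.
    """

    def __init__(self, stop_monitor_on_error):
        self.stop_monitor_on_error = stop_monitor_on_error
        self.pending = deque()
        self.monitor_active = False
        self.monitor_runs = 0

    def _monitor(self):
        # Like a monitor task: re-enqueue itself while it is still active.
        self.monitor_runs += 1
        if self.monitor_active:
            self.pending.append(self._monitor)

    def _read_rows(self):
        # Stands in for the BigQuery read that fails in the report.
        raise ValueError("simulated failure in convert_row_to_dict")

    def run(self, max_steps=100):
        """Drain the queue and return how many tasks were executed."""
        self.monitor_active = True
        self.pending.extend([self._read_rows, self._monitor])
        steps = 0
        while self.pending and steps < max_steps:
            task = self.pending.popleft()
            steps += 1
            try:
                task()
            except ValueError:
                if self.stop_monitor_on_error:
                    # The error path deactivates the monitor, so it stops.
                    self.monitor_active = False
                # Otherwise the monitor keeps rescheduling itself.
        return steps
```

With stop_monitor_on_error=False, run(max_steps=50) executes all 50 steps because the monitor never stops rescheduling; with stop_monitor_on_error=True it returns after 2 steps (the failing read, then one final monitor run).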
To reproduce:
(1) Raise a ValueError at the beginning of the method BigQueryWrapper.convert_row_to_dict at the following location:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py#L1061
(2) Set up the Python SDK and run bigquery_tornadoes with the DirectRunner:
python -m apache_beam.examples.cookbook.bigquery_tornadoes --output <table> --project <project>
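As an alternative to editing the installed source in step (1), the failure could be injected with unittest.mock.patch.object. The sketch below is demonstrated on a hypothetical stand-in class, FakeBigQueryWrapper, whose convert_row_to_dict(self, row, schema) signature is assumed. For a real reproduction the target would be BigQueryWrapper from the module linked above (its import path depends on the Beam version), and the patch only takes effect if the patched code runs in the same Python process as the patch.

```python
from unittest import mock


class FakeBigQueryWrapper:
    """Hypothetical stand-in for BigQueryWrapper; only the method name matches."""

    def convert_row_to_dict(self, row, schema):
        # Toy conversion: pair each schema field name with a row value.
        return dict(zip(schema, row))


def failing_conversion(target_cls):
    """Return a patcher that makes convert_row_to_dict raise ValueError.

    Same effect as step (1), but without modifying the installed source;
    the original method is restored when the patcher exits.
    """
    return mock.patch.object(
        target_cls,
        "convert_row_to_dict",
        side_effect=ValueError("injected failure for reproduction"),
    )
```

Inside a `with failing_conversion(FakeBigQueryWrapper):` block every call to convert_row_to_dict raises ValueError; after the block exits, the original method is back in place.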