  1. Beam
  2. BEAM-1462

DirectRunner unnecessarily reschedules tasks after exceptions

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core

    Description

      DirectRunner appears to keep rescheduling tasks after an exception is raised while reading BigQuery results (and possibly in other cases).

      I verified that the rescheduling is not coming from BigQuery. AFAICT, a _MonitorTask that gets added at the following location is not removed properly when an exception is thrown.
      https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/direct/executor.py#L361
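      The suspected failure mode can be illustrated with a toy model. This is only a sketch of the hypothesis above, not Beam's executor code: ToyExecutor, work_task, monitor_task, and stop_monitor_on_failure are all hypothetical names. It shows a monitor task that re-queues itself on every run and therefore keeps the executor busy after a failure unless it checks for the failure first.

```python
import collections


class ToyExecutor:
    """Hypothetical stand-in for an executor; NOT Beam's actual code."""

    def __init__(self, stop_monitor_on_failure):
        self.queue = collections.deque()
        self.failed = False
        self.stop_monitor_on_failure = stop_monitor_on_failure
        self.monitor_runs = 0

    def work_task(self):
        # Simulates a read step that raises, as in the reproduction steps.
        raise ValueError('simulated read failure')

    def monitor_task(self):
        self.monitor_runs += 1
        if self.failed and self.stop_monitor_on_failure:
            return  # Failure observed: do not reschedule.
        # Otherwise the monitor puts itself back on the queue.
        self.queue.append(self.monitor_task)

    def run(self, max_steps=10):
        """Runs queued tasks and returns the number of steps executed."""
        self.queue.append(self.work_task)
        self.queue.append(self.monitor_task)
        steps = 0
        while self.queue and steps < max_steps:
            task = self.queue.popleft()
            steps += 1
            try:
                task()
            except ValueError:
                self.failed = True
        return steps


leaky = ToyExecutor(stop_monitor_on_failure=False)
leaky_steps = leaky.run(max_steps=10)   # stops only because of max_steps

fixed = ToyExecutor(stop_monitor_on_failure=True)
fixed_steps = fixed.run(max_steps=10)   # queue drains after the failure
```

      In the toy model the "leaky" variant stops only because of the step cap, while the "fixed" variant drains its queue right after the failure. Whether the real _MonitorTask behaves this way is exactly what this issue asks to confirm.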

      To reproduce:
      (1) Raise a 'ValueError' at the beginning of the method BigQueryWrapper.convert_row_to_dict at the following location.
      https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py#L1061
      (2) Set up the Python SDK and run bigquery_tornadoes with DirectRunner.
      python -m apache_beam.examples.cookbook.bigquery_tornadoes --output <table> --project <project>
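      As an alternative to editing the source for step (1), the failure could probably be injected with unittest.mock. The sketch below uses a hypothetical stand-in class (StandInWrapper) so that it runs without Beam or BigQuery access; read_rows and failing_convert are also illustrative names. The (row, schema) signature is an assumption about convert_row_to_dict, and applying the same patch to the real BigQueryWrapper is untested here.

```python
from unittest import mock


class StandInWrapper:
    """Hypothetical stand-in for BigQueryWrapper so the sketch runs without Beam."""

    def convert_row_to_dict(self, row, schema):
        # Assumed signature; pairs each column name with its value.
        return dict(zip(schema, row))


def read_rows(wrapper, rows, schema):
    """Converts every row, mimicking a reader that calls the wrapper per row."""
    return [wrapper.convert_row_to_dict(row, schema) for row in rows]


def failing_convert(self, row, schema):
    # Replacement method that fails immediately, as step (1) describes.
    raise ValueError('injected failure for reproduction')


# Patch the method on the class for the duration of the block only.
with mock.patch.object(StandInWrapper, 'convert_row_to_dict', failing_convert):
    try:
        read_rows(StandInWrapper(), [(1, 'a')], ['id', 'name'])
        outcome = 'no error'
    except ValueError as exc:
        outcome = str(exc)

# Outside the block the original method is restored.
restored = read_rows(StandInWrapper(), [(1, 'a')], ['id', 'name'])
```

      Patching at the class level keeps the change scoped to the with block, so the original behavior returns once the reproduction is done.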

          People

            Assignee: Unassigned
            Reporter: Chamikara Madhusanka Jayalath (chamikara)
            Votes: 0
            Watchers: 2