  1. Beam
  2. BEAM-12783

WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.31.0
    • Fix Version/s: None
    • Component/s: io-py-gcp

    Description

      On a streaming pipeline running 2.31.0, insertAll is retried forever even with insert_retry_strategy=RetryStrategy.RETRY_NEVER and create_disposition=BigQueryDisposition.CREATE_NEVER.
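
      For reference, a minimal sketch of the kind of write step this configuration describes. The function name, the `write_disposition` value and the explicit `method` are assumptions added for illustration; only `insert_retry_strategy` and `create_disposition` come from this report.

```python
def write_with_no_retry(rows, table):
    """Apply a streaming WriteToBigQuery step and return its failed-rows output.

    `rows` is a PCollection of dicts; `table` is a 'project:dataset.table'
    string. Both are placeholders, not values from the original pipeline.
    """
    # Imported lazily so this module loads even without apache-beam[gcp].
    import apache_beam as beam
    from apache_beam.io.gcp.bigquery import BigQueryWriteFn
    from apache_beam.io.gcp.bigquery_tools import RetryStrategy

    result = rows | "WriteToBQ" >> beam.io.WriteToBigQuery(
        table=table,
        # The two settings named in this report:
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        insert_retry_strategy=RetryStrategy.RETRY_NEVER,
        # Assumed for illustration; not stated in the report:
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
    )
    # Rows that BigQuery rejects are expected on this tagged output.
    return result[BigQueryWriteFn.FAILED_ROWS]
```

      With a `table` that does not exist, the returned output would be expected to carry the rejected rows rather than the bundle failing and being retried.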

      Found while testing a pipeline's error handling by writing to a table that doesn't exist: no elements reach BigQueryWriteFn.FAILED_ROWS, and these errors repeat in the logs:

      Error message from worker: generic::unknown: Traceback (most recent call last):
        File "apache_beam/runners/common.py", line 1257, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
        File "apache_beam/runners/common.py", line 510, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
        File "apache_beam/runners/common.py", line 516, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1268, in finish_bundle
          return self._flush_all_batches()
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1278, in _flush_all_batches
          for destination in list(self._rows_buffer.keys())
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1279, in <listcomp>
          if self._rows_buffer[destination]
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1312, in _flush_batch
          skip_invalid_rows=True)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1125, in insert_rows
          project_id, dataset_id, table_id, final_rows, skip_invalid_rows)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
          return fun(*args, **kwargs)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 637, in _insert_all_rows
          response = self.client.tabledata.InsertAll(request)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 795, in InsertAll
          config, request, global_params=global_params)
        File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
          return self.ProcessHttpResponse(method_config, http_response, request)
        File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
          self.__ProcessHttpResponse(method_config, http_response, request))
        File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
          http_response, method_config=method_config, request=request)
      apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/<REDACTED>/datasets/testdb__dbo__raw/tables/customers/insertAll?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 21 Aug 2021 10:00:13 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '344', '-content-encoding': 'gzip'}>, content <{
        "error": {
          "code": 404,
          "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
          "errors": [
            {
              "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
              "domain": "global",
              "reason": "notFound"
            }
          ],
          "status": "NOT_FOUND"
        }
      }
      ...
      

      Possibly related to BEAM-12362. I had previously been running 2.29.0, which would repeatedly log this message with no error details:

      There were errors inserting to BigQuery. Will not retry. Errors were []
      

      2.31.0 logs the errors but ignores the retry strategy, which prevents them from being handled through the FailedRows tag.
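
      To make the expected behaviour concrete, here is a stdlib-only sketch of a flush step that consults the retry strategy when the insert call itself raises an HTTP error. This is not Beam's implementation: `Strategy`, `FakeHttpError`, `flush_batch` and `max_attempts` are hypothetical names invented for illustration.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical stand-in for the retry strategy values."""
    RETRY_ALWAYS = "RETRY_ALWAYS"
    RETRY_NEVER = "RETRY_NEVER"


class FakeHttpError(Exception):
    """Hypothetical stand-in for an HTTP error raised by the API client."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def flush_batch(rows, insert_fn, strategy, max_attempts=3):
    """Try to insert rows; return the rows destined for the failed-rows output.

    Returns an empty list on success. On an HTTP error the strategy decides
    whether to try again; max_attempts only bounds this demo loop.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            insert_fn(rows)
            return []
        except FakeHttpError:
            if strategy is Strategy.RETRY_NEVER or attempts >= max_attempts:
                # Hand the rows back instead of re-raising, so the caller
                # can emit them on a failed-rows output.
                return list(rows)


def missing_table(rows):
    """Simulate insertAll against a table that does not exist."""
    raise FakeHttpError(404)


failed = flush_batch([{"id": 1}], missing_table, Strategy.RETRY_NEVER)
```

      Under these assumptions, a 404 with RETRY_NEVER yields the rows back after a single attempt, which is the behaviour this report expected from the FailedRows tag.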

          People

            Assignee: Unassigned
            Reporter: Adam Whitmore (ajdub980a)
            Votes: 0
            Watchers: 4