Description
On a streaming pipeline running 2.31.0, insertAll retries forever even with insert_retry_strategy=RetryStrategy.RETRY_NEVER and create_disposition=BigQueryDisposition.CREATE_NEVER.
I found this while testing a pipeline's error handling by writing to a table that doesn't exist. No elements ever reached BigQueryWriteFn.FAILED_ROWS, and these errors were repeated in the logs:
Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1257, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 510, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 516, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1268, in finish_bundle
    return self._flush_all_batches()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1278, in _flush_all_batches
    for destination in list(self._rows_buffer.keys())
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1279, in <listcomp>
    if self._rows_buffer[destination]
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1312, in _flush_batch
    skip_invalid_rows=True)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1125, in insert_rows
    project_id, dataset_id, table_id, final_rows, skip_invalid_rows)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 637, in _insert_all_rows
    response = self.client.tabledata.InsertAll(request)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 795, in InsertAll
    config, request, global_params=global_params)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/<REDACTED>/datasets/testdb__dbo__raw/tables/customers/insertAll?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 21 Aug 2021 10:00:13 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '344', '-content-encoding': 'gzip'}>, content <{ "error": { "code": 404, "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers", "errors": [ { "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers", "domain": "global", "reason": "notFound" } ], "status": "NOT_FOUND" } } ...
Possibly related to BEAM-12362. I had previously been running 2.29.0, which repeatedly logged this message with no trace of the underlying error:
There were errors inserting to BigQuery. Will not retry. Errors were []
2.31.0 logs the errors but ignores the retry strategy, so the failed rows never reach the FailedRows tag and cannot be handled there.
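
To make the difference between the expected and the observed behavior concrete, here is a toy model in plain Python. It is only a sketch and is not Beam's implementation: every name in it (TableNotFound, insert_all, flush_expected, flush_observed, run_bundle) is hypothetical, and the bundle retry loop is capped at three attempts so the demo terminates, whereas the streaming pipeline in this report kept retrying.

```python
# Toy model of the two behaviors. None of these names are Beam APIs.


class TableNotFound(Exception):
    """Stands in for the HTTP 404 raised by the insertAll call."""


def insert_all(rows):
    # Simulates writing to a table that does not exist.
    raise TableNotFound("Not found: Table")


def flush_expected(rows):
    """Expected under RETRY_NEVER: no exception escapes the flush and
    every row is emitted on the failed-rows output with its error."""
    try:
        insert_all(rows)
    except TableNotFound as err:
        return [(row, str(err)) for row in rows]
    return []


def flush_observed(rows):
    """Observed on 2.31.0 per this report: the error propagates out of
    the flush, so nothing reaches the failed-rows output."""
    insert_all(rows)
    return []


def run_bundle(flush, rows, max_attempts=3):
    """Hypothetical runner loop that retries the whole bundle when the
    flush raises. Capped here so the demo ends."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, flush(rows)
        except TableNotFound:
            continue
    return max_attempts, None  # still failing after every attempt


if __name__ == "__main__":
    rows = [{"id": 1}]
    print(run_bundle(flush_expected, rows))
    print(run_bundle(flush_observed, rows))
```

In this model the expected path finishes on the first attempt with the row on the failed output, while the observed path uses up every attempt and never produces a failed row, which matches the empty FAILED_ROWS output and repeated log errors described above.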