Details
Type: Bug
Status: Open
Priority: P1
Resolution: Unresolved
Affects Version/s: 2.37.0
Fix Version/s: None
Description
We have a Python streaming Dataflow job that writes to BigQuery using the FILE_LOADS method with auto_sharding enabled. When we try to drain the job, it fails with the following error:
"/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1000, in perform_load_job ValueError: Either a non-empty list of fully-qualified source URIs must be provided via the source_uris parameter or an open file object must be provided via the source_stream parameter.
Our WriteToBigQuery configuration:
beam.io.WriteToBigQuery(
    table=options.output_table,
    schema=bq_schema,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    insert_retry_strategy=RetryStrategy.RETRY_ON_TRANSIENT_ERROR,
    method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
    additional_bq_parameters={
        "timePartitioning": {
            "type": "HOUR",
            "field": "bq_insert_timestamp",
        },
        "schemaUpdateOptions": [
            "ALLOW_FIELD_ADDITION",
            "ALLOW_FIELD_RELAXATION",
        ],
    },
    triggering_frequency=120,
    with_auto_sharding=True,
)
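For context, this sink sits in a pipeline shaped roughly like the following (a simplified skeleton; the Pub/Sub topic and parse step are placeholders, not our production code):

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(write_to_bq):
    # write_to_bq is the WriteToBigQuery transform configured above.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (p
         | 'ReadPubSub' >> beam.io.ReadFromPubSub(
             topic='projects/my-project/topics/events')  # placeholder topic
         | 'Parse' >> beam.Map(json.loads)
         | 'WriteToBQ' >> write_to_bq)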
We have also noticed that the job only fails to drain when there are actual schema updates; if there are no schema updates, the job drains without the above error.
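Conceptually, what we suspect happens on drain is that a destination with a pending schema update gets scheduled for a final load with an empty file list, something like this (purely illustrative, not Beam internals):

# Purely illustrative sketch of the suspected drain-time state: a destination
# with a pending schema update but no temp files flushed for the final window.
pending_loads = {
    'my-project:dataset.table': [],  # no source files after the drain flush
}
for destination, file_uris in pending_loads.items():
    if not file_uris:
        # This is exactly the combination perform_load_job rejects:
        # no source_uris and no source_stream.
        print(f'{destination}: load job requested with no source files')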