Details
- Type: Improvement
- Status: Open
- Priority: P3
- Resolution: Unresolved
- Affects Version: 2.10.0
Description
My understanding is that BigQueryIO runs the query, writes the output to a temporary dataset, and then extracts that temporary dataset to GCS. This means the location of the temp dataset (if not set manually) is determined by the location of the tables referenced in the query. The BigQueryIO source code confirms this: https://github.com/apache/beam/blob/v2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L111
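As a rough illustration of the behavior described above, here is a self-contained sketch (not the actual Beam implementation; the method name and fallback value are assumptions for illustration) of how a temp dataset location might be derived from the referenced tables, with an "unknown" fallback when no location can be determined:

```java
import java.util.List;
import java.util.Objects;

public class TempDatasetLocation {
    /**
     * Hypothetical sketch: pick the temp dataset location from the
     * locations of the tables referenced by the query. When none of the
     * referenced tables yields a location, fall back to "unknown",
     * which mirrors the failure mode reported in this issue.
     */
    static String resolveTempLocation(List<String> referencedTableLocations) {
        return referencedTableLocations.stream()
                .filter(Objects::nonNull)
                .findFirst()
                .orElse("unknown"); // observed fallback instead of a sane default like "US"
    }

    public static void main(String[] args) {
        // Tables in the US: the temp dataset follows them.
        System.out.println(resolveTempLocation(List.of("US", "US")));
        // No resolvable table locations: falls back to "unknown".
        System.out.println(resolveTempLocation(List.of()));
    }
}
```

Under this reading, a fix would replace the "unknown" fallback with a sensible default (e.g. the location of the query's source tables, or "US").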
Since the tables referenced in my query are in the US location, I would expect the temp dataset to be created in the US as well, or at least to default to it. Instead, its location appears to default to "unknown" (at least some of the time), which causes the whole Dataflow job to fail.