  1. Beam
  2. BEAM-6684

BigQueryIO: Unable to create dataset: "Location unknown is not yet publicly available"

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.10.0
    • Fix Version/s: None
    • Component/s: io-java-gcp
    • Labels: None

    Description

      My understanding is that BigQueryIO runs the query, writes the output to a temp dataset, and then extracts that temp dataset to GCS. The location of the temp dataset (if not set manually) is therefore determined by the tables referenced in the query. The BigQueryIO source code confirms this: https://github.com/apache/beam/blob/v2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L111

      So I would expect the temp dataset to be created in the US location as well, or at least to default to US. Instead, the location appears to default to "unknown" (at least some of the time), which causes the whole Dataflow job to fail.
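      To make the expected behaviour concrete, here is a minimal sketch of the fallback order described above: an explicitly set location wins, otherwise the location of a table referenced by the query, otherwise US. This is not Beam's code. The class TempDatasetLocation, its resolve method, and the idea of treating the literal string "unknown" as "no location" are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, NOT part of Apache Beam. It only models the
// fallback order this report expects for the temp dataset location.
final class TempDatasetLocation {
    // Assumed default, taken from the expectation stated in this report.
    static final String DEFAULT_LOCATION = "US";

    // Returns the explicit location if usable, else the first usable
    // location among the tables referenced by the query, else the default.
    static String resolve(Optional<String> explicitLocation,
                          List<String> referencedTableLocations) {
        if (explicitLocation.isPresent() && isKnown(explicitLocation.get())) {
            return explicitLocation.get();
        }
        for (String location : referencedTableLocations) {
            if (isKnown(location)) {
                return location;
            }
        }
        return DEFAULT_LOCATION;
    }

    // Assumption: an empty value or the literal "unknown" means that
    // no real location could be determined.
    private static boolean isKnown(String location) {
        return location != null
            && !location.isEmpty()
            && !location.equalsIgnoreCase("unknown");
    }
}
```

      With this ordering, a query whose referenced tables report no usable location would still produce a temp dataset in US rather than failing on "unknown".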

          People

            Assignee: Unassigned
            Reporter: Pablo Estrada (pabloem)