BEAM-6069: BigQuery Tornadoes example fails to run when we pass a custom temp location

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Components: examples-java, io-java-gcp
    Description

      Steps to reproduce:

      PROJECT=$(gcloud config get-value project)
      BUCKET=${USER}_gcs_bucket
      BQ_DATASET=${USER}_bq_dataset
      TABLE_NAME=out
      
      bq mk --project=$PROJECT $BQ_DATASET
      gsutil mb gs://$BUCKET
      
      
      PATH_TO_REPO_CLONE=/path/to/beam
      
      mvn archetype:generate -DarchetypeGroupId=org.apache.beam -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples  -DarchetypeVersion=2.8.0  -DgroupId=org.example  -DartifactId=word-count-beam  -Dversion="0.1" -Dpackage=org.apache.beam.examples -DinteractiveMode=false
      
      cd word-count-beam/
      
      mkdir src/main/java/org/apache/beam/examples/cookbook
      
      cp $PATH_TO_REPO_CLONE/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java ./src/main/java/org/apache/beam/examples/cookbook
      
      mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes -Dexec.args="--runner=DataflowRunner --project=$PROJECT --input=clouddataflow-readonly:samples.weather_stations --gcpTempLocation=gs://$BUCKET/tmp --output=$BQ_DATASET.$TABLE_NAME " -Pdataflow-runner
      
      

      This fails with:

      java.lang.IllegalArgumentException: BigQueryIO.Read needs a GCS temp location to store temp files.
      at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
      at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead.validate(BigQueryIO.java:662)
      at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:641)
      at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:645)
      at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
      at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
      at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
      at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
      at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:577)
      at org.apache.beam.sdk.Pipeline.run(Pipeline.java:312)
      at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
      at org.apache.beam.examples.cookbook.BigQueryTornadoes.runBigQueryTornadoes(BigQueryTornadoes.java:166)
      at org.apache.beam.examples.cookbook.BigQueryTornadoes.main(BigQueryTornadoes.java:172)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
      at java.lang.Thread.run(Thread.java:748)
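      
      For context, the failing check is the precondition in BigQueryIO$TypedRead.validate (BigQueryIO.java:662 in 2.8.0). The behaviour is consistent with that check reading tempLocation rather than gcpTempLocation, so that passing only --gcpTempLocation leaves the validated option unset; that reading is an assumption based on the error message, not a quote of the Beam source. The hypothetical snippet below (class name is ours) only demonstrates the option state after parsing such arguments:
      
      import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
      import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
      import org.apache.beam.sdk.options.PipelineOptionsFactory;
      
      /**
       * Hypothetical demo (not Beam source). Assumes the GCP extensions are on the
       * classpath, as they are in the examples archetype project used above.
       */
      public class TempLocationStateDemo {
        public static void main(String[] args) {
          BigQueryOptions options =
              PipelineOptionsFactory.fromArgs("--gcpTempLocation=gs://some-bucket/tmp")
                  .as(BigQueryOptions.class);
      
          // The value we passed on the command line is visible as gcpTempLocation...
          System.out.println("gcpTempLocation = " + options.as(GcpOptions.class).getGcpTempLocation());
          // ...but tempLocation is still null, so a precondition of the form
          //   checkArgument(!Strings.isNullOrEmpty(options.getTempLocation()), "BigQueryIO.Read needs a GCS temp location ...")
          // would throw exactly the IllegalArgumentException shown above.
          System.out.println("tempLocation    = " + options.getTempLocation());
        }
      }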
      
      

      Ironically, the example works if we remove --gcpTempLocation. The logs show that in that case an auto-created default bucket is used, e.g. gs://dataflow-staging-us-central1-927334603519.
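      
      If that is indeed the cause, two workarounds look plausible (neither verified here): pass --tempLocation=gs://$BUCKET/tmp alongside (or instead of) --gcpTempLocation, or mirror the value programmatically before the pipeline is built, roughly as in this sketch (not a proposed Beam fix, just an illustration):
      
      import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
      import org.apache.beam.sdk.options.PipelineOptions;
      import org.apache.beam.sdk.options.PipelineOptionsFactory;
      
      /** Hypothetical workaround sketch: copy gcpTempLocation into tempLocation so
       *  that BigQueryIO's validation finds a GCS temp location. */
      public class MirrorTempLocation {
        public static void main(String[] args) {
          PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
          GcpOptions gcp = options.as(GcpOptions.class);
          if (options.getTempLocation() == null || options.getTempLocation().isEmpty()) {
            // When --gcpTempLocation was passed, getGcpTempLocation() returns that value.
            options.setTempLocation(gcp.getGcpTempLocation());
          }
          // ...then build and run the pipeline from these options, as the example's main() does.
        }
      }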

People

    • Assignee: Unassigned
    • Reporter: Valentyn Tymofieiev (tvalentyn)