Details
Type: Bug
Status: Open
Priority: P3
Resolution: Unresolved
Description
Steps to reproduce:
PROJECT=$(gcloud config get-value project)
BUCKET=${USER}_gcs_bucket
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq mk --project=$PROJECT $BQ_DATASET
gsutil mb gs://$BUCKET
PATH_TO_REPO_CLONE=/path/to/beam
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.beam \
    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
    -DarchetypeVersion=2.8.0 \
    -DgroupId=org.example \
    -DartifactId=word-count-beam \
    -Dversion="0.1" \
    -Dpackage=org.apache.beam.examples \
    -DinteractiveMode=false
cd word-count-beam/
mkdir src/main/java/org/apache/beam/examples/cookbook
cp $PATH_TO_REPO_CLONE/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java ./src/main/java/org/apache/beam/examples/cookbook
mvn compile exec:java \
    -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes \
    -Dexec.args="--runner=DataflowRunner --project=$PROJECT --input=clouddataflow-readonly:samples.weather_stations --gcpTempLocation=gs://$BUCKET/tmp --output=$BQ_DATASET.$TABLE_NAME" \
    -Pdataflow-runner
This fails with:
java.lang.IllegalArgumentException: BigQueryIO.Read needs a GCS temp location to store temp files.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead.validate(BigQueryIO.java:662)
    at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:641)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:645)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
    at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
    at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
    at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:577)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:312)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at org.apache.beam.examples.cookbook.BigQueryTornadoes.runBigQueryTornadoes(BigQueryTornadoes.java:166)
    at org.apache.beam.examples.cookbook.BigQueryTornadoes.main(BigQueryTornadoes.java:172)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:748)
Ironically, the example works if we remove --gcpTempLocation. From the logs, we can see that in that case the job uses a bucket that looks like: gs://dataflow-staging-us-central1-927334603519.
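Possible workaround (untested against 2.8.0, so treat it as a sketch): the stack trace shows the check failing in BigQueryIO$TypedRead.validate, and it may be reading the generic --tempLocation pipeline option rather than --gcpTempLocation. That is an assumption, not something confirmed in this report. Under that assumption, passing both options with the same GCS path might avoid the error. The helper name build_exec_args below is hypothetical and only assembles the argument string; it does not run anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Beam): assembles the -Dexec.args value
# with BOTH --tempLocation and --gcpTempLocation pointing at the same path.
# Assumption: BigQueryIO.Read validates --tempLocation, not --gcpTempLocation.
build_exec_args() {
  local project="$1" bucket="$2" dataset="$3" table="$4"
  local tmp="gs://${bucket}/tmp"
  printf '%s' "--runner=DataflowRunner --project=${project}" \
    " --input=clouddataflow-readonly:samples.weather_stations" \
    " --tempLocation=${tmp} --gcpTempLocation=${tmp}" \
    " --output=${dataset}.${table}"
}

# Intended use, with the variables from the steps to reproduce:
#   mvn compile exec:java \
#       -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes \
#       -Dexec.args="$(build_exec_args "$PROJECT" "$BUCKET" "$BQ_DATASET" "$TABLE_NAME")" \
#       -Pdataflow-runner
```

If the job then passes validation, that would suggest the two options are not kept in sync when only --gcpTempLocation is supplied.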