[BEAM-2761] Write to empty BigQuery partition fails with "No schema specified on job or table." despite having provided schema - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: P3
Resolution: Cannot Reproduce
Affects Version/s: 2.1.0, 2.2.0
Fix Version/s: 2.2.0
Component/s: runner-dataflow
Labels:
None

Description

In 2.1.0-SNAPSHOT and 2.2.0-SNAPSHOT, jobs writing an empty PCollection to a BigQuery partition fail with "java.lang.RuntimeException: Failed to create load job with id prefix". This is associated with a message "No schema specified on job or table" even though a schema is provided. See attached stack trace for the more detail on the error.

Command to run job:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.EmptyPCollection \
     -Dexec.args="--runner=DataflowRunner --project=<GCP project> \
                  --gcpTempLocation=<tmp location>" \
     -Pdataflow-runner

Code to reproduce the problem:

EmptyPCollection.java

public class EmptyPCollection {

  public static void main(String[] args) {

    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    options.setTempLocation("<your tmp location>");
    Pipeline pipeline = Pipeline.create(options);

    String schema = "{\"fields\": [{\"name\": \"pet\", \"type\": \"string\", \"mode\": \"required\"}]}";
    String table = "mydataset.pets";
    List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
    PCollection<String> inputText = pipeline.apply(Create.of(pets)).setCoder(StringUtf8Coder.of());
    PCollection<TableRow> rows = inputText.apply(ParDo.of(new DoFn<String, TableRow>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        String text = c.element();
        if (text.startsWith("X")) {  // change to (D)og and works fine
          TableRow row = new TableRow();
          row.set("pet", text);
          c.output(row);
        }
      }
    }));

    rows.apply(BigQueryIO.writeTableRows().to(table).withJsonSchema(schema)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

    pipeline.run().waitUntilFinish();

  }
}

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

beam-2761-stacktrace.txt
10/Aug/17 17:39
11 kB
Fallon

Activity

People

Assignee:: Reuven Lax

Reporter:: Fallon

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 10/Aug/17 17:37

Updated:: 16/May/20 13:22

Resolved:: 12/Sep/17 19:16