Parquet / PARQUET-1596

PARQUET-1375 broke parquet-cli's to-avro command


Details

    Description

      Given the following JSON file:

      $ cat /tmp/sample.json 
      { "id": 1, "name": "Alice" }
      { "id": 2, "name": "Bob" }
      { "id": 3, "name": "Carol" }
      { "id": 4, "name": "Dave" }
      

      Running to-avro on the master branch to convert this file into Avro fails with a NullPointerException:

      $ git branch -v
      * master 47398be7 PARQUET-1375: Upgrade to Jackson 2.9.9 (#616)
      $ mvn clean install -DskipTests
      
      (snip)
      
      [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ parquet-cli ---
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.jar
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/pom.xml to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.pom
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-tests.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-tests.jar
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-runtime.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-runtime.jar
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time:  14.769 s
      [INFO] Finished at: 2019-06-12T23:52:57+09:00
      [INFO] ------------------------------------------------------------------------
      $ mvn dependency:copy-dependencies
      
      (snip)
      
      $ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main to-avro /tmp/sample.json -o /tmp/sample.avro
      Unknown error
      java.lang.RuntimeException: Failed on record 0
      	at org.apache.parquet.cli.commands.ToAvroCommand.run(ToAvroCommand.java:120)
      	at org.apache.parquet.cli.Main.run(Main.java:147)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.parquet.cli.Main.main(Main.java:177)
      Caused by: java.lang.NullPointerException
      	at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:153)
      	at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:145)
      	at org.apache.parquet.cli.commands.ToAvroCommand.run(ToAvroCommand.java:112)
      	... 3 more
      $ echo $?
      1
      

      With the previous revision, the same command succeeds:

      $ git checkout HEAD^
      HEAD is now at 9d6fb45e PARQUET-1576 Bump Apache Avro to 1.9.0 (#638)
      $ mvn clean install -DskipTests
      
      (snip)
      
      [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ parquet-cli ---
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.jar
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/pom.xml to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.pom
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-tests.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-tests.jar
      [INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-runtime.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-runtime.jar
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time:  15.822 s
      [INFO] Finished at: 2019-06-12T23:57:04+09:00
      [INFO] ------------------------------------------------------------------------
      $ mvn dependency:copy-dependencies
      
      (snip)
      
      $ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main to-avro /tmp/sample.json -o /tmp/sample.avro
      $ echo $?
      0
      $ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main head /tmp/sample.avro
      {"id": 1, "name": "Alice"}
      {"id": 2, "name": "Bob"}
      {"id": 3, "name": "Carol"}
      {"id": 4, "name": "Dave"}
      

      Reverting the following code

      AvroJson.java
         public static Iterator<JsonNode> parser(final InputStream stream) {
           try(JsonParser parser = FACTORY.createParser(stream)) {
      

      to

         public static Iterator<JsonNode> parser(final InputStream stream) {
           try {
             JsonParser parser = FACTORY.createParser(stream);
      

      seems to fix the problem.
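
      One plausible explanation, not confirmed in this report: try-with-resources closes the JsonParser as soon as parser() returns, so the lazily evaluated Iterator<JsonNode> handed back to the caller is backed by an already-closed parser. The sketch below illustrates the same pitfall using only the JDK (a BufferedReader instead of Jackson). The class and method names are hypothetical, and Jackson's behavior after close may differ, for example by yielding no records instead of throwing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; it is not part of parquet-cli.
class LazyIteratorDemo {

    // Same shape as the failing code: the reader is closed when the method
    // returns, before the caller has pulled a single element.
    static Iterator<String> closedEarly(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().iterator();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same shape as the reverted code: the reader stays open while the caller
    // iterates. Nothing closes it here, so ownership passes to the caller.
    static Iterator<String> leftOpen(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return reader.lines().iterator();
    }

    // Consumes the iterator fully, as a caller reading every record would.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "{ \"id\": 1, \"name\": \"Alice\" }\n{ \"id\": 2, \"name\": \"Bob\" }";

        System.out.println("left open: " + drain(leftOpen(text)));

        try {
            drain(closedEarly(text));
        } catch (UncheckedIOException e) {
            // The first hasNext() call reads from a reader that is already closed.
            System.out.println("closed early: " + e.getCause().getMessage());
        }
    }
}
```

      If this is indeed the cause, the revert trades the failure for a parser that is never explicitly closed, so a longer-term fix might instead make the caller responsible for closing it once iteration is finished.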

      cc Fokko

            People

              fokko Fokko Driesprong
              sekikn Kengo Seki