SPARK-19697

NoSuchMethodError: org.apache.avro.Schema.getLogicalType()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Build, Spark Core
    • Labels:
      None
    • Environment:

      Apache Spark 2.1.0, Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_60

      Description

      In a downstream project (https://github.com/bigdatagenomics/adam), adding a dependency on parquet-avro version 1.8.2 results in a NoSuchMethodError at runtime on various Spark versions, including 2.1.0.

      pom.xml:

        <properties>
          <java.version>1.8</java.version>
          <avro.version>1.8.1</avro.version>
          <scala.version>2.11.8</scala.version>
          <scala.version.prefix>2.11</scala.version.prefix>
          <spark.version>2.1.0</spark.version>
          <parquet.version>1.8.2</parquet.version>
          <!-- ... -->
        </properties>
        <!-- ... -->
        <dependencyManagement>
          <dependencies>
            <dependency>
              <groupId>org.apache.parquet</groupId>
              <artifactId>parquet-avro</artifactId>
              <version>${parquet.version}</version>
            </dependency>
            <!-- ... -->
          </dependencies>
        </dependencyManagement>

      Example using spark-submit (called via adam-submit below):

      $ ./bin/adam-submit vcf2adam \
        adam-core/src/test/resources/small.vcf \
        small.adam
      ...
      java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
        at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
        at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
        at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
        at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:283)
        at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
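
      A NoSuchMethodError like the one above usually means the class was compiled against a newer library than the one loaded at runtime. Schema.getLogicalType() was added in Avro 1.8.0, so the trace suggests that an older Avro jar is taking precedence over the 1.8.1 declared in the pom (an assumption; this report does not confirm which jar is loaded). The following is a minimal diagnostic sketch, not part of ADAM or Spark: the class name AvroClasspathCheck and its helper methods are hypothetical. It uses only reflection, so it compiles without Avro on the classpath.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Spark, Parquet, Avro or ADAM.
class AvroClasspathCheck {

    /** Returns true if the class can be loaded and has a public no-arg method with this name. */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Describes where the class was loaded from, or reports that it could not be loaded. */
    public static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader have no code source.
            return src == null ? "(bootstrap or unknown source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String schema = "org.apache.avro.Schema";
        System.out.println(schema + " loaded from: " + locationOf(schema));
        System.out.println("getLogicalType() present: " + hasNoArgMethod(schema, "getLogicalType"));
    }
}
```

      To be meaningful, the check has to run with the same classpath as the failing job, since the jar that wins depends on classpath ordering. If it reports that getLogicalType() is absent, the printed location shows which Avro jar is shadowing the expected one.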
      

      The issue can be reproduced with this pull request:
      https://github.com/bigdatagenomics/adam/pull/1360

      It shows up as Jenkins CI test failures, e.g.
      https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1810

      Related thread in the dev@spark.apache.org mailing list archive:
      http://apache-spark-developers-list.1001551.n3.nabble.com/Re-VOTE-Release-Apache-Parquet-1-8-2-RC1-tp20711p20720.html

              People

              • Assignee: Unassigned
              • Reporter: Michael Heuer (heuermh)