Spark / SPARK-27623

Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.2
    • Fix Version/s: None
    • Component/s: PySpark

    Description

      After updating to Spark 2.4.2, using the

      spark.read.format().options().load()

      chain of methods fails: regardless of what parameter is passed to "format", we get the following Avro-related error:

       

      .options(**load_options)
      File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
      File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
      File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
      : java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
        at java.util.ServiceLoader.fail(ServiceLoader.java:232)
        at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
        at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
        at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
        at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
        at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
        at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
        at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
        ... 29 more
      Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 36 more
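The trace shows DataSource$.lookupDataSource enumerating every registered DataSourceRegister through java.util.ServiceLoader: each provider is instantiated while the list is filtered, so a single provider that fails to construct aborts the lookup for every format, which matches what we see. A rough Python analogy of that mechanism (not Spark code; the class and function names here are made up for illustration):

```python
class CassandraProvider:
    # A well-behaved provider that registers a short name.
    short_name = "org.apache.spark.sql.cassandra"

class BrokenAvroProvider:
    # Mimics a provider whose constructor fails, like the
    # NoClassDefFoundError raised inside AvroFileFormat.<init>.
    def __init__(self):
        raise RuntimeError("NoClassDefFoundError: FileFormat$class")

def lookup_data_source(fmt, providers):
    # Instantiation happens for *every* registered provider before
    # filtering, so one broken provider fails unrelated lookups too.
    instances = [cls() for cls in providers]
    return [p for p in instances if getattr(p, "short_name", "") == fmt]

# Even though we ask for the Cassandra source, the broken Avro
# provider's constructor raises first:
try:
    lookup_data_source("org.apache.spark.sql.cassandra",
                       [CassandraProvider, BrokenAvroProvider])
except RuntimeError as exc:
    print(exc)  # NoClassDefFoundError: FileFormat$class
```

This would explain why the error mentions Avro even when "format" names a completely different data source.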
      
      

       

      The code we run looks like this:

       

      from pyspark.sql import SparkSession

      spark_session = (
          SparkSession.builder
          .appName(APPLICATION_NAME)
          .master(MASTER_URL)
          .config('spark.cassandra.connection.host', SERVER_IP_ADDRESS)
          .config('spark.cassandra.auth.username', CASSANDRA_USERNAME)
          .config('spark.cassandra.auth.password', CASSANDRA_PASSWORD)
          .config('spark.sql.shuffle.partitions', 16)
          .config('parquet.enable.summary-metadata', 'true')
          .getOrCreate())

      load_options = {
          'keyspace': CASSANDRA_KEYSPACE,
          'table': TABLE_NAME,
          'spark.cassandra.input.fetch.size_in_rows': '150'}

      df = (spark_session.read.format('org.apache.spark.sql.cassandra')
            .options(**load_options)
            .load())

       

      We get the exact same error when trying to read a local .avro file instead of from Cassandra.

      Until now we have included the Spark-Avro .jar file via the spark-submit --jars option. The Spark-Avro version we used, which worked with Spark 2.4.1, was 2.4.0.

      In an attempt to fix this problem we tried updating the .jar file version, and we also tried the --packages option with different version combinations, but none of these worked; the same error shows up every time.
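Since the inner error is a NoClassDefFoundError for org/apache/spark/sql/execution/datasources/FileFormat$class, one thing worth checking when swapping jar versions (an assumption on our part, not confirmed in this report) is whether the Scala binary-version suffix in the spark-avro artifact name matches the Scala version the Spark distribution itself was built with; the "$class" naming is how Scala 2.11 compiles trait method implementations, and it disappears under Scala 2.12. A small, hypothetical helper for comparing suffixes from jar file names:

```python
import re

def scala_suffix(artifact_name):
    """Extract the Scala binary-version suffix from an artifact name,
    e.g. 'spark-avro_2.11-2.4.0.jar' -> '2.11'.
    Hypothetical helper, not part of Spark."""
    match = re.search(r"_(\d+\.\d+)-", artifact_name)
    return match.group(1) if match else None

# A mismatch between the spark-avro jar and the Spark distribution's
# own jars would be one candidate explanation for the error above:
print(scala_suffix("spark-avro_2.11-2.4.0.jar"))  # 2.11
print(scala_suffix("spark-core_2.12-2.4.2.jar"))  # 2.12
```

If the suffixes differ, rebuilding the job against matching artifacts would be the first thing to try.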

      When rolling back to Spark 2.4.1 with the exact same setup and code, the error doesn't show up and everything works fine. 

      Any ideas on what could be causing this?

       

       


          People

            Assignee: Unassigned
            Reporter: Alexandru Barbulescu (abarbulescu)