
[SPARK-50414] Incorrect statement that causes java.lang.NoClassDefFoundError in the Spark Connect documentation


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.1, 3.5.2, 3.5.3
    • Fix Version/s: None
    • Component/s: Connect, Documentation
    • Labels: None

    Description

      Issue Description

      I created a small Scala application to verify Spark Connect functionality by running a simple query, following the "Use Spark Connect in standalone applications" section of the documentation:


      // build.sbt
      lazy val root = (project in file("."))
        .settings(
          scalaVersion := "2.13.12",
          name := "Sample app",
          libraryDependencies ++=
            "org.apache.spark" %% "spark-sql-api" % "3.5.3" ::
              "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3" ::
              Nil
        ) 


      // src/main/scala/example/Hello.scala
      package example
      
      import org.apache.spark.sql.SparkSession
      
      object Hello extends App {
        private val spark = SparkSession.builder().remote("sc://localhost").build()
        spark.sql("select 1").show()
        spark.close()
      }

      However, when I ran "sbt run", I got the following exception:

      Exception in thread "sbt-bg-threads-1" java.lang.NoClassDefFoundError: io/netty/buffer/PooledByteBufAllocator
              at java.base/java.lang.ClassLoader.defineClass1(Native Method)
              at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1022)
              at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
              at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
              at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
              at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
              at java.base/java.security.AccessController.doPrivileged(Native Method)
              at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
              at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:103)
              at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
              at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
              at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
              at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51)
              at org.apache.arrow.memory.DefaultAllocationManagerFactory.<clinit>(DefaultAllocationManagerFactory.java:26)
              at java.base/java.lang.Class.forName0(Native Method)
              at java.base/java.lang.Class.forName(Class.java:315)
              at org.apache.arrow.memory.DefaultAllocationManagerOption.getFactory(DefaultAllocationManagerOption.java:108)
              at org.apache.arrow.memory.DefaultAllocationManagerOption.getDefaultAllocationManagerFactory(DefaultAllocationManagerOption.java:98)
              at org.apache.arrow.memory.BaseAllocator$Config.getAllocationManagerFactory(BaseAllocator.java:772)
              at org.apache.arrow.memory.ImmutableConfig.access$801(ImmutableConfig.java:24)
              at org.apache.arrow.memory.ImmutableConfig$InitShim.getAllocationManagerFactory(ImmutableConfig.java:83)
              at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:47)
              at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:24)
              at org.apache.arrow.memory.ImmutableConfig$Builder.build(ImmutableConfig.java:485)
              at org.apache.arrow.memory.BaseAllocator.<clinit>(BaseAllocator.java:61)
              at org.apache.spark.sql.util.ArrowUtils$.<clinit>(ArrowUtils.scala:34)
              at org.apache.spark.sql.connect.client.arrow.ArrowVectorReader$.apply(ArrowVectorReader.scala:70) 
      ...

      Issue Cause

      I investigated this issue, and I believe the cause is the following.

      The "spark-connect-client-jvm" is a fat jar containing dependency classes, relocating some under the "org.sparkproject" package. Starting from version 3.5.1 (see SPARK-45371), it also includes classes from "spark-sql-api". As a result, references to relocated classes within "spark-sql-api" are updated accordingly. However, specifying "spark-sql-api" as an application dependency causes the classloader to load the original classes, leading to conflicts with relocated references.

      Indeed, the stack trace shows an attempt to load io/netty/buffer/PooledByteBufAllocator, which should have been the relocated org/sparkproject/io/netty/buffer/PooledByteBufAllocator.
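
      As a hypothetical diagnostic (not part of the original report; it assumes the "org.sparkproject" relocation described above), a small probe can show which variant of the class is reachable on the application classpath. In the failing setup, the unshaded name is missing while the relocated copy inside the fat jar is present:

      // ClasspathProbe.scala -- hypothetical sketch: reports whether each
      // class name can be loaded by the application classloader
      object ClasspathProbe extends App {
        private def probe(name: String): Unit =
          try { Class.forName(name); println(s"found:   $name") }
          catch { case _: ClassNotFoundException => println(s"missing: $name") }

        // unshaded name referenced by the unshaded spark-sql-api classes
        probe("io.netty.buffer.PooledByteBufAllocator")
        // relocated copy bundled inside spark-connect-client-jvm
        probe("org.sparkproject.io.netty.buffer.PooledByteBufAllocator")
      }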

      How to resolve

      Removing the "spark-sql-api" dependency from the application resolves the classloader conflict. I have verified this fix with versions 3.5.1, 3.5.2, and 3.5.3, and it works consistently.
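
      Concretely, the sample's build.sbt becomes the following (the same project definition as above, with "spark-sql-api" dropped):

      // build.sbt -- the sample above without the "spark-sql-api" dependency;
      // spark-connect-client-jvm already bundles those classes, so it is the
      // only Spark dependency the client application needs
      lazy val root = (project in file("."))
        .settings(
          scalaVersion := "2.13.12",
          name := "Sample app",
          libraryDependencies ++=
            "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3" ::
              Nil
        )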

      We should fix the documentation by removing the "spark-sql-api" dependency from the example so that users do not encounter the same issue.

          People

            Assignee: Unassigned
            Reporter: Akihiro Okuno (choplin)
