Spark / SPARK-50414

Incorrect statement that causes java.lang.NoClassDefFoundError in Spark Connect documentation


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.1, 3.5.2, 3.5.3
    • Fix Version/s: None
    • Component/s: Connect, Documentation
    • Labels: None

    Description

      Issue Description

      I created a small Scala application to verify Spark Connect functionality by running a simple query, following the "Use Spark Connect in standalone applications" section of the Spark Connect documentation:

       

      // build.sbt
      lazy val root = (project in file("."))
        .settings(
          scalaVersion := "2.13.12",
          name := "Sample app",
          libraryDependencies ++=
            "org.apache.spark" %% "spark-sql-api" % "3.5.3" ::
              "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3" ::
              Nil
        ) 

       

      // src/main/scala/example/Hello.scala
      package example
      
      import org.apache.spark.sql.SparkSession
      
      object Hello extends App {
        private val spark = SparkSession.builder().remote("sc://localhost").build()
        spark.sql("select 1").show()
        spark.close()
      }

      However, when I ran "sbt run", I got the following exception:

      Exception in thread "sbt-bg-threads-1" java.lang.NoClassDefFoundError: io/netty/buffer/PooledByteBufAllocator
              at java.base/java.lang.ClassLoader.defineClass1(Native Method)
              at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1022)
              at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
              at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
              at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
              at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
              at java.base/java.security.AccessController.doPrivileged(Native Method)
              at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
              at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:103)
              at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
              at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
              at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
              at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51)
              at org.apache.arrow.memory.DefaultAllocationManagerFactory.<clinit>(DefaultAllocationManagerFactory.java:26)
              at java.base/java.lang.Class.forName0(Native Method)
              at java.base/java.lang.Class.forName(Class.java:315)
              at org.apache.arrow.memory.DefaultAllocationManagerOption.getFactory(DefaultAllocationManagerOption.java:108)
              at org.apache.arrow.memory.DefaultAllocationManagerOption.getDefaultAllocationManagerFactory(DefaultAllocationManagerOption.java:98)
              at org.apache.arrow.memory.BaseAllocator$Config.getAllocationManagerFactory(BaseAllocator.java:772)
              at org.apache.arrow.memory.ImmutableConfig.access$801(ImmutableConfig.java:24)
              at org.apache.arrow.memory.ImmutableConfig$InitShim.getAllocationManagerFactory(ImmutableConfig.java:83)
              at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:47)
              at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:24)
              at org.apache.arrow.memory.ImmutableConfig$Builder.build(ImmutableConfig.java:485)
              at org.apache.arrow.memory.BaseAllocator.<clinit>(BaseAllocator.java:61)
              at org.apache.spark.sql.util.ArrowUtils$.<clinit>(ArrowUtils.scala:34)
              at org.apache.spark.sql.connect.client.arrow.ArrowVectorReader$.apply(ArrowVectorReader.scala:70) 
      ...

      Issue Cause

      I have investigated this issue, and I believe the following is the cause.

      The "spark-connect-client-jvm" is a fat jar containing dependency classes, relocating some under the "org.sparkproject" package. Starting from version 3.5.1 (see SPARK-45371), it also includes classes from "spark-sql-api". As a result, references to relocated classes within "spark-sql-api" are updated accordingly. However, specifying "spark-sql-api" as an application dependency causes the classloader to load the original classes, leading to conflicts with relocated references.

      In fact, the stack trace shows an attempt to load io/netty/buffer/PooledByteBufAllocator, which should have been the relocated org/sparkproject/io/netty/buffer/PooledByteBufAllocator.
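
      To check this, one can list the matching entries in the jars involved. The following is a minimal sketch (the object name is made up and the jar path is an assumption that depends on your local Ivy/Coursier cache); it prints every entry of the given jar whose name contains "PooledByteBufAllocator", so you can see whether a jar ships the original io.netty class or only a relocated copy:

      // InspectJar.scala -- minimal sketch; the jar path below is an assumption
      // and should point at the artifact in your local Ivy/Coursier cache.
      import java.util.jar.JarFile
      import scala.jdk.CollectionConverters._

      object InspectJar extends App {
        val jarPath = "spark-connect-client-jvm_2.13-3.5.3.jar"
        val jar = new JarFile(jarPath)
        // Print every entry mentioning PooledByteBufAllocator to see which
        // copies (original io.netty or relocated org.sparkproject) are present.
        jar.entries().asScala
          .map(_.getName)
          .filter(_.contains("PooledByteBufAllocator"))
          .foreach(println)
        jar.close()
      }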

      How to resolve

      Removing the "spark-sql-api" dependency from the application resolves the classloader conflict. I have verified this with versions 3.5.1, 3.5.2, and 3.5.3, and it works consistently.
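
      For reference, a corrected build.sbt would then declare only the Spark Connect client (a minimal sketch that keeps the same Scala and Spark versions as the example above); "spark-connect-client-jvm" already bundles the "spark-sql-api" classes, so no other Spark dependency is needed:

      // build.sbt
      lazy val root = (project in file("."))
        .settings(
          scalaVersion := "2.13.12",
          name := "Sample app",
          libraryDependencies +=
            "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3"
        )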

      We should fix the documentation by removing the "spark-sql-api" dependency from the example so that users do not encounter the same issue.

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: choplin (Akihiro Okuno)

            Dates

              Created:
              Updated:
