Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.5.1, 3.5.2, 3.5.3
- Fix Version/s: None
- Component/s: None
Description
Issue Description
I created a small Scala application to verify Spark Connect functionality by running a simple query, following the "Use Spark Connect in standalone applications" section of the documentation:
// build.sbt
lazy val root = (project in file("."))
  .settings(
    scalaVersion := "2.13.12",
    name := "Sample app",
    libraryDependencies ++=
      "org.apache.spark" %% "spark-sql-api" % "3.5.3" ::
      "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3" ::
      Nil
  )
// src/main/scala/example/Hello.scala
package example

import org.apache.spark.sql.SparkSession

object Hello extends App {
  private val spark = SparkSession.builder().remote("sc://localhost").build()
  spark.sql("select 1").show()
  spark.close()
}
However, when I ran "sbt run", I got the following exception:
Exception in thread "sbt-bg-threads-1" java.lang.NoClassDefFoundError: io/netty/buffer/PooledByteBufAllocator
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1022)
    at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
    at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
    at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:103)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
    at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51)
    at org.apache.arrow.memory.DefaultAllocationManagerFactory.<clinit>(DefaultAllocationManagerFactory.java:26)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at org.apache.arrow.memory.DefaultAllocationManagerOption.getFactory(DefaultAllocationManagerOption.java:108)
    at org.apache.arrow.memory.DefaultAllocationManagerOption.getDefaultAllocationManagerFactory(DefaultAllocationManagerOption.java:98)
    at org.apache.arrow.memory.BaseAllocator$Config.getAllocationManagerFactory(BaseAllocator.java:772)
    at org.apache.arrow.memory.ImmutableConfig.access$801(ImmutableConfig.java:24)
    at org.apache.arrow.memory.ImmutableConfig$InitShim.getAllocationManagerFactory(ImmutableConfig.java:83)
    at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:47)
    at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:24)
    at org.apache.arrow.memory.ImmutableConfig$Builder.build(ImmutableConfig.java:485)
    at org.apache.arrow.memory.BaseAllocator.<clinit>(BaseAllocator.java:61)
    at org.apache.spark.sql.util.ArrowUtils$.<clinit>(ArrowUtils.scala:34)
    at org.apache.spark.sql.connect.client.arrow.ArrowVectorReader$.apply(ArrowVectorReader.scala:70)
    ...
Issue Cause
I have investigated this issue, and I believe the following is the cause.
The "spark-connect-client-jvm" is a fat jar containing dependency classes, relocating some under the "org.sparkproject" package. Starting from version 3.5.1 (see SPARK-45371), it also includes classes from "spark-sql-api". As a result, references to relocated classes within "spark-sql-api" are updated accordingly. However, specifying "spark-sql-api" as an application dependency causes the classloader to load the original classes, leading to conflicts with relocated references.
Indeed, the stack trace shows an attempt to load io/netty/buffer/PooledByteBufAllocator, where the relocated org/sparkproject/io/netty/buffer/PooledByteBufAllocator should have been used instead.
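As an illustrative check of this explanation (not part of the reported application), one can ask the classloader where it resolves one of the shared classes from; ArrowUtils is chosen here only because it appears in the stack trace:

// src/main/scala/example/WhichJar.scala -- hypothetical diagnostic sketch
package example

object WhichJar extends App {
  val loader = Thread.currentThread().getContextClassLoader
  // Look up the .class resource without initializing the class, so the
  // Arrow/Netty static initializers seen in the stack trace are not triggered.
  val url = loader.getResource("org/apache/spark/sql/util/ArrowUtils.class")
  // With both dependencies declared, this shows whether the class comes from
  // the plain spark-sql-api jar or from the shaded spark-connect-client-jvm jar.
  println(url)
}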
How to resolve
To resolve this issue, remove the "spark-sql-api" dependency from the application; this eliminates the classloader conflict, as sketched below. I have verified this with versions 3.5.1, 3.5.2, and 3.5.3, and it works consistently.
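For reference, a minimal sketch of the corrected build.sbt (keeping the same project settings and version as above, with only the fat client jar declared) could look like this:

// build.sbt -- sketch of the fix: depend only on "spark-connect-client-jvm",
// which already bundles the (relocated) spark-sql-api classes.
lazy val root = (project in file("."))
  .settings(
    scalaVersion := "2.13.12",
    name := "Sample app",
    libraryDependencies +=
      "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3"
  )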
We should fix the documentation by removing the "spark-sql-api" dependency so that users do not encounter the same issue.