Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Cannot Reproduce
- Affects Version/s: 1.5.1
- Fix Version/s: None
- Component/s: None
Description
A Guava classloading error can occur when using a different version of the Hive metastore.
We are running the latest version of Spark at this time (1.5.1) with patched versions of Hadoop 2.2.0 and Hive 1.0.0. We set "spark.sql.hive.metastore.version" to "1.0.0" and "spark.sql.hive.metastore.jars" to "<path_to_hive>/lib/*:<output_of_hadoop_classpath_cmd>". When we try to launch the spark-shell, the sqlContext fails to initialize with:
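For illustration, the configuration described above amounts to roughly the following launch command (the two bracketed paths are the same placeholders used in this report, not real values):

```
spark-shell \
  --conf spark.sql.hive.metastore.version=1.0.0 \
  --conf "spark.sql.hive.metastore.jars=<path_to_hive>/lib/*:<output_of_hadoop_classpath_cmd>"
```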
java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: com/google/common/base/Predicate when creating Hive client using classpath: <all the jars>
Please make sure that jars for your version of hive and hadoop are included in the paths passed to SQLConfEntry(key = spark.sql.hive.metastore.jars, defaultValue=builtin, doc=...
We verified that the Guava libraries are present in the long list of included jars. However, the org.apache.spark.sql.hive.client.IsolatedClientLoader.isSharedClass method appears to assume that all "com.google" classes (excluding "com.google.cloud") should be loaded from the base classloader. The Spark libraries shade in some, but not all, of the "com.google.common.base" classes.
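The sharing rule described above can be sketched as a standalone predicate. This is an approximation of Spark 1.5's IsolatedClientLoader.isSharedClass written for illustration, not the exact source:

```scala
// Simplified, standalone sketch of the class-sharing check (an
// approximation of org.apache.spark.sql.hive.client.
// IsolatedClientLoader.isSharedClass in Spark 1.5, not the exact code).
object SharedClassCheck {
  // Classes matching these rules are loaded by Spark's base classloader
  // rather than by the isolated classloader built from the jars listed
  // in "spark.sql.hive.metastore.jars".
  def isSharedClass(name: String): Boolean =
    name.contains("slf4j") ||
    name.contains("log4j") ||
    name.startsWith("org.apache.spark.") ||
    (name.startsWith("com.google") && !name.startsWith("com.google.cloud")) ||
    name.startsWith("scala.") ||
    name.startsWith("java.lang.") ||
    name.startsWith("java.net")

  def main(args: Array[String]): Unit = {
    // Guava lives under com.google.common, so it is always "shared":
    // the copy in the metastore jars list is never consulted.
    println(isSharedClass("com.google.common.base.Predicate")) // true
    println(isSharedClass("com.google.cloud.SomeClass"))       // false
  }
}
```

Because every "com.google" class outside "com.google.cloud" is resolved by the base classloader, and the Spark assembly only shades part of "com.google.common.base", the lookup of com/google/common/base/Predicate can fail even though a complete Guava JAR is present on the metastore jars path.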
See https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3CCAB51Vx4ipV34e=EiSHLg7BZLdm0uefD_MpyqfE4dodbnbv9MKg@mail.gmail.com%3E and its replies.
The work-around is to add the Guava JAR to the "spark.driver.extraClassPath" and "spark.executor.extraClassPath" properties.
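Applied at launch time, the work-around looks roughly like this (the Guava jar path is a hypothetical placeholder, not taken from the report):

```
spark-shell \
  --conf spark.driver.extraClassPath=/path/to/guava.jar \
  --conf spark.executor.extraClassPath=/path/to/guava.jar
```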