Details
Bug
Status: Resolved
Major
Resolution: Fixed
2.4.3
None
Description
There appears to be a conflict between Avro dependency versions at runtime when using Spark 2.4.3 with Scala 2.12 (the spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) together with Hadoop 2.7.7.
Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes avro-1.8.2.jar:
$ find spark-2.4.3-bin-hadoop2.7 -name "*.jar" | grep avro
jars/avro-1.8.2.jar
jars/avro-mapred-1.8.2-hadoop2.jar
jars/avro-ipc-1.8.2.jar
The Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop, by contrast, does not include avro-1.8.2.jar; it ships only the avro-mapred jar:
$ find spark-2.4.3-bin-without-hadoop-scala-2.12 -name "*.jar" | grep avro
jars/avro-mapred-1.8.2-hadoop2.jar
Adding Hadoop 2.7.7 to the classpath brings in avro-1.7.4.jar, which conflicts at runtime with the 1.8.2 avro-mapred jar shipped by Spark:
$ find hadoop-2.7.7 -name "*.jar" | grep avro
share/hadoop/mapreduce/lib/avro-1.7.4.jar
share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar
share/hadoop/tools/lib/avro-1.7.4.jar
share/hadoop/common/lib/avro-1.7.4.jar
hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar
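The manual checks above could be combined into a small helper that reports every distinct Avro version found across a set of install directories. This is only a sketch: the function names `avro_versions` and `has_avro_conflict` are hypothetical, and the check relies purely on jar file names (it does not inspect jar contents or the effective runtime classpath).

```shell
#!/bin/sh
# avro_versions DIR...
# Print each distinct Avro version that appears in a jar file name under DIR...
# Matches avro-<ver>.jar as well as avro-mapred-<ver>-hadoop2.jar, avro-ipc-<ver>.jar, etc.
avro_versions() {
  find "$@" -type f -name 'avro*.jar' 2>/dev/null \
    | sed 's|.*/||' \
    | sed -n 's/^avro[a-z-]*-\([0-9][0-9.]*\).*\.jar$/\1/p' \
    | sort -u
}

# has_avro_conflict DIR...
# Exit status 0 when more than one Avro version is present, non-zero otherwise.
has_avro_conflict() {
  [ "$(avro_versions "$@" | wc -l | tr -d ' ')" -gt 1 ]
}
```

For example, `avro_versions spark-2.4.3-bin-without-hadoop-scala-2.12 hadoop-2.7.7` lists the versions found in both trees, and `has_avro_conflict` with the same arguments succeeds when they disagree.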
Issue filed downstream at https://github.com/bigdatagenomics/adam/issues/2151
A smaller reproducing test case is attached.