Spark / SPARK-1977

mutable.BitSet in ALS not serializable with KryoSerializer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.2, 1.1.0
    • Component/s: MLlib
    • Labels:
      None

      Description

      OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member.
      KryoSerializer uses AllScalaRegistrar from Twitter chill but it doesn't register mutable.BitSet.

      Right now we have to register mutable.BitSet manually. A proper fix would be to use immutable.BitSet in ALS or to register mutable.BitSet in upstream chill.

      Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: scala.collection.mutable.HashSet
      Serialization trace:
      shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock)
              com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
              com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
              com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
              com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
              com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
              com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
              org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
              org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
              org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
              org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
              org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
              org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
              scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
              scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
              org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
              org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
              org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
              org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
              org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
              org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
              org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
              org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
              org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
              org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
              org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
              org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
              org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
              org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
              org.apache.spark.scheduler.Task.run(Task.scala:51)
              org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
              java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              java.lang.Thread.run(Thread.java:662)
      Driver stacktrace:
      	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
      	at scala.Option.foreach(Option.scala:236)
      	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
      	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
      	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
      	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
      	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
      	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
      	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
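The ArrayStoreException at the root of this trace is the JVM's runtime array-store check firing: Kryo deserializes the `shouldSend` element back as a `scala.collection.mutable.HashSet`, and storing that object into the reified `Array[mutable.BitSet]` is what throws. A minimal pure-Scala sketch of that JVM behavior (no Spark or Kryo involved; `ArrayStoreDemo` is a name invented for illustration):

```scala
import scala.collection.mutable

object ArrayStoreDemo {
  // JVM arrays are reified: the element class is checked on every store,
  // unlike generic collections where erasure hides a type mismatch.
  def storeWrongElementType(): Boolean = {
    // At runtime this is a BitSet[] array, whatever the static type says.
    val shouldSend: Array[AnyRef] =
      new Array[mutable.BitSet](1).asInstanceOf[Array[AnyRef]]
    try {
      // Mirrors what happens when a deserializer produces the wrong class:
      // the array store itself throws ArrayStoreException.
      shouldSend(0) = mutable.HashSet(1, 2, 3)
      false
    } catch {
      case _: ArrayStoreException => true
    }
  }
}
```

This is why the failure surfaces inside `FieldSerializer$ObjectField.read` only when the round-tripped object lands in the `Array[mutable.BitSet]` field, rather than at serialization time.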
      


          Activity

          mengxr Xiangrui Meng added a comment -

          I cannot reproduce this error in v1.0.0. There is an example called `MovieLensALS.scala` under `examples/`, which runs fine with kryo enabled. Did you include other dependencies in your application?

          sinisa_lyh Neville Li added a comment -

          Yeah that example worked fine for me in standalone mode but failed in YARN cluster mode with the same error.
          Maybe serialization wasn't needed/triggered in standalone mode?

          mengxr Xiangrui Meng added a comment -

          This is more likely a version conflict in your dependencies. From the Spark WebUI, you can find the system classpath in the Environment tab. Please verify that you don't have two different versions of Spark, Kryo, or any other related library; classes may hide inside an assembly jar.

          sinisa_lyh Neville Li added a comment - edited

          We submit one spark-assembly jar and one job assembly jar via spark-submit, and there are no other obvious Scala/Spark/Kryo jars on the global classpath. I can reproduce the same exception locally with the following snippet when kryo.register() is commented out.

          I just added mutable BitSet to Twitter chill: https://github.com/twitter/chill/pull/185

          import com.twitter.chill._
          import org.apache.spark.serializer.{KryoSerializer, KryoRegistrator}
          import org.apache.spark.SparkConf
          import scala.collection.mutable
          
          class MyRegistrator extends KryoRegistrator {
            override def registerClasses(kryo: Kryo) {
              // Uncommenting this registration makes the round trip below succeed.
              // kryo.register(classOf[mutable.BitSet])
            }
          }
          
          case class OutLinkBlock(elementIds: Array[Int], shouldSend: Array[mutable.BitSet])
          
          object KryoTest {
            def main(args: Array[String]) {
              println("hello")
              val conf = new SparkConf()
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryo.registrator", classOf[MyRegistrator].getName)
              val serializer = new KryoSerializer(conf).newInstance()
          
              // Fails on deserialization with KryoException (ArrayStoreException)
              // when mutable.BitSet is not registered above.
              val bytes = serializer.serialize(OutLinkBlock(Array(1, 2, 3), Array(mutable.BitSet(2, 4, 6))))
              serializer.deserialize(bytes).asInstanceOf[OutLinkBlock]
            }
          }
          
          mengxr Xiangrui Meng added a comment -

          In our example code, we only register `Rating` and it works. Could you try adding the following:

          kryo.register(classOf[Rating])
          

          I need to reproduce this problem with `ALS.train`.

          sinisa_lyh Neville Li added a comment -

          We are already doing that. Our job works on YARN with `register(classOf[mutable.BitSet])`; without it we get the reported exception.

          mengxr Xiangrui Meng added a comment -

          Did you register `Rating`? I think this is necessary.

          sinisa_lyh Neville Li added a comment -

          Yes, we did register `Rating`, and we had to `register(classOf[mutable.BitSet])` in addition to make it work.

          coderxiang Shuo Xiang added a comment -

          Hi Neville Li, I just ran the MovieLens example on my YARN cluster (hadoop-2.0.5-alpha) with Kryo enabled and it works. I used the following command:

          bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.mllib.MovieLensALS --num-executors ** --driver-memory ** --executor-memory ** --executor-cores 1 spark-examples-1.0.0-hadoop2.0.5-alpha.jar --rank 5 --numIterations 20 --lambda 1.0 --kryo /path/to/sample_movielens_data.txt

          sinisa_lyh Neville Li added a comment -

          Our YARN cluster runs Hadoop 2.2.0. We built the spark-assembly and spark-examples jars from the 1.0.0 release source with the bundled make-distribution.sh. Here's my command:

          spark-submit --master yarn-cluster --class org.apache.spark.examples.mllib.MovieLensALS --num-executors 2 --executor-memory 2g --driver-memory 2g dist/lib/spark-examples-1.0.0-hadoop2.2.0.jar --kryo --implicitPrefs sample_movielens_data.txt
          

          Here's the complete classpath from the Environment tab:

          /etc/hadoop/conf
          /usr/lib/hadoop-hdfs/hadoop-hdfs-2.2.0.2.0.6.0-76-tests.jar
          /usr/lib/hadoop-hdfs/hadoop-hdfs-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-hdfs/hadoop-hdfs-nfs-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-hdfs/lib/asm-3.2.jar
          /usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar
          /usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar
          /usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.13.jar
          /usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar
          /usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar
          /usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar
          /usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar
          /usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar
          /usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar
          /usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar
          /usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar
          /usr/lib/hadoop-hdfs/lib/jersey-core-1.9.jar
          /usr/lib/hadoop-hdfs/lib/jersey-server-1.9.jar
          /usr/lib/hadoop-hdfs/lib/jetty-6.1.26.jar
          /usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.jar
          /usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar
          /usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar
          /usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar
          /usr/lib/hadoop-hdfs/lib/netty-3.6.2.Final.jar
          /usr/lib/hadoop-hdfs/lib/protobuf-java-2.5.0.jar
          /usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar
          /usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar
          /usr/lib/hadoop-mapreduce/hadoop-archives-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-datajoin-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-distcp-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-extras-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-gridmix-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-hs-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-hs-plugins-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76-tests.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-rumen-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-mapreduce/lib/aopalliance-1.0.jar
          /usr/lib/hadoop-mapreduce/lib/asm-3.2.jar
          /usr/lib/hadoop-mapreduce/lib/avro-1.7.4.jar
          /usr/lib/hadoop-mapreduce/lib/commons-compress-1.4.1.jar
          /usr/lib/hadoop-mapreduce/lib/commons-io-2.1.jar
          /usr/lib/hadoop-mapreduce/lib/guice-3.0.jar
          /usr/lib/hadoop-mapreduce/lib/guice-servlet-3.0.jar
          /usr/lib/hadoop-mapreduce/lib/hamcrest-core-1.1.jar
          /usr/lib/hadoop-mapreduce/lib/jackson-core-asl-1.8.8.jar
          /usr/lib/hadoop-mapreduce/lib/jackson-mapper-asl-1.8.8.jar
          /usr/lib/hadoop-mapreduce/lib/javax.inject-1.jar
          /usr/lib/hadoop-mapreduce/lib/jersey-core-1.9.jar
          /usr/lib/hadoop-mapreduce/lib/jersey-guice-1.9.jar
          /usr/lib/hadoop-mapreduce/lib/jersey-server-1.9.jar
          /usr/lib/hadoop-mapreduce/lib/junit-4.10.jar
          /usr/lib/hadoop-mapreduce/lib/log4j-1.2.17.jar
          /usr/lib/hadoop-mapreduce/lib/netty-3.6.2.Final.jar
          /usr/lib/hadoop-mapreduce/lib/paranamer-2.3.jar
          /usr/lib/hadoop-mapreduce/lib/protobuf-java-2.5.0.jar
          /usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.4.1.jar
          /usr/lib/hadoop-mapreduce/lib/xz-1.0.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-api-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-client-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-common-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-server-common-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-server-nodemanager-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-server-resourcemanager-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-server-tests-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-server-web-proxy-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/hadoop-yarn-site-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar
          /usr/lib/hadoop-yarn/lib/asm-3.2.jar
          /usr/lib/hadoop-yarn/lib/avro-1.7.4.jar
          /usr/lib/hadoop-yarn/lib/commons-compress-1.4.1.jar
          /usr/lib/hadoop-yarn/lib/commons-io-2.1.jar
          /usr/lib/hadoop-yarn/lib/guice-3.0.jar
          /usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar
          /usr/lib/hadoop-yarn/lib/hamcrest-core-1.1.jar
          /usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar
          /usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar
          /usr/lib/hadoop-yarn/lib/javax.inject-1.jar
          /usr/lib/hadoop-yarn/lib/jersey-core-1.9.jar
          /usr/lib/hadoop-yarn/lib/jersey-guice-1.9.jar
          /usr/lib/hadoop-yarn/lib/jersey-server-1.9.jar
          /usr/lib/hadoop-yarn/lib/junit-4.10.jar
          /usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar
          /usr/lib/hadoop-yarn/lib/netty-3.6.2.Final.jar
          /usr/lib/hadoop-yarn/lib/paranamer-2.3.jar
          /usr/lib/hadoop-yarn/lib/protobuf-java-2.5.0.jar
          /usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar
          /usr/lib/hadoop-yarn/lib/xz-1.0.jar
          /usr/lib/hadoop/hadoop-annotations-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop/hadoop-auth-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-76-tests.jar
          /usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop/hadoop-nfs-2.2.0.2.0.6.0-76.jar
          /usr/lib/hadoop/lib/activation-1.1.jar
          /usr/lib/hadoop/lib/asm-3.2.jar
          /usr/lib/hadoop/lib/avro-1.7.4.jar
          /usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar
          /usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar
          /usr/lib/hadoop/lib/commons-cli-1.2.jar
          /usr/lib/hadoop/lib/commons-codec-1.4.jar
          /usr/lib/hadoop/lib/commons-collections-3.2.1.jar
          /usr/lib/hadoop/lib/commons-compress-1.4.1.jar
          /usr/lib/hadoop/lib/commons-configuration-1.6.jar
          /usr/lib/hadoop/lib/commons-digester-1.8.jar
          /usr/lib/hadoop/lib/commons-el-1.0.jar
          /usr/lib/hadoop/lib/commons-httpclient-3.1.jar
          /usr/lib/hadoop/lib/commons-io-2.1.jar
          /usr/lib/hadoop/lib/commons-lang-2.5.jar
          /usr/lib/hadoop/lib/commons-logging-1.1.1.jar
          /usr/lib/hadoop/lib/commons-math-2.1.jar
          /usr/lib/hadoop/lib/commons-net-3.1.jar
          /usr/lib/hadoop/lib/guava-11.0.2.jar
          /usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar
          /usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar
          /usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar
          /usr/lib/hadoop/lib/jackson-xc-1.8.8.jar
          /usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar
          /usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar
          /usr/lib/hadoop/lib/jaxb-api-2.2.2.jar
          /usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar
          /usr/lib/hadoop/lib/jersey-core-1.9.jar
          /usr/lib/hadoop/lib/jersey-json-1.9.jar
          /usr/lib/hadoop/lib/jersey-server-1.9.jar
          /usr/lib/hadoop/lib/jets3t-0.6.1.jar
          /usr/lib/hadoop/lib/jettison-1.1.jar
          /usr/lib/hadoop/lib/jetty-6.1.26.jar
          /usr/lib/hadoop/lib/jetty-util-6.1.26.jar
          /usr/lib/hadoop/lib/jsch-0.1.42.jar
          /usr/lib/hadoop/lib/jsp-api-2.1.jar
          /usr/lib/hadoop/lib/jsr305-1.3.9.jar
          /usr/lib/hadoop/lib/junit-4.8.2.jar
          /usr/lib/hadoop/lib/log4j-1.2.17.jar
          /usr/lib/hadoop/lib/mockito-all-1.8.5.jar
          /usr/lib/hadoop/lib/native/*
          /usr/lib/hadoop/lib/netty-3.6.2.Final.jar
          /usr/lib/hadoop/lib/paranamer-2.3.jar
          /usr/lib/hadoop/lib/protobuf-java-2.5.0.jar
          /usr/lib/hadoop/lib/servlet-api-2.5.jar
          /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar
          /usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar
          /usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar
          /usr/lib/hadoop/lib/stax-api-1.0.1.jar
          /usr/lib/hadoop/lib/xmlenc-0.52.jar
          /usr/lib/hadoop/lib/xz-1.0.jar
          /usr/lib/hadoop/lib/zookeeper-3.4.5.jar
          
          smolav Santiago M. Mola added a comment - edited

          I can reproduce this depending on the size of the dataset:

          spark-submit mllib-movielens-evaluation-assembly-1.0.jar --master spark://mllib1:7077
          --class com.example.MovieLensALS --rank 10 --numIterations 20 --lambda 1.0 --kryo
          hdfs:/movielens/oversampled.dat
          

          The exception is not thrown for small datasets: the job runs successfully with MovieLens 100k and 10M. However, when I run it on a 100M dataset, the exception is thrown.

          My MovieLensALS is mostly the same as the one shipped with Spark; I just added cross-validation. `Rating` is registered with Kryo just as in the stock example.

          # cat RELEASE 
          Spark 1.0.0 built for Hadoop 2.2.0
          
          Show
          smolav Santiago M. Mola added a comment - - edited I can reproduce this depending on the size of the dataset: spark-submit mllib-movielens-evaluation-assembly-1.0.jar --master spark://mllib1:7077 --class com.example.MovieLensALS --rank 10 --numIterations 20 --lambda 1.0 --kryo hdfs:/movielens/oversampled.dat The exception will not be thrown for small datasets. It will successfully run with MovieLens 100k and 10M. However, when I run it on a 100M dataset, the exception will be thrown. My MovieLensALS is mostly the same as the one shipped with Spark. I just added cross-validation. Rating is registered in Kryo just as in the stock example. # cat RELEASE Spark 1.0.0 built for Hadoop 2.2.0
          Shuo Xiang added a comment:

          Update: I can also reproduce a similar error message with a larger data set (~3 GB).

          Xiangrui Meng added a comment:

          Santiago M. Mola and Shuo Xiang:

          Thanks for testing it! Could you post the exact error message you got, with the stack trace? Based on your description, it is probably caused by Kryo's default serialization: it may treat BitSet as a general Java collection and then fail during ser/de.

          Santiago M. Mola added a comment:

          Xiangrui Meng, I can't reproduce it at the moment. It takes quite a big dataset to reproduce and my machines are busy. But I'm pretty sure the stack trace is exactly the same as the one posted by Neville Li. My bet is that this will be fixed by the next Twitter Chill release: https://github.com/twitter/chill/commit/b47512c2c75b94b7c5945985306fa303576bf90d

          Xiangrui Meng added a comment:

          I think I now understand when it happens. We use storage level MEMORY_AND_DISK for the user/product in/out links, which contain BitSet objects. If the dataset is large, these RDDs are spilled from memory to disk, and writing to disk requires serialization. So the easiest way to reproduce this error is to change the storage level of inLinks/outLinks to DISK_ONLY and run with Kryo.

          Neville Li Instead of mapping mutable.BitSet to immutable.BitSet, which introduces overhead, we can register mutable.BitSet in our MovieLensALS example code and wait for the next Chill release. Does that sound good to you?
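          The registration suggested above can be sketched as a custom KryoRegistrator. This is a sketch only: the class name `ALSRegistrator`, the object `MovieLensALSKryo`, and the package `com.example` are illustrative, not the actual MovieLensALS code; it registers Rating (as the stock example already does) plus mutable.BitSet.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.serializer.KryoRegistrator
import scala.collection.mutable

// Hypothetical registrator: registers Rating plus mutable.BitSet,
// which chill did not auto-register at the time of this issue.
class ALSRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Rating])
    kryo.register(classOf[mutable.BitSet])
  }
}

object MovieLensALSKryo {
  // Wire the registrator into the SparkConf before creating the context.
  def buildConf(): SparkConf =
    new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "com.example.ALSRegistrator")
}
```

          With this registration in place, Kryo uses chill's BitSet serializer instead of falling back to FieldSerializer on the OutLinkBlock fields.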

          Neville Li added a comment:

          Xiangrui Meng sounds good to me.

          Xiangrui Meng added a comment:

          Do you mind creating a PR registering mutable.BitSet in MovieLensALS.scala and close PR #925? Thanks!

          Neville Li added a comment:

          There you go: https://github.com/apache/spark/pull/1319
          Xiangrui Meng added a comment:

          Issue resolved by pull request 1319
          https://github.com/apache/spark/pull/1319

          Gen TANG added a comment:

          Neville Li
          Sorry to bother you.
          According to https://github.com/twitter/chill/pull/185, Twitter chill already supports mutable.BitSet. However, I tried your code and it still doesn't work if the Kryo registration is commented out. The task fails at the last line:

          serializer.deserialize(bytes).asInstanceOf[OutLinkBlock]

          Do you have any idea why this happens? The error message is as follows:

          [error] (run-main) com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set final scala.collection.mutable.BitSet field OutLinkBlock.elementIds to scala.collection.mutable.HashSet
          [error] Serialization trace:
          [error] elementIds (OutLinkBlock)
          com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set final scala.collection.mutable.BitSet field OutLinkBlock.elementIds to scala.collection.mutable.HashSet
          Serialization trace:
          elementIds (OutLinkBlock)
          	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
          	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
          	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
          	at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
          	at KroTest$.main(helloworld.scala:25)
          	at KroTest.main(helloworld.scala)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:622)
          Caused by: java.lang.IllegalArgumentException: Can not set final scala.collection.mutable.BitSet field OutLinkBlock.elementIds to scala.collection.mutable.HashSet
          	at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:164)
          	at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:168)
          	at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.set(UnsafeQualifiedObjectFieldAccessorImpl.java:83)
          	at java.lang.reflect.Field.set(Field.java:736)
          	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:619)
          	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
          	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
          	at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
          	at KroTest$.main(helloworld.scala:25)
          	at KroTest.main(helloworld.scala)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:622)
          
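          The standalone test in the stack trace above can be approximated by a minimal sketch like the following. The names `OutLinkBlock`, `elementIds`, and `KryoBitSetRepro` are modeled on the serialization trace, not the original helloworld.scala; whether the failure actually reproduces depends on the chill version on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer
import scala.collection.mutable

// Hypothetical stand-in for the test class; the field name elementIds
// matches the serialization trace above.
case class OutLinkBlock(elementIds: mutable.BitSet)

object KryoBitSetRepro {
  def main(args: Array[String]): Unit = {
    // No explicit registration of mutable.BitSet here: with a chill
    // that lacks BitSet support, Kryo's fallback rebuilds the set as a
    // HashSet, and FieldSerializer's final-field assignment then fails
    // with the IllegalArgumentException shown above.
    val serializer = new KryoSerializer(new SparkConf()).newInstance()
    val bytes = serializer.serialize(OutLinkBlock(mutable.BitSet(1, 2, 3)))
    val restored = serializer.deserialize[OutLinkBlock](bytes)
    println(restored.elementIds)
  }
}
```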
          
          Gen TANG added a comment:

          In fact, the problem of Kryo turning a BitSet into a HashSet occurs whenever mutable.BitSet is not registered manually:

          com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set final scala.collection.mutable.BitSet field OutLinkBlock.elementIds to scala.collection.mutable.HashSet

          This can also crash Spark when we use Spark with Hive.

          Ilya Ganelin added a comment:

          Hi all - a concise writeup on how to fix this bug is here: http://tbertinmahieux.com/wp/?author=1 Also related to: http://apache-spark-user-list.1001560.n3.nabble.com/ALS-implicit-error-pyspark-td16595.html Thanks.
          Apache Spark added a comment:

          User 'nevillelyh' has created a pull request for this issue:
          https://github.com/apache/spark/pull/925


            People

            • Assignee: Unassigned
            • Reporter: Neville Li
            • Votes: 2
            • Watchers: 9
