Zeppelin / ZEPPELIN-1453

Spark Interpreter Isolation "scoped" - Classloading Issues


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 0.6.1
    • Fix Version/s: 0.6.2
    • Component/s: None
    • Labels: None

    Description

      There seem to be classloader issues when using the Spark interpreter with "scoped" isolation, causing errors such as:

      1)
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4.0 (TID 25, 127.0.0.1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD

      2)
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 40, 172.16.56.135): java.lang.ClassNotFoundException: $anonfun$1

      The problem can be reproduced with two notebooks:

      Notebook A)
      Paragraph for Issue 1)
      val rdd = sc.parallelize(Array(Array("foo1", "foo2"), Array("bar1", "bar2")))
      val sorted_rdd = rdd.sortBy(_(1))

      Paragraph for Issue 2)
      val rdd = sc.parallelize(Array("foo", "bar"))
      val map = rdd.map(_ + "test")
      map.collect()

      Notebook B) (logically the same as A)
      Paragraph for Issue 1)
      val rdd2 = sc.parallelize(Array(Array("foo1", "foo2"), Array("bar1", "bar2")))
      val sorted_rdd2 = rdd2.sortBy(_(1))

      Paragraph for Issue 2)
      val rdd2 = sc.parallelize(Array("foo", "bar"))
      val map2 = rdd2.map(_ + "test")
      map2.collect()

      When running both notebooks with "shared" isolation >> ALL GOOD

      When running both notebooks with "scoped" isolation >> errors on the second notebook.
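
      For illustration, a minimal standalone sketch of the JVM behaviour behind error 1) (not Zeppelin code; the class name Foo and the classes/ directory are hypothetical): the "same" class defined by two different class loaders is two distinct runtime classes, so an instance created through one loader cannot be assigned to a field typed through the other. With "scoped" isolation each notebook gets its own REPL class loader while the SparkContext stays shared, which matches this pattern.

      import java.net.URLClassLoader

      object ClassLoaderCastSketch {
        def main(args: Array[String]): Unit = {
          // Hypothetical directory containing a compiled class "Foo".
          val classesDir = new java.io.File("classes").toURI.toURL
          // Two loaders with no shared parent, like two independent REPL sessions.
          val loaderA = new URLClassLoader(Array(classesDir), null)
          val loaderB = new URLClassLoader(Array(classesDir), null)

          val fooA = loaderA.loadClass("Foo")
          val fooB = loaderB.loadClass("Foo")

          // Same class name, different defining loaders => different runtime classes,
          // so instances from one loader cannot be assigned to fields typed by the other.
          println(fooA == fooB)                // false
          println(fooA.isAssignableFrom(fooB)) // false
        }
      }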

      Full stack traces (from Notebook B):
      1)
      rdd2: org.apache.spark.rdd.RDD[Array[String]] = ParallelCollectionRDD[16] at parallelize at <console>:27
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4.0 (TID 25, 172.16.56.135): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
      at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
      at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
      at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
      at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
      at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
      at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      at org.apache.spark.scheduler.Task.run(Task.scala:85)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Driver stacktrace:
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
      at scala.Option.foreach(Option.scala:257)
      at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
      at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
      at org.apache.spark.rdd.RDD.collect(RDD.scala:892)
      at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:264)
      at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:126)
      at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
      at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
      at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
      at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:596)
      at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:597)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
      at org.apache.spark.rdd.RDD.sortBy(RDD.scala:594)
      ... 46 elided
      Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
      at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
      at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1999)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
      at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
      at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
      at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
      at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      at org.apache.spark.scheduler.Task.run(Task.scala:85)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      ... 3 more

      2)
      rdd2: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[20] at parallelize at <console>:27
      map2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[21] at map at <console>:29
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 40, 172.16.56.135): java.lang.ClassNotFoundException: $anonfun$1
      at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      at java.lang.Class.forName0(Native Method)
      at java.lang.Class.forName(Class.java:348)
      at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
      at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
      at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
      at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
      at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      at org.apache.spark.scheduler.Task.run(Task.scala:85)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ClassNotFoundException: $anonfun$1
      at java.lang.ClassLoader.findClass(ClassLoader.java:530)
      at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
      at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
      ... 30 more
      Driver stacktrace:
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
      at scala.Option.foreach(Option.scala:257)
      at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
      at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
      at org.apache.spark.rdd.RDD.collect(RDD.scala:892)
      ... 46 elided
      Caused by: java.lang.ClassNotFoundException: $anonfun$1
      at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      at java.lang.Class.forName0(Native Method)
      at java.lang.Class.forName(Class.java:348)
      at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
      at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
      at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
      at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
      at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      at org.apache.spark.scheduler.Task.run(Task.scala:85)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      ... 3 more
      Caused by: java.lang.ClassNotFoundException: $anonfun$1
      at java.lang.ClassLoader.findClass(ClassLoader.java:530)
      at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
      at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
      ... 30 more
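
      A possible diagnostic (hedged: the property names below are an assumption based on the Spark 1.x/2.0 REPL, which publishes REPL-generated classes such as $anonfun$1 to executors from a class-output location): running this in a paragraph of each notebook shows which REPL class location the shared SparkContext advertises.

      // Property names are assumptions:
      // "spark.repl.class.uri" (Spark 1.x), "spark.repl.class.outputDir" (Spark 2.x).
      val replClasses = sc.getConf.getOption("spark.repl.class.uri")
        .orElse(sc.getConf.getOption("spark.repl.class.outputDir"))
      println(replClasses)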

      People

        Assignee: Unassigned
        Reporter: Andreas Weise (aweise)
        Votes: 0
        Watchers: 5
