  Spark / SPARK-6977

PARSING_ERROR(2) in Spark Streaming


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: DStreams
    • Labels: None

    Description

      I am using Spark Streaming to read data from Kafka. After about five hours the job failed, and I found the following exceptions in the log:

      2015-04-17 16:35:16,797 INFO  [Driver] - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 1 in stage 7541923.0 failed 4 times, most recent failure: Lost task 1.3 in stage 7541923.0 (TID 105982, spark-host): java.io.IOException: PARSING_ERROR(2)
      	at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
      	at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
      	at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
      	at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:358)
      	at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:387)
      	at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2293)
      	at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2586)
      	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
      	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
      	at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
      	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
      	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
      	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
      	at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:91)
      	at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
      	at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:64)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      
      Driver stacktrace:)
      
      2015-04-17 16:41:34,872 ERROR [sparkDriver-akka.actor.default-dispatcher-5] - Error running job streaming job 1429252575000 ms.0
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 54542.0 failed 4 times, most recent failure: Lost task 1.3 in stage 54542.0 (TID 31192, l-hdps37.com): java.io.IOException: PARSING_ERROR(2)
        at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
        at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
        at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
        at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:358)
        at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:387)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2293)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2586)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
        at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:91)
        at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
        at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
        at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:98)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:93)
        at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
        at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
      
      Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      2015-04-17 16:41:34,872 INFO  [dag-scheduler-event-loop] - Registering RDD 1357 (filter at LogSplitStreamingKafka.scala:128)
      2015-04-17 16:41:34,874 INFO  [sparkDriver-akka.actor.default-dispatcher-3] - Slicing from 1429260075000 ms to 1429260075000 ms (aligned to 1429260075000 ms and 1429260075000 ms)
      2015-04-17 16:41:34,874 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 1 is 201 bytes
      2015-04-17 16:41:34,874 ERROR [Driver] - User class threw exception: Job aborted due to stage failure: Task 1 in stage 54542.0 failed 4 times, most recent failure: Lost task 1.3 in stage 54542.0 (TID 31192, l-hdps37.com): java.io.IOException: PARSING_ERROR(2)
      2015-04-17 16:41:34,875 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 0 is 188 bytes
      2015-04-17 16:41:34,875 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 6 is 198 bytes
      2015-04-17 16:41:34,875 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 5 is 180 bytes
      2015-04-17 16:41:34,876 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 11 is 201 bytes
      2015-04-17 16:41:34,876 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 10 is 186 bytes
      2015-04-17 16:41:34,876 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 16 is 199 bytes
      2015-04-17 16:41:34,876 INFO  [dag-scheduler-event-loop] - Size of output statuses for shuffle 15 is 186 bytes
      2015-04-17 16:41:34,876 INFO  [Driver] - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 1 in stage 54542.0 failed 4 times, most recent failure: Lost task 1.3 in stage 54542.0 (TID 31192, l-hdps37.com): java.io.IOException: PARSING_ERROR(2)
      
      Driver stacktrace:)
      

      What is causing this error? Can someone help me? Thank you!


            People

              Assignee: Unassigned
              Reporter: iteblog (397090770)
              Votes: 0
              Watchers: 2
