Spark / SPARK-25992

Accumulators giving KeyError in pyspark


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: PySpark
    • Labels: None

      Description

      I am using accumulators, and when I run my code I sometimes get warning messages. When I checked, nothing had been accumulated - I am not sure whether the accumulator lost data, or whether it worked and this error can be safely ignored. (A minimal usage sketch follows the traceback below.)

      The message:

      Exception happened during processing of request from
      ('127.0.0.1', 62099)
      Traceback (most recent call last):
      File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 317, in _handle_request_noblock
          self.process_request(request, client_address)
      File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 348, in process_request
          self.finish_request(request, client_address)
      File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 361, in finish_request
          self.RequestHandlerClass(request, client_address, self)
      File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 696, in __init__
          self.handle()
      File "/usr/local/hadoop/spark2.3.1/python/pyspark/accumulators.py", line 238, in handle
          _accumulatorRegistry[aid] += update
      KeyError: 0
      ----------------------------------------
      2018-11-09 19:09:08 ERROR DAGScheduler:91 - Failed to update accumulators for task 0
      org.apache.spark.SparkException: EOF reached before Python server acknowledged
      	at org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:634)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1131)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1123)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1123)
      	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
      	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
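      For context, here is a minimal sketch (not the reporter's actual code, and the names are illustrative) of the accumulator pattern that surfaces this error: an accumulator created on the driver and updated inside a task function, with the worker-side updates merged back through the driver's accumulator server - the server that raises the KeyError above when it cannot find the accumulator id.

      from pyspark.sql import SparkSession

      # Hypothetical reproduction sketch: a driver-side accumulator updated from tasks.
      spark = SparkSession.builder.master("local[2]").appName("accum-keyerror-demo").getOrCreate()
      sc = spark.sparkContext

      counter = sc.accumulator(0)  # accumulator registered with the driver's accumulator server

      def count_row(_):
          # Each task's local updates are sent back to the driver when the task completes.
          counter.add(1)

      sc.parallelize(range(100)).foreach(count_row)

      # On an affected version the job can still succeed while logging the warning above,
      # in which case the accumulated value may be lost.
      print(counter.value)  # expected: 100
      spark.stop()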
      


              People

              • Assignee: Unassigned
              • Reporter: Abdeali Kothari (AbdealiJK)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                • Updated:
                • Resolved: