
[SPARK-11022] Spark Worker needs improved cleanup of finished executors when an app has massive failures


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      The Worker process often goes down even though there are no abnormal tasks; it simply crashes without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", a heap dump showed 17,010 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8", occupying 496,706,920 bytes (96.14% of the heap).
      Almost all of these instances were held by a single "org.apache.spark.deploy.worker.Worker" instance: its finishedExecutors field retains the ExecutorRunner objects.

      The code in Worker.scala only ever adds entries to finishedExecutors ("finishedExecutors(fullId) = executor") and reads them back ("finishedExecutors.values.toList"); there is no code path that removes an executor, so every finished ExecutorRunner stays in memory forever. After the Worker receives many executor status reports, for example when an application fails repeatedly and executors are relaunched over and over, this unbounded growth can crash the Worker with an OutOfMemoryError. I think this needs to be improved; one possible direction is sketched below.
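      A minimal sketch of the idea, assuming a configurable cap on how many finished executors the Worker retains: the class name FinishedExecutorTracker, the retainedExecutors parameter, and the simplified ExecutorRunner stand-in below are illustrative assumptions, not the actual Worker.scala code.

      {code:scala}
import scala.collection.mutable

// Hypothetical stand-in for org.apache.spark.deploy.worker.ExecutorRunner.
final case class ExecutorRunner(fullId: String, state: String)

// Sketch of bounded retention for finished executors: entries are added
// on executor exit (as in Worker.finishedExecutors) but the oldest ones
// are evicted once a configurable cap is exceeded.
class FinishedExecutorTracker(retainedExecutors: Int) {
  // Insertion-ordered map, mirroring how finishedExecutors is used today.
  private val finishedExecutors = mutable.LinkedHashMap[String, ExecutorRunner]()

  def onExecutorFinished(executor: ExecutorRunner): Unit = {
    finishedExecutors(executor.fullId) = executor
    // Evict the oldest entries once the cap is exceeded, so the map stays
    // bounded even if an app fails and relaunches executors thousands of times.
    if (finishedExecutors.size > retainedExecutors) {
      val excess = finishedExecutors.size - retainedExecutors
      finishedExecutors.keys.take(excess).toList.foreach(finishedExecutors.remove)
    }
  }

  // Same shape as the existing "finishedExecutors.values.toList" read path.
  def listFinished: List[ExecutorRunner] = finishedExecutors.values.toList
}

// Usage: retain at most the 1000 most recent finished executors.
// val tracker = new FinishedExecutorTracker(1000)
      {code}

      With a cap like this, the most recent finished executors are still available for the web UI while older entries are dropped, keeping the Worker's memory bounded no matter how many times an application fails.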
      Thanks and best regards.


          People

            Assignee: Unassigned
            Reporter: colin shaw (colin_Shaw)
            Votes: 0
            Watchers: 2
