SPARK-16333

Excessive Spark history event/json data size (5GB each)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment:

      Seen on both x86 (Intel(R) Xeon(R) E5-2699) and ppc (Habanero, Model: 8348-21C) platforms, running Red Hat Enterprise Linux Server release 7.2 (Maipo) and Spark 2.0.0-preview (May 24, 2016 build).

      Description

      With Spark 2.0.0-preview (May 24 build), the history event data (the JSON file) generated for each Spark application run (see below) can be as large as 5 GB, compared with 14 MB under Spark 1.6.1 for exactly the same application run and the same 1 TB of input data.

      rwxrwx-- 1 root root 5.3G Jun 30 09:39 app-20160630091959-0000
      rwxrwx-- 1 root root 5.3G Jun 30 09:56 app-20160630094213-0000
      rwxrwx-- 1 root root 5.3G Jun 30 10:13 app-20160630095856-0000
      rwxrwx-- 1 root root 5.3G Jun 30 10:30 app-20160630101556-0000

      The test was run with Sparkbench V2, SQL RDD (see GitHub: https://github.com/SparkTC/spark-bench).
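
      One way to narrow down where the extra volume comes from is to total the bytes per event type in one of the logs. The following is only a rough sketch, not something taken from the report: it assumes the event log is uncompressed, holds one JSON object per line, and that each object carries an "Event" field naming the listener event. The helper names event_sizes and report are made up for illustration.

```python
import json
from collections import Counter


def event_sizes(lines):
    """Sum the UTF-8 size in bytes of each event type found in the given lines."""
    sizes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            event = "<unparseable>"
        else:
            # Assumption: every event is a JSON object with an "Event" field.
            event = parsed.get("Event", "<unknown>") if isinstance(parsed, dict) else "<unknown>"
        sizes[event] += len(line.encode("utf-8"))
    return sizes


def report(path, top=10):
    """Print the event types that account for the most bytes in one log file."""
    with open(path, encoding="utf-8") as f:  # streamed line by line
        for event, size in event_sizes(f).most_common(top):
            print(f"{size / 1e6:12.1f} MB  {event}")
```

      Calling report("app-20160630091959-0000") on one of the files listed above would print the largest contributors first. Running the same script on the corresponding Spark 1.6.1 log would show which event types grew.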

              People

              • Assignee: Unassigned
              • Reporter: Peter Liu (petergangliu)
              • Votes: 3
              • Watchers: 16
