Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.0.0
Fix Version/s: None
Environment: Seen on both an x86 platform (Intel(R) Xeon(R) E5-2699) and a ppc platform (Habanero, Model: 8348-21C), Red Hat Enterprise Linux Server release 7.2 (Maipo), Spark 2.0.0-preview (May-24, 2016 build).
Description
With Spark 2.0.0-preview (May-24 build), the history event data (the JSON file) generated for each Spark application run (see below) can be as large as 5 GB, compared to 14 MB for exactly the same application run on the same 1 TB of input data under Spark 1.6.1:
rwxrwx-- 1 root root 5.3G Jun 30 09:39 app-20160630091959-0000
rwxrwx-- 1 root root 5.3G Jun 30 09:56 app-20160630094213-0000
rwxrwx-- 1 root root 5.3G Jun 30 10:13 app-20160630095856-0000
rwxrwx-- 1 root root 5.3G Jun 30 10:30 app-20160630101556-0000
The test was run with Sparkbench V2, SQL RDD (see GitHub: https://github.com/SparkTC/spark-bench).
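For reference, a minimal sketch of the Spark settings that control the event-log output discussed above (the application name and log directory are assumptions, not values from this report); note that spark.eventLog.compress only shrinks the files on disk and does not address why Spark 2.0.0-preview emits so much more event data:

import org.apache.spark.sql.SparkSession

// Minimal sketch: the per-application history files above are written by Spark's event-log facility.
val spark = SparkSession.builder()
  .appName("eventlog-size-check")                     // hypothetical application name
  .config("spark.eventLog.enabled", "true")           // write one event-log file per application run
  .config("spark.eventLog.dir", "/tmp/spark-events")  // hypothetical log directory
  .config("spark.eventLog.compress", "true")          // compresses the file on disk; a workaround, not a fix
  .getOrCreate()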
Attachments
Issue Links
- duplicates
  - SPARK-20084 Remove internal.metrics.updatedBlockStatuses accumulator from history files (Resolved)
- is depended upon by
  - SPARK-19111 S3 Mesos history upload fails silently if too large (Resolved)
- is duplicated by
  - SPARK-16332 the history server of spark2.0-preview (may-24 build) consumes more than 1000% cpu (Resolved)
  - SPARK-19316 Spark event logs are huge compared to 1.5.2 (Resolved)
- links to