[SPARK-708] add a JobLogger for Spark - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.7.0
Fix Version/s: 0.8.0
Component/s: Spark Core
Labels:
None

Description

Spark logs are currently written to the console or to a single Spark log file, which makes it inconvenient to analyze one individual job. We would like to implement a JobLogger for Spark that outputs one history file for each job (ActiveJob). Spark now has task metrics and summaries, so the history file can be built on top of them.

The job history contains:
1. Additional information from outside, for example the query plan from Shark
2. The RDD graph for the job
3. Each task's start/stop and shuffle information
4. Stage information

A new class named JobLogger does this job:
1. Each SparkContext has one JobLogger, and one folder is created for every JobLogger.
2. JobLogger manages the history files of all ActiveJobs running in that SparkContext. It creates one history file for each ActiveJob, and the file name is the job ID.
3. JobLogger generates the job history and writes it to the history file.
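
The file layout described above (one folder per JobLogger, one history file per job, named after the job ID) can be sketched roughly as follows. This is only a toy illustration in Python, not the proposed Spark class: the method names `job_started`, `log`, and `job_ended`, the `log_dir` parameter, and the `JOB_ID=... STATUS=...` record format are all assumptions made for the example.

```python
import os
import tempfile


class JobLogger:
    """Hypothetical sketch: one folder per logger, one history file per job ID."""

    def __init__(self, log_dir):
        # One folder is created for every JobLogger.
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)
        self._files = {}  # job ID -> open file handle

    def job_started(self, job_id):
        # The history file is named after the job ID.
        path = os.path.join(self.log_dir, str(job_id))
        self._files[job_id] = open(path, "a", encoding="utf-8")

    def log(self, job_id, message):
        # Messages for jobs that are not active are silently dropped.
        handle = self._files.get(job_id)
        if handle is not None:
            handle.write(message + "\n")

    def job_ended(self, job_id):
        handle = self._files.pop(job_id, None)
        if handle is not None:
            handle.close()


def demo():
    """Write one record for job 0 and return the history file's contents."""
    log_dir = tempfile.mkdtemp(prefix="joblogger-")
    logger = JobLogger(log_dir)
    logger.job_started(0)
    logger.log(0, "JOB_ID=0 STATUS=STARTED")  # made-up record format
    logger.log(7, "ignored: job 7 was never started")
    logger.job_ended(0)
    with open(os.path.join(log_dir, "0"), encoding="utf-8") as history:
        return history.read()
```

Keeping a map from job ID to open file handle means each record goes straight to the file of the job it belongs to, and the handle can be closed as soon as that job ends.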

Job history generation:
1. Additional information from outside
For example, to get the query plan from Shark, the interface between Shark and Spark would be modified to pass the information from Shark to Spark.
2. Record the RDD graph for each job
The RDD graph is printed top-down: the RDD dependencies are output recursively starting from finalRDD, and the parent-child relationship is represented by indentation.
3. Task start/stop and shuffle information
This can be obtained from TaskMetrics and TaskSetManager.
4. Stage information
This can be obtained from StageInfo and DAGScheduler.
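
The top-down RDD graph printing in step 2 above might look something like the sketch below. It is a toy illustration in Python that does not use Spark's real RDD classes: `RddNode`, `render_rdd_graph`, the two-space indent, and the example node names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RddNode:
    """Hypothetical stand-in for an RDD: a name plus the RDDs it depends on."""
    name: str
    parents: List["RddNode"] = field(default_factory=list)


def render_rdd_graph(rdd, indent=0):
    """Return one line per RDD, walking dependencies recursively from `rdd`.

    Each level of dependency is indented two more spaces, so the
    parent-child relationship is visible from the indentation alone.
    """
    lines = [" " * indent + rdd.name]
    for parent in rdd.parents:
        lines.extend(render_rdd_graph(parent, indent + 2))
    return lines


# Example graph: the final RDD is a union of a mapped text file and a
# parallelized collection (names chosen only for illustration).
source = RddNode("textFile")
mapped = RddNode("map", [source])
other = RddNode("parallelize")
final_rdd = RddNode("union", [mapped, other])

graph_text = "\n".join(render_rdd_graph(final_rdd))
```

For this example graph, `graph_text` holds four lines: `union` at the left margin, `map` and `parallelize` indented by two spaces, and `textFile` indented by four spaces under `map`.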

People

Assignee: mingfei.shi

Reporter: mingfei.shi

Votes: 0

Watchers: 4

Dates

Created: 06/Mar/13 21:59

Updated: 28/Jul/13 16:01

Resolved: 28/Jul/13 16:01