[MAPREDUCE-2010] [Rumen] Parallelize TraceBuilder - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.22.0
Fix Version/s: None
Component/s: tools/rumen
Labels: None

Description

Currently, Rumen's TraceBuilder processes jobs sequentially and emits them in sorted order (based on job-id). The steps are:

1. Read data from input files
2. Parse and analyze the JobHistory data
3. Write the data to the output file

Steps #1 and #2 can be done in parallel. Step #3 can stay sequential if the user needs the sorted output; otherwise it can also be done in parallel.

I could achieve a ~50% speedup by simply parallelizing steps #1 and #2 (i.e. the output was still sorted based on job-id).
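
The idea can be sketched roughly as follows. This is a minimal illustration only, not the actual TraceBuilder code: the class `ParallelTraceSketch`, the methods `readAndParse` and `buildTrace`, and the sample job-ids are all hypothetical stand-ins. It assumes the job-ids are already sorted, runs the read and parse work on a thread pool, and then collects the results in submission order so the output stays sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTraceSketch {

    // Hypothetical stand-in for steps #1 and #2: read one job's history and parse it.
    // A real implementation would do the file I/O and JobHistory parsing here.
    static String readAndParse(String jobId) {
        return "parsed(" + jobId + ")";
    }

    // Runs steps #1 and #2 in parallel, then performs step #3 sequentially.
    // Assumes sortedJobIds is already ordered by job-id.
    static List<String> buildTrace(List<String> sortedJobIds, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per job; the futures are kept in job-id order.
            List<Future<String>> pending = new ArrayList<>();
            for (final String jobId : sortedJobIds) {
                Callable<String> task = () -> readAndParse(jobId);
                pending.add(pool.submit(task));
            }
            // Step #3: consume the futures in submission order, so the output
            // is sorted by job-id even if tasks finish out of order.
            List<String> output = new ArrayList<>();
            for (Future<String> result : pending) {
                output.add(result.get());
            }
            return output;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up job-ids, used only to exercise the sketch.
        List<String> jobIds = Arrays.asList("job_0001", "job_0002", "job_0003");
        for (String line : buildTrace(jobIds, 4)) {
            System.out.println(line);
        }
    }
}
```

If sorted output is not needed, the ordered loop over the futures could be replaced by writing each result as soon as it completes, which would make step #3 parallel as well.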

People

Assignee: Amar Kamat

Reporter: Amar Kamat

Votes: 0

Watchers: 1

Dates

Created: 13/Aug/10 11:20

Updated: 13/Jan/11 02:29