Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 0.22.0
- None
- None
Description
Currently, Rumen's TraceBuilder processes jobs sequentially and emits them in sorted order (by job-id). It performs the following steps:
1. Read data from input files
2. Parse and analyze the JobHistory data
3. Write the data to the output file
Steps #1 and #2 can be done in parallel. Step #3 can stay sequential if the user needs it (for example, to keep the output sorted by job-id); otherwise it can also be done in parallel.
I achieved a ~50% speedup simply by parallelizing steps #1 and #2 (i.e., the output was still sorted by job-id).
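
A minimal sketch of the idea, not Rumen's actual code: steps #1 and #2 are submitted to a thread pool, one task per job, and step #3 collects the results sequentially in job-id order. Every name here (`ParallelTraceSketch`, `readAndParse`, `buildTrace`) is hypothetical, and `readAndParse` is only a stand-in for the real reading and JobHistory parsing. It also assumes that job-ids sort correctly as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from Rumen itself.
class ParallelTraceSketch {

    // Stand-in for steps #1 and #2: read one job's history and parse it.
    static String readAndParse(String jobId) {
        return "parsed:" + jobId;
    }

    // Runs steps #1 and #2 in parallel, then emits results sorted by job-id.
    static List<String> buildTrace(List<String> jobIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // TreeMap keeps the pending results ordered by job-id
            // (lexicographic String order, an assumption of this sketch).
            Map<String, Future<String>> pending = new TreeMap<>();
            for (String jobId : jobIds) {
                Callable<String> task = () -> readAndParse(jobId);
                pending.put(jobId, pool.submit(task));
            }

            // Step #3 stays sequential: wait for each job in sorted order
            // and append it to the output.
            List<String> output = new ArrayList<>();
            for (Future<String> result : pending.values()) {
                output.add(result.get());
            }
            return output;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while building trace", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed to parse", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `ParallelTraceSketch.buildTrace(jobIds, 4)` with an unsorted list of job-ids returns the parsed entries in sorted order, because only the collection loop depends on ordering. If sorted output is not required, the sequential collection loop could be replaced by having each task write its own result.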