We need to predict system performance before actual execution, because execution often takes a very long time to complete. Prediction lets us explore the search space in less time and find a better solution within it.
For related work, see Jockey: Guaranteed Job Latency in Data Parallel Clusters (EuroSys ’12).
Some of the related TODOs are as follows:
- Aggregating task metrics and historical data/traces
- A mechanism for classifying tasks and identifying the metrics and configurations that contribute to each task's performance
- Integrating the task time prediction mechanism into the existing components by using our event-based simulator (implemented as a scheduler)
- Experiments to confirm the accuracy of the simulator
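
The TODOs above can be illustrated with a minimal Python sketch. It is not the project's simulator: the function names (`build_predictor`, `simulate_completion_time`, `relative_error`), the per-class mean as the prediction model, the stage-by-stage execution model, and all sample numbers are assumptions made for illustration only.

```python
import heapq
from collections import defaultdict, deque
from statistics import mean


def build_predictor(history):
    """Aggregate historical (task_class, duration) records into a predictor.

    Assumption: the mean duration per task class is a good enough estimate.
    Unknown classes fall back to the mean over all records.
    """
    by_class = defaultdict(list)
    for task_class, duration in history:
        by_class[task_class].append(duration)
    averages = {c: mean(ds) for c, ds in by_class.items()}
    fallback = mean(duration for _, duration in history)
    return lambda task_class: averages.get(task_class, fallback)


def simulate_completion_time(stages, num_slots, predict):
    """Event-driven simulation of a job without running any task.

    stages: list of stages, each a list of task classes. A stage starts only
    after the previous stage has finished (a simplifying assumption).
    num_slots: number of tasks that can run concurrently.
    """
    clock = 0.0
    for tasks in stages:
        pending = deque(tasks)
        running = []  # min-heap of predicted task finish times
        while pending or running:
            # Launch pending tasks while execution slots are free.
            while pending and len(running) < num_slots:
                finish = clock + predict(pending.popleft())
                heapq.heappush(running, finish)
            # Jump the clock to the next task-finish event.
            clock = heapq.heappop(running)
    return clock


def relative_error(predicted, actual):
    """Accuracy metric for comparing a simulated time with a measured one."""
    return abs(predicted - actual) / actual


# Hypothetical traces: (task class, observed duration in seconds).
history = [("map", 10), ("map", 14), ("reduce", 30)]
predict = build_predictor(history)

# Hypothetical job: 4 map tasks, then 2 reduce tasks, on 2 slots.
job = [["map"] * 4, ["reduce"] * 2]
predicted = simulate_completion_time(job, num_slots=2, predict=predict)
print(f"predicted completion time: {predicted:.1f}s")
print(f"relative error vs. a measured 60s run: {relative_error(predicted, 60):.2f}")
```

A real predictor would likely replace the per-class mean with a model that also takes configurations and input sizes into account. The `relative_error` helper shows one way the accuracy experiments could compare simulated times against measured runs.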