Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 1.6.2, 2.0.1, 2.1.0
- None
Description
There are known complaints that the History Server's application list does not update quickly enough when the event log files that need replay are huge. Currently, the FsHistoryProvider design causes the entire event log file to be replayed when building the initial application listing (see the method mergeApplicationListing(fileStatus: FileStatus)). The replay process involves:
- reading each line in the event log as a string,
- parsing the string into a JSON structure,
- converting the JSON to the corresponding Scala classes with nested structures.
In particular, parsing each string to JSON and then converting it to Scala classes is expensive. Tests show that the majority of the time spent in replay goes to this work.
When the replay is performed for building the application listing, the only two events that the code really cares about are "SparkListenerApplicationStart" and "SparkListenerApplicationEnd", since the only listener attached to the ReplayListenerBus at that point is the ApplicationEventListener. This means that when processing an event log file with a huge number of events (hundreds of thousands, possibly more), the work done to deserialize all of these events and then replay them is not needed. We are interested in only two events, so when replay is performed for the purpose of building the application list, we can make the effort to deserialize and replay just these two events and skip the others.
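To illustrate the idea, here is a minimal sketch of filtering the raw lines before any deserialization happens. It is not Spark's ReplayListenerBus code: the object ListingReplaySketch, the helpers isListingEvent and replayForListing, and the caller-supplied parse function are hypothetical names, and the sketch assumes each log line is a single compact JSON object carrying an "Event" field with the event name.

```scala
// Hypothetical sketch: not Spark's ReplayListenerBus, only the filtering idea.
object ListingReplaySketch {
  // The two event names the application listing needs (from the description).
  val listingEvents: Set[String] =
    Set("SparkListenerApplicationStart", "SparkListenerApplicationEnd")

  // Cheap substring test on the raw line. Assumption: each line is one compact
  // JSON object containing a field like "Event":"SparkListenerApplicationStart".
  def isListingEvent(line: String): Boolean =
    listingEvents.exists(name => line.contains("\"Event\":\"" + name + "\""))

  // `parse` stands in for the expensive step (string -> JSON -> Scala classes).
  // It is supplied by the caller and runs only on lines that pass the filter.
  def replayForListing[A](lines: Iterator[String])(parse: String => A): List[A] =
    lines.filter(isListingEvent).map(parse).toList

  def main(args: Array[String]): Unit = {
    val log = List(
      """{"Event":"SparkListenerApplicationStart","App Name":"demo"}""",
      """{"Event":"SparkListenerTaskEnd","Stage ID":0}""",
      """{"Event":"SparkListenerTaskEnd","Stage ID":1}""",
      """{"Event":"SparkListenerApplicationEnd","Timestamp":42}"""
    )
    var parsed = 0 // counts how often the "expensive" parse is invoked
    val kept = replayForListing(log.iterator) { line => parsed += 1; line }
    println(s"parsed $parsed of ${log.size} lines, kept ${kept.size}")
  }
}
```

In this sketch the task events are rejected by the string check alone, so the cost of a full parse is paid only for the start and end events. A substring test can in principle match a line that merely mentions one of the event names inside another field, so a real implementation would still need to validate the parsed result.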
My tests show that this drastically improves application list load time. For a 150MB event log from a user, with over 100,000 events, the load time (measured locally on my Mac) comes down from about 16 seconds to under 1 second using this approach. For customers who typically run applications with large event logs, and thus have multiple large event logs present, this can considerably speed up how soon the History Server UI lists the apps.
I will be submitting a pull request with a take at fixing this.
Issue Links
- is related to SPARK-6951 History server slow startup if the event log directory is large (Resolved)