Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.0.0
Description
Spark's event log JsonProtocol currently uses Json4s ASTs to generate and parse JSON. Json4s overhead accounts for a significant proportion of all time spent in JsonProtocol. Replacing Json4s with direct use of Jackson APIs can significantly improve performance (roughly 2x for both writing and reading in my own local microbenchmarks).
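As a rough illustration of what "direct usage of Jackson APIs" could look like, the sketch below writes an event with Jackson's streaming `JsonGenerator` and reads it back through a `JsonNode` tree, with no Json4s AST in between. The `StageDone` case class, the `JacksonEventSketch` object, and the field names are hypothetical and chosen only for this example; this is not the actual JsonProtocol code, and it assumes `jackson-core` and `jackson-databind` are on the classpath.

```scala
import java.io.StringWriter
import com.fasterxml.jackson.core.JsonFactory
import com.fasterxml.jackson.databind.ObjectMapper

// Hypothetical event type for illustration; not one of Spark's listener events.
final case class StageDone(stageId: Int, name: String)

object JacksonEventSketch {
  private val factory = new JsonFactory()
  private val mapper = new ObjectMapper()

  // Write fields straight to the output with Jackson's streaming generator,
  // without first building an intermediate AST.
  def toJson(event: StageDone): String = {
    val out = new StringWriter()
    val gen = factory.createGenerator(out)
    try {
      gen.writeStartObject()
      gen.writeStringField("Event", "StageDone")
      gen.writeNumberField("Stage ID", event.stageId)
      gen.writeStringField("Stage Name", event.name)
      gen.writeEndObject()
    } finally {
      gen.close() // flushes buffered output into the StringWriter
    }
    out.toString
  }

  // Parse into Jackson's own JsonNode tree and read the fields directly.
  def fromJson(json: String): StageDone = {
    val node = mapper.readTree(json)
    StageDone(node.get("Stage ID").asInt(), node.get("Stage Name").asText())
  }
}
```

Calling `JacksonEventSketch.fromJson(JacksonEventSketch.toJson(StageDone(3, "collect")))` should return an equal `StageDone`. The point of the streaming writer is that each field goes directly to the output instead of being allocated as an AST node and rendered afterwards.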
This performance improvement translates to faster history server load times and reduced load on the Spark driver. It also makes it less likely that events are dropped because the listener cannot keep up, which in turn reduces the likelihood of inconsistent Spark UIs.
Reducing our usage of Json4s is also a step towards being able to eventually remove our dependency on Json4s: Spark's current use of Json4s creates library conflicts for end users who want to adopt Json4s 4 (see discussion on PRs for SPARK-36408). If Spark can eventually remove its Json4s dependency then we will completely eliminate such conflicts.
Issue Links
- causes: SPARK-42403 JsonProtocol should handle null JSON strings (Resolved)
- links to