  1. Spark
  2. SPARK-39489

Improve EventLoggingListener and ReplayListener performance by replacing Json4s ASTs with Jackson trees


Details

    Description

      Spark's event log JsonProtocol currently uses Json4s ASTs to generate and parse JSON, and Json4s overhead accounts for a significant proportion of the time spent in JsonProtocol. Replacing Json4s with direct use of Jackson APIs can significantly improve performance (roughly 2x for both writing and reading in my own local microbenchmarks).
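To make the difference concrete, here is a minimal sketch contrasting the two styles. It is not Spark's actual JsonProtocol code: the object `EventJsonSketch`, its method names, and the three-field event are hypothetical and chosen only for illustration. It assumes json4s-jackson, jackson-core, and jackson-databind are on the classpath.

```scala
import java.io.StringWriter

import com.fasterxml.jackson.core.JsonFactory
import com.fasterxml.jackson.databind.ObjectMapper
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods.{compact, render}

// Hypothetical helper for illustration; not part of Spark.
object EventJsonSketch {
  private val factory = new JsonFactory()
  private val mapper = new ObjectMapper()

  // AST style: build an intermediate Json4s tree, then render it to a string.
  def writeWithJson4s(stageId: Int, taskId: Long): String = {
    val ast = ("Event" -> "SparkListenerTaskEnd") ~
      ("Stage ID" -> stageId) ~
      ("Task ID" -> taskId)
    compact(render(ast))
  }

  // Direct style: write fields straight to a Jackson JsonGenerator,
  // without allocating an intermediate tree.
  def writeWithJackson(stageId: Int, taskId: Long): String = {
    val out = new StringWriter()
    val gen = factory.createGenerator(out)
    try {
      gen.writeStartObject()
      gen.writeStringField("Event", "SparkListenerTaskEnd")
      gen.writeNumberField("Stage ID", stageId)
      gen.writeNumberField("Task ID", taskId)
      gen.writeEndObject()
    } finally {
      gen.close() // also flushes buffered output into the StringWriter
    }
    out.toString
  }

  // Reading side: parse into a Jackson tree (JsonNode) and pull out one field.
  def readStageId(json: String): Int =
    mapper.readTree(json).get("Stage ID").asInt()
}
```

Both writers should produce the same compact JSON for the same inputs. The difference is that the Json4s version allocates an AST and runs implicit conversions for every field, while the Jackson version writes each field directly to the output.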

      This improvement translates to faster history server load times and reduced load on the Spark driver. It also makes it less likely that events are dropped because the listener cannot keep up, which in turn reduces the likelihood of inconsistent Spark UIs.

      Reducing our usage of Json4s is also a step towards eventually removing our dependency on it. Spark's current use of Json4s creates library conflicts for end users who want to adopt Json4s 4 (see the discussion on the PRs for SPARK-36408). If Spark can eventually remove its Json4s dependency, such conflicts will be eliminated completely.

            People

              joshrosen Josh Rosen
