  1. Spark
  2. SPARK-23366

Improve hot reading path in ReadAheadInputStream

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: Spark Core
    • Labels: None

    Description

      ReadAheadInputStream was introduced in apache/spark#18317 to optimize reading spill files from disk.
      However, flamegraphs from profiling some workloads that regressed after the switch to Spark 2.3 suggest that the hot path for reading small amounts of data (such as readInt) is inefficient: each read involves taking locks and performing multiple checks.
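
To illustrate the kind of change the description points at, here is a minimal sketch of a stream whose single-byte read() first tries a lock-free fast path and only falls back to a locked slow path when its buffer is empty. It is not the actual ReadAheadInputStream code: the class name FastPathInputStream, the fields, and the slowPathCalls counter are hypothetical. The sketch assumes a single reader thread and omits the background read-ahead thread entirely, so the lock only stands in for the coordination that a real implementation would need.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; not Spark's ReadAheadInputStream.
class FastPathInputStream extends InputStream {
  private final InputStream underlying;
  private final ByteBuffer activeBuffer;
  // Stands in for the lock shared with a read-ahead thread (omitted here).
  private final ReentrantLock stateLock = new ReentrantLock();
  private boolean endOfStream = false;
  // Counts slow-path entries so the effect of the fast path is observable.
  int slowPathCalls = 0;

  FastPathInputStream(InputStream underlying, int bufferSize) {
    this.underlying = underlying;
    this.activeBuffer = ByteBuffer.allocate(bufferSize);
    this.activeBuffer.flip(); // start with an empty buffer
  }

  @Override
  public int read() throws IOException {
    // Fast path: one check, no lock, assuming only this thread
    // consumes activeBuffer.
    if (activeBuffer.hasRemaining()) {
      return activeBuffer.get() & 0xFF;
    }
    return readSlowPath();
  }

  // Slow path: take the lock, check for end of stream, refill the buffer.
  private int readSlowPath() throws IOException {
    stateLock.lock();
    try {
      slowPathCalls++;
      if (endOfStream) {
        return -1;
      }
      activeBuffer.clear();
      int n = underlying.read(activeBuffer.array(), 0, activeBuffer.capacity());
      if (n <= 0) {
        endOfStream = true;
        activeBuffer.limit(0); // keep the buffer empty
        return -1;
      }
      activeBuffer.limit(n);
      return activeBuffer.get() & 0xFF;
    } finally {
      stateLock.unlock();
    }
  }
}
```

With a 4-byte buffer over 5 bytes of input, only the reads that find the buffer empty enter the slow path; every other read() costs a single hasRemaining() check. Methods built on repeated single-byte reads, such as DataInputStream.readInt, are the ones that would benefit most from this split.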

            People

              Assignee: Juliusz Sompolski (juliuszsompolski)
              Reporter: Juliusz Sompolski (juliuszsompolski)
              Votes: 0
              Watchers: 3
