Spark / SPARK-23366

Improve hot reading path in ReadAheadInputStream

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: Spark Core
    • Labels: None

      Description

      ReadAheadInputStream was introduced in apache/spark#18317 to optimize reading spill files from disk.
      However, flamegraphs from profiling some workloads that regressed after switching to Spark 2.3 suggest that the hot path for reading small amounts of data (such as readInt) is inefficient: it involves taking locks and performing multiple checks.
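
      To illustrate the kind of hot path the description refers to, here is a minimal sketch of a buffered stream whose single-byte read() only takes a lock when the buffer is empty. It is not the actual ReadAheadInputStream code or the fix that was merged. The class name FastPathInputStream, the slowPathCalls counter, and the synchronous refill are all assumptions made for illustration, and the background read-ahead thread is omitted entirely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; NOT Spark's ReadAheadInputStream.
class FastPathInputStream extends InputStream {
  private final InputStream in;                            // underlying stream
  private final ReentrantLock lock = new ReentrantLock();  // taken only on refill
  private final ByteBuffer buf;                            // bytes ready for the reader
  int slowPathCalls = 0;                                   // demo counter (assumption)

  FastPathInputStream(InputStream in, int bufferSize) {
    this.in = in;
    this.buf = ByteBuffer.allocate(bufferSize);
    this.buf.flip(); // start empty: position == limit == 0
  }

  @Override
  public int read() throws IOException {
    // Hot path: a single bounds check and no locking.
    if (buf.hasRemaining()) {
      return buf.get() & 0xFF;
    }
    // Cold path: buffer exhausted, so refill under the lock.
    return readSlow();
  }

  private int readSlow() throws IOException {
    lock.lock();
    try {
      slowPathCalls++;
      buf.clear(); // position = 0, limit = capacity
      int n = in.read(buf.array(), 0, buf.capacity());
      if (n <= 0) {
        buf.limit(0); // leave the buffer empty at end of stream
        return -1;
      }
      buf.limit(n);
      return buf.get() & 0xFF;
    } finally {
      lock.unlock();
    }
  }
}
```

      In this sketch, reading six bytes through a four-byte buffer enters the locked path only on refills, and the remaining reads cost one hasRemaining() check each. A real read-ahead stream would also need to coordinate with an asynchronous reader, which this example does not attempt.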


              People

              • Assignee: Juliusz Sompolski (juliuszsompolski)
              • Reporter: Juliusz Sompolski (juliuszsompolski)