Spark / SPARK-7413

Time to write shuffle spill files is not captured in ShuffleWriteMetrics

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Shuffle
    • Labels:

      Description

      In ExternalSorter's spillToMergeableFile() method, we pass ShuffleWriteMetrics instances to the disk writers but then discard the shuffleWriteTime values they capture, so the time spent writing spill files never shows up in ShuffleWriteMetrics. I think we should account for this IO time. One option is to introduce new metrics that distinguish time spent writing spills from time spent writing the final shuffle output, and to extend the UI to break down the overall IO write time into these two components.
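
      A minimal sketch of the proposed split, written in Python for brevity rather than as Spark code. Everything here is hypothetical: WriteTimeMetrics, timed_write, spill, and write_final_output are illustrative names and are not part of Spark's ShuffleWriteMetrics or ExternalSorter. The idea is only that spill writes and final output writes each add their elapsed time to a separate counter, and the total is derived from the two.

```python
import time
from dataclasses import dataclass


@dataclass
class WriteTimeMetrics:
    """Hypothetical container; not Spark's ShuffleWriteMetrics."""
    spill_write_nanos: int = 0   # time spent writing spill files
    final_write_nanos: int = 0   # time spent writing final shuffle output

    @property
    def total_write_nanos(self) -> int:
        # Overall IO write time is the sum of the two components.
        return self.spill_write_nanos + self.final_write_nanos


def timed_write(write_fn, data, clock=time.perf_counter_ns):
    """Call write_fn(data) and return the elapsed time in nanoseconds."""
    start = clock()
    write_fn(data)
    return clock() - start


def spill(metrics, write_fn, data, clock=time.perf_counter_ns):
    # Record the spill write time instead of discarding it.
    metrics.spill_write_nanos += timed_write(write_fn, data, clock)


def write_final_output(metrics, write_fn, data, clock=time.perf_counter_ns):
    # Final shuffle output is tracked separately from spills.
    metrics.final_write_nanos += timed_write(write_fn, data, clock)
```

      With counters like these, a UI could show the total write time alongside its spill and final-output components. The clock parameter is an assumption added so the timing can be replaced with a deterministic source in tests.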

              People

              • Assignee: Unassigned
              • Reporter: Josh Rosen (joshrosen)
              • Votes: 0
              • Watchers: 4
