Spark / SPARK-5581

When writing sorted map output file, avoid open / close between each partition


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 2.1.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

            // Bypassing merge-sort; get an iterator by partition and just write everything directly.
            for ((id, elements) <- this.partitionedIterator) {
              if (elements.hasNext) {
                val writer = blockManager.getDiskWriter(
                  blockId, outputFile, ser, fileBufferSize, context.taskMetrics.shuffleWriteMetrics.get)
                for (elem <- elements) {
                  writer.write(elem)
                }
                writer.commitAndClose()
                val segment = writer.fileSegment()
                lengths(id) = segment.length
              }
            }
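
      The loop above calls blockManager.getDiskWriter once per partition, so the
      output file is opened and closed for every non-empty partition in turn. A
      minimal sketch of the alternative the title proposes, keeping a single
      stream open and recording each partition's segment length from the byte
      count (this is an illustrative stand-in using plain java.io streams, not
      Spark's actual DiskBlockObjectWriter API):

      import java.io.{BufferedOutputStream, DataOutputStream, File, FileOutputStream}

      def writePartitionedFile(
          partitionedIterator: Iterator[(Int, Iterator[String])],
          outputFile: File,
          numPartitions: Int): Array[Long] = {
        val lengths = new Array[Long](numPartitions)
        // Open the output file once, up front, instead of per partition.
        val out = new DataOutputStream(
          new BufferedOutputStream(new FileOutputStream(outputFile)))
        try {
          for ((id, elements) <- partitionedIterator) {
            val start = out.size()            // bytes written before this partition
            for (elem <- elements) {
              out.writeUTF(elem)              // stand-in for the real serializer write
            }
            lengths(id) = out.size() - start  // segment length, with no reopen/close
          }
        } finally {
          out.close()                         // single close for the whole file
        }
        lengths
      }

      The per-partition lengths are still available for the shuffle index; only
      the repeated open/commitAndClose cycle per partition goes away.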
      

      Attachments

        Activity


          People

            Assignee: Brian Cho (chobrian)
            Reporter: Sandy Ryza (sandyr)
            Votes: 0
            Watchers: 11

            Dates

              Created:
              Updated:
              Resolved:
