SPARK-5581: When writing sorted map output file, avoid open / close between each partition

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 2.1.0
    • Component/s: Shuffle
    • Labels:
      None

      Description

            // Bypassing merge-sort; get an iterator by partition and just write everything directly.
            for ((id, elements) <- this.partitionedIterator) {
              if (elements.hasNext) {
                // A new disk writer is opened on the same outputFile for every non-empty partition...
                val writer = blockManager.getDiskWriter(
                  blockId, outputFile, ser, fileBufferSize, context.taskMetrics.shuffleWriteMetrics.get)
                for (elem <- elements) {
                  writer.write(elem)
                }
                // ...and closed again before the next partition is written.
                writer.commitAndClose()
                val segment = writer.fileSegment()
                lengths(id) = segment.length
              }
            }
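
The loop above opens and closes a writer once per non-empty partition. As a rough illustration of the alternative the title asks for, the sketch below keeps a single stream open for the whole output file and records each partition's length from the number of bytes written. It uses only plain `java.io` and is not Spark's implementation: `SingleWriterSketch`, `writePartitions`, and the use of raw byte arrays in place of serialized records are all assumptions made for the example.

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream}

// Hypothetical helper (not part of Spark): writes all partitions through one
// stream and derives each partition's segment length from the bytes written.
object SingleWriterSketch {
  def writePartitions(
      outputFile: File,
      partitions: Iterator[(Int, Iterator[Array[Byte]])],
      numPartitions: Int): Array[Long] = {
    val lengths = new Array[Long](numPartitions)
    // Open the output file once for the whole map output.
    val out = new BufferedOutputStream(new FileOutputStream(outputFile))
    try {
      for ((id, elements) <- partitions) {
        var written = 0L
        for (elem <- elements) {
          out.write(elem)
          written += elem.length
        }
        // End of this partition's segment: record its length, keep the stream open.
        lengths(id) = written
      }
    } finally {
      // Close exactly once, after the last partition.
      out.close()
    }
    lengths
  }
}
```

Partitions that never appear in the iterator keep a length of 0, which matches how `lengths(id)` is only assigned for non-empty partitions in the original loop. A real shuffle writer would additionally have to end any serialization or compression stream at each partition boundary so that every segment can be read on its own; the sketch leaves that out.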
      

            People

            • Assignee: Brian Cho (chobrian)
            • Reporter: Sandy Ryza (sandyr)