Uploaded image for project: 'Apache Nemo'
  1. Apache Nemo
  2. NEMO-172

Implement one partition per one element partitioner

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 0.1

    Description

      We need to implement a partitioner which assigns a single partition key to each element.

      This partitioner is needed because every single partition received by RelayTransform is a (compressed) partition in our large shuffle optimization.
      1) If we make every element to single partition again through the proposed partitioner, we can turn off the compression before and after the RelayTransform. If we do not divide the elements like this but just use IntactPartitioner like now and turn off the compression, the decompression phase in the output edge from the vertex having RelayTransform will not recognize the boundary of compression properly. (Many compression algorithms like LZ4 do not properly
      decompress the attached compressed bytes at once.)
      2) If we use the suggested partitioner, we can flush the output data to disk per every element.

      Attachments

        Issue Links

          Activity

            People

              sanha Sanha Lee
              sanha Sanha Lee
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: