Apache Tez: TEZ-1228

Prototype IFile: Define a memory- and merge-optimized vertex-intermediate file format for Tez


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Hadoop Flags: Reviewed

    Description

      The current vertex-intermediate format used across Tez is a flat file of variable-length (K,V) pairs. In a significant number of use cases, in particular the sorted output phase, long runs of consecutive identical keys appear within the same stream. The IFile format nevertheless writes each key out in full for every (K,V) pair instead of grouping them into a more efficient K, {V1, .., Vn} list.
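      To make the size difference concrete, here is a minimal Java sketch, assuming the input is already sorted by key and that lengths are written as plain 4-byte ints (a production format would more likely use variable-length integers). GroupedKeySketch, writeFlat and writeGrouped are hypothetical names, and the grouped layout shown is an illustration, not the layout TEZ-1228 actually adopted: writeFlat repeats every key, while writeGrouped writes each distinct key once, followed by a value count and its values.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flat (K,V) stream vs grouped K,{V1..Vn} stream. Input must be sorted by key. */
public class GroupedKeySketch {

    // Writes a length-prefixed UTF-8 string (4-byte int length, for simplicity).
    static void writeBytes(DataOutputStream out, String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(b.length);
        out.write(b);
    }

    /** Flat layout: every record repeats its key in full. */
    static byte[] writeFlat(List<Map.Entry<String, String>> sorted) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (Map.Entry<String, String> kv : sorted) {
            writeBytes(out, kv.getKey());
            writeBytes(out, kv.getValue());
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Grouped layout: each distinct key once, then a value count, then the values. */
    static byte[] writeGrouped(List<Map.Entry<String, String>> sorted) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            int j = i;
            // Find the end of the run of records sharing this key.
            while (j < sorted.size() && sorted.get(j).getKey().equals(key)) {
                j++;
            }
            writeBytes(out, key);
            out.writeInt(j - i); // number of values in this group
            for (int k = i; k < j; k++) {
                writeBytes(out, sorted.get(k).getValue());
            }
            i = j;
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<Map.Entry<String, String>> sorted = List.of(
            Map.entry("apple", "1"), Map.entry("apple", "2"), Map.entry("apple", "3"),
            Map.entry("banana", "4"));
        System.out.println("flat:    " + writeFlat(sorted).length + " bytes");
        System.out.println("grouped: " + writeGrouped(sorted).length + " bytes");
    }
}
```

      For the four records in main, the flat layout takes 57 bytes and the grouped layout 47 bytes; the saving grows with key length and with the number of values per key.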

      This duplication of key data requires larger in-memory buffers and forces comparisons between keys already known to be identical during a merge sort.
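      The comparison cost can be illustrated the same way. This hypothetical sketch (MergeComparisonSketch, Group, flatten and countComparisons are illustrative names, not Tez APIs; Java 16 or later for the record type) runs a plain two-way merge over two sorted runs and counts key comparisons, once with one entry per record and once with one entry per key group. It assumes groups from different runs are simply interleaved rather than coalesced, and it ignores the heap a real k-way merge would use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: key comparisons when merging two sorted runs, flat vs grouped. */
public class MergeComparisonSketch {

    /** A key and the number of values stored under it (always 1 in the flat layout). */
    record Group(String key, int valueCount) {}

    /** Expands a grouped run into the flat layout: one entry per value, key repeated. */
    static List<Group> flatten(List<Group> grouped) {
        List<Group> flat = new ArrayList<>();
        for (Group g : grouped) {
            for (int n = 0; n < g.valueCount(); n++) {
                flat.add(new Group(g.key(), 1));
            }
        }
        return flat;
    }

    /** Plain two-way merge of sorted runs; returns the number of key comparisons made. */
    static int countComparisons(List<Group> a, List<Group> b) {
        List<Group> merged = new ArrayList<>();
        int comparisons = 0;
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            comparisons++;
            if (a.get(i).key().compareTo(b.get(j).key()) <= 0) {
                merged.add(a.get(i++));
            } else {
                merged.add(b.get(j++));
            }
        }
        // Whatever remains in either run is appended without further comparisons.
        while (i < a.size()) merged.add(a.get(i++));
        while (j < b.size()) merged.add(b.get(j++));
        return comparisons;
    }

    public static void main(String[] args) {
        List<Group> runA = List.of(new Group("apple", 3), new Group("cherry", 2));
        List<Group> runB = List.of(new Group("banana", 4));
        System.out.println("flat:    " + countComparisons(flatten(runA), flatten(runB)));
        System.out.println("grouped: " + countComparisons(runA, runB));
    }
}
```

      With the runs in main, the flat merge performs 7 key comparisons and the grouped merge 2, because each group is compared once regardless of how many values it holds.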

      This issue tracks building a prototype IFile format optimized for smaller uncompressed sizes in memory buffers and for less compute-intensive merge sorts during the reducer phase.

      Attachments

        1. TEZ-1228-IFile.pdf (167 kB, Rajesh Balamohan)
        2. TEZ-1228.WIP.2.patch (44 kB, Rajesh Balamohan)
        3. TEZ-1228.WIP.1.patch (33 kB, Rajesh Balamohan)
        4. TEZ-1228.5.patch (71 kB, Rajesh Balamohan)
        5. TEZ-1228.4.patch (71 kB, Rajesh Balamohan)
        6. TEZ-1228.3.patch (69 kB, Rajesh Balamohan)
        7. TEZ-1228.2.patch (54 kB, Rajesh Balamohan)
        8. TEZ-1228.1.patch (50 kB, Rajesh Balamohan)

            People

              Assignee: Rajesh Balamohan
              Reporter: Rajesh Balamohan
              Votes: 0
              Watchers: 9