  1. Apache Tez
  2. TEZ-1288

Create FastTezSerialization as an optional feature

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Tez inherits the Writable framework from MapReduce.

      This is very flexible, but it is not particularly memory-efficient for small data types.

      When deserializing, every key and value has to be allocated afresh for each small chunk of data (new IntWritable instead of .set() on a reused instance).
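
      The difference between allocating per record and reusing one holder can be sketched in plain Java. This is only an illustration: IntHolder is a hypothetical stand-in for a Writable-style key (it is not Hadoop's IntWritable), and ReuseSketch, encode, sumAllocating and sumReusing are names invented for this example.

```java
import java.nio.ByteBuffer;

class ReuseSketch {
    /** Hypothetical stand-in for a mutable, Writable-style int holder. */
    static final class IntHolder {
        private int value;
        void set(int v) { value = v; }
        int get() { return value; }
    }

    /** Packs the given ints back to back and returns a buffer ready for reading. */
    static ByteBuffer encode(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip();
        return buf;
    }

    /** Allocates a fresh holder for every record read. */
    static long sumAllocating(ByteBuffer in) {
        long sum = 0;
        while (in.remaining() >= Integer.BYTES) {
            IntHolder key = new IntHolder(); // one new object per record
            key.set(in.getInt());
            sum += key.get();
        }
        return sum;
    }

    /** Reuses a single holder and overwrites it with set() for every record. */
    static long sumReusing(ByteBuffer in) {
        IntHolder key = new IntHolder(); // allocated once, outside the loop
        long sum = 0;
        while (in.remaining() >= Integer.BYTES) {
            key.set(in.getInt());
            sum += key.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAllocating(encode(1, 2, 3, 4)));
        System.out.println(sumReusing(encode(1, 2, 3, 4)));
    }
}
```

      Both loops compute the same result; the second one creates a single holder no matter how many records are read, which is the behaviour the description asks for.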

      BytesWritable serialization always has to write a 4-byte length prefix for every key and value, because the data is read back through a streamed .readFields() rather than a custom setter/getter implementation.
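
      To make the prefix cost concrete, here is a small plain-Java sketch that imitates a length-then-payload layout and compares it with a fixed-width encoding. LengthPrefixSketch, writePrefixed and writeFixedInt are hypothetical names for this example and do not come from Tez or Hadoop.

```java
import java.nio.ByteBuffer;

class LengthPrefixSketch {
    /** Length-prefixed framing: a 4-byte length followed by the payload bytes. */
    static byte[] writePrefixed(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + payload.length);
        buf.putInt(payload.length); // the reader needs this to know where the record ends
        buf.put(payload);
        return buf.array();
    }

    /** Fixed-width framing for an int: the size is implied by the type, so no prefix. */
    static byte[] writeFixedInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] payload = writeFixedInt(42);
        System.out.println("prefixed: " + writePrefixed(payload).length + " bytes");
        System.out.println("fixed:    " + writeFixedInt(42).length + " bytes");
    }
}
```

      For a 4-byte value the prefix doubles the serialized size in this sketch, so the relative overhead is largest for the small data types mentioned above.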

      Implement a faster serialization mechanism for the inner loop of sort, spill, and merge that does not trigger GC and avoids adding avoidable per-record overhead to the IFile format.
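
      One way an allocation-free inner loop could look is to compare fixed-width keys directly in their serialized form. The sketch below is an assumption about a possible approach, not the mechanism in the attached patches; RawIntCompareSketch, readInt and compareRaw are hypothetical names, and it assumes keys are stored as 4-byte big-endian ints.

```java
import java.nio.ByteBuffer;

class RawIntCompareSketch {
    /** Decodes a big-endian int from the array without creating any objects. */
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             | (b[off + 3] & 0xff);
    }

    /** Compares two serialized int keys in place, as a sort or merge loop might. */
    static int compareRaw(byte[] a, int aOff, byte[] b, int bOff) {
        return Integer.compare(readInt(a, aOff), readInt(b, bOff));
    }

    public static void main(String[] args) {
        // Two keys written back to back with no length prefix (ByteBuffer is big-endian by default).
        byte[] buf = ByteBuffer.allocate(2 * Integer.BYTES).putInt(5).putInt(-3).array();
        System.out.println(compareRaw(buf, 0, buf, Integer.BYTES) > 0);
    }
}
```

      Because the comparison works on offsets into an existing byte array, the loop itself creates no key or value objects for the garbage collector to track.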

      Attachments

        1. TEZ-1288.1.patch
          23 kB
          Rajesh Balamohan
        2. TEZ-1288.2.patch
          38 kB
          Rajesh Balamohan
        3. TEZ-1288.3.patch
          36 kB
          Rajesh Balamohan
        4. TEZ-1288.4.patch
          59 kB
          Rajesh Balamohan
        5. TEZ-1288.4.1.patch
          59 kB
          Rajesh Balamohan

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 5
