Apache Tez / TEZ-1993

Implement a pluggable InputSizeEstimator for grouping fairly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

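      Split grouping currently relies on a file-size measurement: the exact size of the split as it is stored at rest on HDFS.

      This measure is not valid for columnar formats, and it is especially misleading when the data is skewed toward highly compressible content, because the on-disk size then understates how much data a task actually has to process.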
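
To make the problem concrete, here is a minimal sketch of grouping with a pluggable size estimator. It is an illustration only, not the code from the attached patches: the `Split` class, the `InputSizeEstimator` interface shape, the `compressionRatio` field, and the greedy grouping loop are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitGroupingSketch {

    /** Hypothetical split descriptor; real splits do not carry a ratio like this. */
    static final class Split {
        final String name;
        final long onDiskBytes;
        final double compressionRatio; // assumed: uncompressed bytes / on-disk bytes

        Split(String name, long onDiskBytes, double compressionRatio) {
            this.name = name;
            this.onDiskBytes = onDiskBytes;
            this.compressionRatio = compressionRatio;
        }
    }

    /** Hypothetical plug-in point: maps a split to the size used for grouping. */
    interface InputSizeEstimator {
        long estimateSize(Split split);
    }

    /** Behaviour described in the issue: size at rest on disk. */
    static final InputSizeEstimator ON_DISK = s -> s.onDiskBytes;

    /** Alternative for this sketch: approximate uncompressed size. */
    static final InputSizeEstimator UNCOMPRESSED =
            s -> Math.round(s.onDiskBytes * s.compressionRatio);

    /** Greedy grouping: close the current group when the next split would exceed the target. */
    static List<List<Split>> group(List<Split> splits, long targetBytes,
                                   InputSizeEstimator estimator) {
        List<List<Split>> groups = new ArrayList<>();
        List<Split> current = new ArrayList<>();
        long currentSize = 0;
        for (Split s : splits) {
            long size = estimator.estimateSize(s);
            if (!current.isEmpty() && currentSize + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Four splits of equal on-disk size; two of them compress 10x.
        List<Split> splits = Arrays.asList(
                new Split("a", 100, 1.0),
                new Split("b", 100, 10.0),
                new Split("c", 100, 1.0),
                new Split("d", 100, 10.0));
        System.out.println(group(splits, 400, ON_DISK).size());      // prints 1
        System.out.println(group(splits, 400, UNCOMPRESSED).size()); // prints 4
    }
}
```

With the on-disk estimator all four splits land in a single group, even though two of them expand to ten times their stored size. Swapping in the uncompressed estimator spreads the same splits across four groups, which is the kind of fairness the issue title asks for.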

      Attachments

        1. TEZ-1993.1.patch
          18 kB
          Gopal Vijayaraghavan
        2. TEZ-1993.2.patch
          18 kB
          Gopal Vijayaraghavan
        3. TEZ-1993.3.patch
          18 kB
          Gopal Vijayaraghavan

            People

              Assignee: Gopal Vijayaraghavan (gopalv)
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: