Apache Tez / TEZ-1855

Avoid scanning for previously written files within Inputs / Outputs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      TezTaskOutput exposes several methods (getOutputFile, getOutputIndexFile, getSpillIndexFile) that an Output uses to scan for files it wrote earlier. This scanning should be avoided: the Output should instead keep track of the files it has already written.
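
      As a rough illustration of the tracking approach proposed here, the sketch below records each spill's data and index file at the moment it is written, so later steps can look the paths up instead of scanning for them. SpillFileTracker, SpillRecord, and their methods are hypothetical names invented for this example; they are not part of Tez and are not taken from the attached patches.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Tez class): remembers files as they are written.
final class SpillFileTracker {

    /** Data file and index file produced by a single spill. */
    static final class SpillRecord {
        final Path dataFile;
        final Path indexFile;

        SpillRecord(Path dataFile, Path indexFile) {
            this.dataFile = dataFile;
            this.indexFile = indexFile;
        }
    }

    // Spill id is the position in this list.
    private final List<SpillRecord> spills = new ArrayList<>();

    /** Call right after a spill is written; returns the id assigned to it. */
    int recordSpill(Path dataFile, Path indexFile) {
        spills.add(new SpillRecord(dataFile, indexFile));
        return spills.size() - 1;
    }

    /** Looks up a previously recorded spill without touching the file system. */
    SpillRecord getSpill(int spillId) {
        return spills.get(spillId);
    }

    /** Read-only view of every spill recorded so far, in write order. */
    List<SpillRecord> allSpills() {
        return Collections.unmodifiableList(spills);
    }
}
```

      With something like this in place, a final merge step could iterate over allSpills() rather than probing the local directories for each expected file name.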

        Attachments

        1. TEZ-1855.3.patch (28 kB, Rajesh Balamohan)
        2. TEZ-1855.2.patch (28 kB, Rajesh Balamohan)
        3. TEZ-1855.1.patch (15 kB, Rajesh Balamohan)

            People

            • Assignee: Rajesh Balamohan (rajesh.balamohan)
            • Reporter: Siddharth Seth (sseth)
            • Votes: 0
            • Watchers: 3
