  1. Apache Tez
  2. TEZ-2192

Relocalization does not check for source

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.6.0, 0.5.2
    • Fix Version/s: 0.5.4, 0.6.1
    • Component/s: None
    • Labels: None

      Description

      PIG-4443 spills the input splits to disk if the serialized split size exceeds a threshold. This runs into problems with relocalization when more than one vertex has a job.split file: if a job.split file is already present when a container is reused, the existing file is reused, so the wrong data is read.

      We need either a way to turn off relocalization, or a check of the source and timestamp during relocalization so that the file is redownloaded.
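
The second option, checking source and timestamp before reusing a local file, can be sketched as follows. This is only an illustration of the idea and not the code from the attached patches; the class `RelocalizationSketch`, its nested `ResourceKey`, and the method `shouldDownload` are hypothetical names, and the sketch assumes a localized file can be identified by its source URI plus modification timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: decides whether a file in a reused container can be kept.
public class RelocalizationSketch {

    /** Identity of a localized file: where it came from and when it was last modified. */
    public static final class ResourceKey {
        final String sourceUri;
        final long timestamp;

        public ResourceKey(String sourceUri, long timestamp) {
            this.sourceUri = Objects.requireNonNull(sourceUri);
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ResourceKey)) {
                return false;
            }
            ResourceKey other = (ResourceKey) o;
            return timestamp == other.timestamp && sourceUri.equals(other.sourceUri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sourceUri, timestamp);
        }
    }

    // Local file name (e.g. "job.split") -> identity of the copy currently on disk.
    private final Map<String, ResourceKey> localized = new HashMap<>();

    /**
     * Returns true if the file must be downloaded, either because no copy exists
     * under this local name or because the existing copy came from a different
     * source or has a different timestamp. The requested identity is recorded,
     * so a later identical request returns false.
     */
    public boolean shouldDownload(String localName, String sourceUri, long timestamp) {
        ResourceKey requested = new ResourceKey(sourceUri, timestamp);
        ResourceKey existing = localized.get(localName);
        if (requested.equals(existing)) {
            return false; // same source and timestamp: safe to reuse
        }
        localized.put(localName, requested);
        return true; // missing or stale: fetch again
    }
}
```

With a check keyed only on the local name, a second vertex asking for its own job.split would get the first vertex's file. Comparing source and timestamp as well means that request is treated as a new download.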

        Attachments

        1. TEZ-2192.1.patch
          17 kB
          Hitesh Shah
        2. TEZ-2192.2.patch
          18 kB
          Hitesh Shah
        3. test-job-2192.patch
          7 kB
          Hitesh Shah
        4. TEZ-2192.3.patch
          18 kB
          Hitesh Shah
        5. TEZ-2192.3.patch
          27 kB
          Hitesh Shah
        6. TEZ-2192.4.patch
          27 kB
          Hitesh Shah
        7. TEZ-2192.5.patch
          27 kB
          Hitesh Shah

              People

              • Assignee: Hitesh Shah
              • Reporter: Rohini Palaniswamy
              • Votes: 0
              • Watchers: 5