Apache Tez / TEZ-2192

Relocalization does not check for source

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.6.0, 0.5.2
    • Fix Version/s: 0.5.4, 0.6.1
    • Component/s: None
    • Labels: None

    Description

      PIG-4443 spills the input splits to disk if the serialized split size exceeds a threshold. This runs into problems with relocalization when more than one vertex has a job.split file: if a job.split file is already present in a reused container, it is reused as-is, so the wrong data is read.

      Tez needs either a way to turn off relocalization, or a check of the source and timestamp during relocalization that re-downloads the file when they do not match.
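      The second option (compare source and timestamp, re-download on mismatch) could look roughly like the sketch below. It is illustrative only and is not taken from the attached patches: the class RelocalizationCheck, its methods, and the example paths are hypothetical names, and it assumes the container keeps an in-memory record of what it has already localized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; not the actual Tez implementation. */
public class RelocalizationCheck {

    /** Identity of a localized file: its source location and modification time. */
    static final class ResourceKey {
        final String sourceUri;
        final long timestamp;

        ResourceKey(String sourceUri, long timestamp) {
            this.sourceUri = sourceUri;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ResourceKey)) {
                return false; // also covers null (nothing localized yet)
            }
            ResourceKey other = (ResourceKey) o;
            return timestamp == other.timestamp && sourceUri.equals(other.sourceUri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sourceUri, timestamp);
        }
    }

    // Local file name -> identity of the copy currently present in the container.
    private final Map<String, ResourceKey> localized = new HashMap<>();

    /** True if no copy exists locally, or the local copy came from a different source or timestamp. */
    public boolean needsDownload(String localName, String sourceUri, long timestamp) {
        ResourceKey requested = new ResourceKey(sourceUri, timestamp);
        return !requested.equals(localized.get(localName));
    }

    /** Record that localName now holds the file from sourceUri at the given timestamp. */
    public void markLocalized(String localName, String sourceUri, long timestamp) {
        localized.put(localName, new ResourceKey(sourceUri, timestamp));
    }

    public static void main(String[] args) {
        RelocalizationCheck check = new RelocalizationCheck();
        // Example paths are made up for illustration.
        check.markLocalized("job.split", "hdfs:///staging/vertex1/job.split", 100L);
        // Same local name, different vertex: the name alone is not enough to reuse the file.
        System.out.println(check.needsDownload("job.split", "hdfs:///staging/vertex2/job.split", 100L));
    }
}
```

      Keying on the local file name alone is what lets a second vertex pick up the first vertex's job.split on container reuse; including the source and timestamp in the comparison forces a fresh download in that case.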

      Attachments

        1. TEZ-2192.5.patch
          27 kB
          Hitesh Shah
        2. TEZ-2192.4.patch
          27 kB
          Hitesh Shah
        3. TEZ-2192.3.patch
          27 kB
          Hitesh Shah
        4. TEZ-2192.3.patch
          18 kB
          Hitesh Shah
        5. test-job-2192.patch
          7 kB
          Hitesh Shah
        6. TEZ-2192.2.patch
          18 kB
          Hitesh Shah
        7. TEZ-2192.1.patch
          17 kB
          Hitesh Shah

            People

              Assignee: Hitesh Shah (hitesh)
              Reporter: Rohini Palaniswamy (rohini)
              Votes: 0
              Watchers: 5
