  1. Apache Tez
  2. TEZ-2192

Relocalization does not check for source

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.6.0, 0.5.2
    • Fix Version/s: 0.5.4, 0.6.1
    • Component/s: None
    • Labels: None

    Description

      PIG-4443 spills the input splits to disk if the serialized split size is greater than some threshold. It runs into problems with relocalization when more than one vertex has a job.split file. If a job.split file is already present when a container is reused, the existing file is reused instead of being downloaded again, causing wrong data to be read.

      We need either a way to turn off relocalization, or a check during relocalization that compares the source and timestamp and re-downloads the file when they do not match.
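
      The second option can be illustrated with a minimal sketch. This is not the Tez implementation or the content of the attached patches; the names `ResourceKey`, `LocalResourceCache`, and `localize`, and the HDFS paths in the usage note, are hypothetical and exist only for illustration. The idea is to remember which source and timestamp each localized file name came from, and to reuse a file only when both match.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ResourceKey:
    """Hypothetical identity of a resource: its source location and modification time."""
    source_uri: str
    timestamp: int


class LocalResourceCache:
    """Hypothetical record of what each localized file name was fetched from."""

    def __init__(self) -> None:
        self._localized: Dict[str, ResourceKey] = {}

    def needs_download(self, name: str, requested: ResourceKey) -> bool:
        # Reuse is safe only if the same name was localized from the
        # same source with the same timestamp. A name match alone is not enough.
        return self._localized.get(name) != requested

    def record(self, name: str, requested: ResourceKey) -> None:
        # Remember the source and timestamp behind the file now on disk.
        self._localized[name] = requested


def localize(cache: LocalResourceCache, name: str, requested: ResourceKey) -> str:
    """Return "download" if the file must be fetched again, else "reuse"."""
    if cache.needs_download(name, requested):
        # A real implementation would fetch the file here and overwrite the old copy.
        cache.record(name, requested)
        return "download"
    return "reuse"
```

      Under these assumptions, two vertices that both ship a file named job.split from different locations (for example, the made-up paths hdfs://nn/tmp/vertex1/job.split and hdfs://nn/tmp/vertex2/job.split) would each trigger a download in a reused container, while a repeated request for the same source and timestamp would reuse the local copy.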

      Attachments

        1. TEZ-2192.1.patch (17 kB, Hitesh Shah)
        2. TEZ-2192.2.patch (18 kB, Hitesh Shah)
        3. test-job-2192.patch (7 kB, Hitesh Shah)
        4. TEZ-2192.3.patch (18 kB, Hitesh Shah)
        5. TEZ-2192.3.patch (27 kB, Hitesh Shah)
        6. TEZ-2192.4.patch (27 kB, Hitesh Shah)
        7. TEZ-2192.5.patch (27 kB, Hitesh Shah)

          People

            Assignee: Hitesh Shah
            Reporter: Rohini Palaniswamy
            Votes: 0
            Watchers: 5
