Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5131

Provide better handling of job status related apis during JT restart

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None

      Description

      I've seen pig/hive applications bork during JT restart since they get NPEs - this is due to fact that jobs are not really inited, but are submitted.

      1. MAPREDUCE-5131.patch
        14 kB
        Arun C Murthy
      2. MAPREDUCE-5131.patch
        15 kB
        Arun C Murthy
      3. MAPREDUCE-5131.patch
        15 kB
        Arun C Murthy
      4. MAPREDUCE-5131.patch
        16 kB
        Arun C Murthy

        Activity

        Hide
        Arun C Murthy added a comment -

        Patch which helps deal more gracefully with job-related apis during JT restart.

        Show
        Arun C Murthy added a comment - Patch which helps deal more gracefully with job-related apis during JT restart.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        The defaultPolicy is to retry on JobTrackerNotYetInitializedException. For SafeModeException, defaultPolicy is equivalent to TRY_ONCE_THEN_FAIL since SafeModeException will be thrown as a RemoteException. So remoteExceptionToPolicyMap.put(SafeModeException.class, defaultPolicy) is the same as remoteExceptionToPolicyMap.put(SafeModeException.class, TRY_ONCE_THEN_FAIL).

        I think it is simpler to change RetryUtils.getDefaultRetryPolicy to support multiple exceptions.

        Show
        Tsz Wo Nicholas Sze added a comment - The defaultPolicy is to retry on JobTrackerNotYetInitializedException. For SafeModeException, defaultPolicy is equivalent to TRY_ONCE_THEN_FAIL since SafeModeException will be thrown as a RemoteException. So remoteExceptionToPolicyMap.put(SafeModeException.class, defaultPolicy) is the same as remoteExceptionToPolicyMap.put(SafeModeException.class, TRY_ONCE_THEN_FAIL). I think it is simpler to change RetryUtils.getDefaultRetryPolicy to support multiple exceptions.
        Hide
        Arun C Murthy added a comment -

        Good point, thanks for taking a look Nic.

        Here is an update with your feedback incorporated.

        Show
        Arun C Murthy added a comment - Good point, thanks for taking a look Nic. Here is an update with your feedback incorporated.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        +1 the new patch looks good.

        Show
        Tsz Wo Nicholas Sze added a comment - +1 the new patch looks good.
        Hide
        Karthik Kambatla added a comment -

        The second patch definitely looks good. Thanks Arun.

        Minor comments:

        1. JobTracker#setInitDone() shouldn't need to take any parameter, no? Currently, it is only the test using it, but this can change.
        2. JobTracker#waitForInit() shouldn't we do something when we catch the InterruptedException - at the least, logging it might help
        3. JobTracker#waitForInit() - adding a log message that we are about to wait should help
        4. RetryUtils#getDefaultRetryPolicy() - break after setting found?
        Show
        Karthik Kambatla added a comment - The second patch definitely looks good. Thanks Arun. Minor comments: JobTracker#setInitDone() shouldn't need to take any parameter, no? Currently, it is only the test using it, but this can change. JobTracker#waitForInit() shouldn't we do something when we catch the InterruptedException - at the least, logging it might help JobTracker#waitForInit() - adding a log message that we are about to wait should help RetryUtils#getDefaultRetryPolicy() - break after setting found?
        Hide
        Arun C Murthy added a comment -

        Thanks for taking a look Karthik Kambatla.

        I incorporated all your comments (good catch on the 'break', my bad) except the boolean param - I feel we can use it to toggle initDone in future; IAC, it's internal api.

        Show
        Arun C Murthy added a comment - Thanks for taking a look Karthik Kambatla . I incorporated all your comments (good catch on the 'break', my bad) except the boolean param - I feel we can use it to toggle initDone in future; IAC, it's internal api.
        Hide
        Karthik Kambatla added a comment -

        +1. Looks good to me. Thanks Arun.

        Show
        Karthik Kambatla added a comment - +1. Looks good to me. Thanks Arun.
        Hide
        Arun C Murthy added a comment -

        Trivial mods - fixed javadocs and a warning; added a few comments.

        Show
        Arun C Murthy added a comment - Trivial mods - fixed javadocs and a warning; added a few comments.
        Hide
        Arun C Murthy added a comment -

        Thanks for the reviews Tsz Wo Nicholas Sze and Karthik Kambatla. I just committed this.

        Show
        Arun C Murthy added a comment - Thanks for the reviews Tsz Wo Nicholas Sze and Karthik Kambatla . I just committed this.
        Hide
        Matt Foley added a comment -

        Closed upon release of Hadoop 1.2.0.

        Show
        Matt Foley added a comment - Closed upon release of Hadoop 1.2.0.

          People

          • Assignee:
            Arun C Murthy
            Reporter:
            Arun C Murthy
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development