Hadoop Map/Reduce: MAPREDUCE-4819

AM can rerun job after reporting final job status to the client

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.1-alpha
    • Fix Version/s: 3.0.0, 2.0.3-alpha, 0.23.6
    • Component/s: mr-am
    • Labels: None

      Description

      If the AM reports final job status to the client but then crashes before unregistering with the RM, the RM can run another AM attempt. Currently AM re-attempts assume that the previous attempts did not reach a final job state, and that causes the job to rerun (from scratch, if the output format doesn't support recovery).

      Re-running the job when we've already told the client the final status of the job is bad for a number of reasons. If the job failed, it's confusing at best, since the client was already told the job failed but the subsequent attempt could succeed. If the job succeeded, there could be data loss, as a subsequent job launched by the client tries to consume the job's output as input just as the re-attempt starts removing output files in preparation for the output commit.

      Attachments

      1. MAPREDUCE-4819.1.patch
        11 kB
        Bikas Saha
      2. MAPREDUCE-4819.2.patch
        35 kB
        Bikas Saha
      3. MAPREDUCE-4819.3.patch
        44 kB
        Bikas Saha
      4. MR-4819-4832.txt
        100 kB
        Robert Joseph Evans
      5. MR-4819-bobby-trunk.txt
        98 kB
        Robert Joseph Evans
      6. MR-4819-bobby-trunk.txt
        95 kB
        Robert Joseph Evans
      7. MR-4819-bobby-trunk.txt
        95 kB
        Robert Joseph Evans
      8. MR-4819-bobby-trunk.txt
        92 kB
        Robert Joseph Evans
      9. MR-4819-bobby-trunk.txt
        86 kB
        Robert Joseph Evans
      10. MR-4819-bobby-trunk.txt
        47 kB
        Robert Joseph Evans

          Activity

          Bikas Saha added a comment -

          If the AM talks to more than one entity about status (the client and the RM) then such races are possible. Maybe the final client notification should be the last thing, after all post-processing is done. This way the client is the last to know, and will never learn about completion if things go wrong before that. Like the NN not responding to the client until edits have been written.
          In general it seems like we need to come up with a set of markers that previous AMs leave behind that can tell the next retry whether the previous one failed/succeeded, so the current AM knows whether to exit or continue to run.

          Jason Lowe added a comment -

          Maybe final client notification should be the last thing after all post processing is done.

          No, moving the client notification later just creates a different set of problems, like the client never being notified at all because the AM crashes after unregistering with the RM but before it notifies the client. The RM won't restart the app because it unregistered successfully, but the client is never notified.

          In general it seems like we need to come up with a set of markers that previous AM's leave behind that can tell the next retry if the previous one failed/succeeded and so the current AM should exit or continue to run.

          Exactly, and the AM is already doing this in the job history file, which is how it helps support recovery. We should extend this so that even if the output committer doesn't support recovery, the AM will check for markers in the job history file and prevent the job from executing tasks and committing output if the final job status has been determined by previous attempts. That way we prevent the AM from re-committing job output or changing the final job status after notifying the client. We just need to make sure those markers are flushed to the persistent store and located properly by future AM attempts before attempting to notify the client. If subsequent attempts see the final job status marker then they should skip straight to the client notification process instead of running tasks.

          Koji Noguchi added a comment -

          like the client never being notified at all because the AM crashes after unregistering with the RM but before it notifies the client.

          As long as the client eventually fails, that's not a problem.

          The critical problem we have here is a false positive from the client's perspective.
          The client is getting 'success' but the output is incomplete or corrupt (due to a retried application/job (over)writing to the same target path).

          If we can have the AM and RM agree on the job status before telling the client, I think that would work. There could be a corner case where the AM and RM say the job was successful but the client thinks it failed. This false negative is much better than the false-positive issue we have now. Even in 0.20, we had cases where the JobTracker reported the job was successful but the client thought it failed due to a communication failure with the JobTracker. This is fine to happen, and we should let the client handle the recovery-or-retry.

          In general it seems like we need to come up with a set of markers that previous AM's leave behind

          I don't want the correctness of the job to depend on the marker on hdfs.

          Koji Noguchi added a comment -

          I don't want the correctness of the job to depend on the marker on hdfs.

          I meant HDFS in user space, like the output path. If this is stored elsewhere, where the user cannot access it, I have no problem.

          Robert Joseph Evans added a comment -

          We are informing several different actors of "success/failure" in many different ways.

          1. _SUCCESS file being written to HDFS by the output committer as part of commitJob()
          2. job end notification by hitting an http server
          3. client being informed through RPC
          4. history server being informed by placing the log in a directory it can see
          5. resource manager being informed that the application is done

          Some of these are much more important to report than others, but either way we still have, at a minimum, two different things that need to be tied together: commitJob() and informing the RM not to run us again. Rearranging the order of them will not fix the fact that after commitJob() finishes there is the possibility that something will fail but must not fail the job. We really need to have a two-phase commit in the job history file.

          1. I am about to commit the job output.
          2. commitJob()
          3. I finished committing the job output successfully.

          Without this there will always be the possibility that commitJob() will be called twice, which would result in changes to the output directory. I would also argue that some of these are important enough that we consider reporting them twice and updating the listener to handle double reporting, like informing the history server about the job finishing. Others are not so critical, like job end notification or the client RPC.
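The two-phase commit idea above can be sketched roughly as follows. This is only an illustration: the event names, the CommitLog stand-in, and logAndFlush are hypothetical, not the actual Hadoop MR classes or JobHistoryEvent types.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseCommitSketch {
    // Stands in for the job history file; the real AM would append
    // JobHistoryEvents and flush them to persistent storage before proceeding.
    static final List<String> history = new ArrayList<>();

    static void logAndFlush(String event) {
        history.add(event); // real code would also sync the file to HDFS here
    }

    static void commitJob(Runnable outputCommitter) {
        logAndFlush("JOB_COMMIT_STARTED");    // phase 1: record intent to commit
        outputCommitter.run();                // the actual OutputCommitter.commitJob()
        logAndFlush("JOB_COMMIT_COMPLETED");  // phase 2: record commit finished
    }

    public static void main(String[] args) {
        commitJob(() -> System.out.println("committing output"));
        System.out.println(history);
    }
}
```

The key property is that the "started" marker is durable before commitJob() runs, so a relaunched attempt can always tell whether a commit was in flight.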

          Koji,

          I get that we want to reduce the risk of a user shooting themselves in the foot, but the file must be stored in a user-accessible location because the entire job is run as the user. It is stored under the .staging directory, which, if the user deletes it, will already cause many other problems and probably cause the job to fail. We can try to set it up so that if the previous job history file does not exist on any app attempt but the first, we fail fast. That would prevent retries from messing up the output directory.

          Vinod Kumar Vavilapalli added a comment -

          Haven't read through the whole discussion yet, but it looks to me that the following will solve the issue: what we need to ensure is that the final JobHistoryEvent, i.e. JobFinishedEvent, is logged and flushed before changing the job state to SUCCEEDED. JobHistory is our commit log. In case the RM reruns an application, we need to check whether there is a final JobFinishedEvent and avoid rerunning if there is one.

          Jason Lowe added a comment -

          We have to be careful about the fact that the job history log is moved to the done intermediate directory during shutdown after notifying the client. Therefore there's a window of opportunity where we can fail after notifying the client and moving the job history file but before unregistering from the RM. When the app attempt restarts in that case, the job history file won't be found and we'll end up re-running the job from scratch. We either need to unregister from the RM first (and rely on the FINISHING grace period to buy us enough time to move the file) or explicitly not delete the file when we copy it to done intermediate and instead wait for the staging directory to be removed later to clean it up.

          Robert Joseph Evans added a comment -

          My vote would be to leave it around until we are done done and staging is removed. It seems simpler.

          Jason Lowe added a comment -

          My vote would be to leave it around until we are done done and staging is removed. It seems simpler.

          Agreed, although we would also need to make sure we only delete the staging directory after unregistering from the RM. Something we need to do anyway, see YARN-244.

          Robert Joseph Evans added a comment -

          Yes, but going off of Koji's comments, we also want to be sure that if the previous attempt's edit log does not exist, we don't know what state we were in and should just assume we need to unregister and exit.

          Bikas Saha added a comment -

          Attaching a patch based on discussions with Vinod, implementing what is in his comment above. I was testing it by making the AM die during MRAppMaster.shutdownJob() after successful job completion, but the second attempt could not find the history file during recoveryService.parse():

          File does not exist: /tmp/hadoop-yarn/staging/bikas/.staging/job_1354125268052_0001_1.jhist

          the job history log is moved to the done intermediate dir

          Can this explain why I am seeing the above error? Any pointers?

          Jason Lowe added a comment -

          See JobHistoryEventHandler.closeEventWriter and moveToDoneNow. That's what's moving the job history file from the staging directory to the done intermediate directory so the history server picks it up. We need to not delete the file after we move it.
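A copy-instead-of-delete version of that move can be sketched with plain java.nio as a stand-in (this is not the actual JobHistoryEventHandler/moveToDoneNow code, and the paths are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class KeepStagingCopySketch {
    // Copy the .jhist file to the intermediate-done dir but keep the staging
    // copy, so a relaunched AM can still find it; the later staging-directory
    // cleanup removes it. (The original code deleted the source after moving.)
    static void copyToDone(Path stagingFile, Path doneFile) throws IOException {
        Files.createDirectories(doneFile.getParent());
        Files.copy(stagingFile, doneFile, StandardCopyOption.REPLACE_EXISTING);
        // deliberately no Files.delete(stagingFile)
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");
        Path done = Files.createTempDirectory("done_intermediate");
        Path jhist = Files.write(staging.resolve("job_1_1.jhist"), "events".getBytes());
        copyToDone(jhist, done.resolve("job_1_1.jhist"));
        System.out.println(Files.exists(jhist));                         // staging copy kept
        System.out.println(Files.exists(done.resolve("job_1_1.jhist"))); // done copy written
    }
}
```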

          Bikas Saha added a comment -

          Yeah. Got the same info from Vinod in an offline conversation.

          Looks like the patch solves half the problem: making sure that history is fully saved before changing to the succeeded state.
          The other half is to make sure the recovery data is available to the restarted app.
          Since the RM can restart FAILED/KILLED/SUCCEEDED apps, it looks like we will need to wait for state data to be saved for all of them, and not just the succeeded state (which is what the patch does). Otherwise, the RM could restart a failed app, which would run again and fail again.

          The solutions to the second half could be:
          1) Don't delete the original in the staging dirs. But this suffers from the problem that the final staging-dir cleanup would end up cleaning it for a successful app, and then the AM could crash.
          2) Have the recovery service look at both the temp and done locations. But this suffers from race conditions when the AM does a partial move to the done dir and then dies, so part of the data is in temp and part in done.
          3) Before moving from temp to done, create a marker file in done. Upon restart, check if the marker file exists. If it does, then don't do anything, because the job was done (failed/killed/successful) and the AM died sometime after that.

          Jason Lowe added a comment -

          We can't have the AM looking for the file in done_intermediate. The history server could have moved it out of there in the interim. And I don't think we want the AM to "know" how to find its file in the final done location the history server puts it in, either. Too much coupling between those systems, IMHO.

          I think leaving it in the staging directory is the correct solution. As I mentioned, we need to make sure we don't delete the staging directory before unregistering with the RM. That prevents subsequent AM re-attempts right off the bat. And deleting the staging directory before unregistering is happening today as discussed in YARN-244, so that problem is not specific to this fix.

          Leaving it in staging is straightforward. No need for extra markers, racing with the history server, etc. And if the staging directory is gone, well, the AM can't relaunch in the first place, so no issues of re-running and re-committing there. We could still have a discrepancy between the client thinking the job succeeded (which it basically did, re: its output data) and the RM saying it failed, but this is fixable by moving the removal of the staging directory to after we unregister from the RM when we fix YARN-244.

          Jason Lowe added a comment -

          Took a look at the patch, and I think we are missing some critical corner cases. For example, if we finish committing the job and the committer is using a marker of sorts (e.g.: _SUCCESS), then we could trigger downstream jobs to run before the job history is completely closed. I believe Oozie is polling for the _SUCCESS marker, for example. If we crash after committing but before writing the job finished record then we could end up re-committing again while another job is attempting to consume our output, leading to potential data loss even though both jobs would have "SUCCEEDED". That's a Bad Thing.

          I think the crux of the issue is that we must not commit twice. The act of committing is what could trigger downstream jobs or in itself not be repeatable/recoverable, so we should treat AM crashes during job commit much like we treat non-crashing failures during job commit today, i.e.: it should fail the job without re-running and re-committing. Worst-case we have a false negative where the output did commit successfully but we thought the job failed, and I agree with Koji that a false negative beats a false positive in this case.

          This means we need a marker noting when we start and stop committing sync'd to the job history file. If the AM relaunches and finds we crashed during commit, we should treat it as we do a committer failure and fail the job. If the re-attempt finds we finished committing then we simply need to unregister from the RM without re-running.
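Hedged sketch of that restart-time decision (the enum, method, and flag names here are illustrative only, not what the actual patch uses):

```java
public class CommitRecoverySketch {
    enum Action { FAIL_JOB, UNREGISTER_WITHOUT_RERUN, RECOVER_AND_RUN }

    // Decide what a relaunched AM should do, based on the commit markers
    // found in the previous attempt's job history file.
    static Action decide(boolean sawCommitStarted, boolean sawCommitCompleted) {
        if (sawCommitCompleted) {
            // Output already committed: just report status and unregister.
            return Action.UNREGISTER_WITHOUT_RERUN;
        }
        if (sawCommitStarted) {
            // Crashed mid-commit: treat like a committer failure, never re-commit.
            return Action.FAIL_JOB;
        }
        // Commit never began: safe to recover tasks and continue the job.
        return Action.RECOVER_AND_RUN;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true));   // UNREGISTER_WITHOUT_RERUN
        System.out.println(decide(true, false));  // FAIL_JOB
        System.out.println(decide(false, false)); // RECOVER_AND_RUN
    }
}
```

The asymmetry is deliberate: a completed commit is safe to acknowledge again, an in-flight commit is not safe to repeat, and the false negative in the middle case is the lesser evil, as discussed above.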

          Vinod Kumar Vavilapalli added a comment -

          This has ordering issues with MAPREDUCE-4813.

          Vinod Kumar Vavilapalli added a comment -

          Just read up on the whole wall of comments. This has become quite messy; we didn't foresee some of this during design, sigh.

          +1 to the proposal of leaving job-history files behind till the staging directory cleanup.

          Jason's last comment about _SUCCESS files is very valid. In fact there is a third problem, related to JobEndNotification. Essentially, there are three ways clients learn about final status: the RPC call, _SUCCESS markers, and JobEndNotification. We need to make sure we handle consistency with all three. But let's split the tasks and let this one take care of the usual RPC calls. Let's debate the solution for the other two issues separately? I propose this to keep things saner; it has become quite unwieldy already.

          Jason Lowe added a comment -

          Again, to me, it's all about the commit. If we address that then I don't think the others are all that critical, since the commit occurs first and is crucial not to repeat. The others should be safe to repeat if necessary.

          Once we checkpoint the fact that we committed, the rest can be recovered in a relatively straightforward manner on subsequent attempts with the existing code – we just skip past the commit and proceed doing what we're already doing: setting final job status, performing job end notification, unregistering, etc. Job end notification is already a best-effort-but-not-guaranteed service, and we can't avoid the potential for double notifications.

          If we think delaying the reporting of job success via the job status RPC call until after the history file is copied to done_intermediate is important (which I don't, since the commit can still be repeated) then we can do that in another JIRA or in this one. However, this one would still need to be fixed and is a very high priority.

          Bikas Saha added a comment -

          Attaching a patch based on the above suggestions of keeping the temp data around. The temp data was actually being stored in the global staging dir and not the job-specific staging dir, so it wasn't being deleted upon successful completion. I changed it so that it's stored inside the job staging directory, and so all temp history will go away after the last successful job/last retry.
          Added a test, and also verified manually (by hacking a System.exit() into MRAppMaster.shutdownJob()) that the following works: an AM dies after reporting the finished state but before unregistering. It is restarted, and the new AM exits with success after registering and unregistering with the RM.

          As far as YARN-244 is concerned, the comments around the code seem to suggest that it was an explicit decision to clean up before unregistering.

              // Add the staging directory cleaner before the history server but after
              // the container allocator so the staging directory is cleaned after
              // the history has been flushed but before unregistering with the RM.
              addService(createStagingDirCleaningService());
          

          This patch addresses the issue for this JIRA: make sure a successfully completed job does not rerun if the AM is retried. Pending a solution to YARN-244. But it's safe, because once the staging dir is cleaned up the next attempt cannot run, so it's a fail-stop.
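The fail-stop behavior described above (a retried AM cannot proceed once the staging dir is gone) can be sketched outside Hadoop with plain file checks; the class and method names here are illustrative assumptions, not the real MRAppMaster code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only: a retried AM attempt fail-stops when the
// job staging directory has already been cleaned up by a prior attempt.
public class StagingDirCheck {

    // Returns true if this attempt may proceed; false means a previous
    // attempt finished and cleaned up, so this attempt must stop.
    public static boolean mayProceed(Path jobStagingDir) {
        return Files.isDirectory(jobStagingDir);
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("job-staging");
        System.out.println(mayProceed(staging)); // first attempt: true
        Files.delete(staging);                   // cleanup after success
        System.out.println(mayProceed(staging)); // retried attempt: false, fail-stop
    }
}
```

In the real patch the equivalent check happens at AM startup, so a retry launched after cleanup exits rather than rerunning the job.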

          Bikas Saha added a comment -

          I am not quite clear why the commit would be repeated if the job does not execute any task at all?
          As far as I understand from the comments (I haven't looked at the code), the commit code seems to be user-pluggable code. In that case, how can we ensure that every commit implementation can be made into a singleton operation? Can it be as simple as a committer refusing to commit if the output file already exists? Are committers allowed to delete an output file if it exists? In that case, how does it differentiate a checkpointed commit from a previous crashed run vs. an old commit from a successful job?
          On a side note, we should be encouraging projects that depend on output markers for job completion polling to stop doing that and start using APIs, perhaps in the next version change. Continuing to support these kinds of use cases could make solutions more complex and fragile than they need to be.
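For what it's worth, the "refuse to commit if the output already exists" idea can be sketched with java.nio.file: a move without REPLACE_EXISTING fails when the destination is present. This is only an illustration of the idempotence question, not FileOutputCommitter's actual logic, and the existence check is not guaranteed atomic on every filesystem:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: a commit that refuses to run twice by relying on
// move-without-replace failing when the final output already exists.
public class RefuseOnExisting {

    // Returns true if this call performed the commit, false if output was
    // already present (e.g. a previous crashed attempt already committed).
    public static boolean commit(Path tempOutput, Path finalOutput) throws IOException {
        try {
            Files.move(tempOutput, finalOutput); // no REPLACE_EXISTING option
            return true;
        } catch (FileAlreadyExistsException alreadyCommitted) {
            return false; // someone committed before us; do not redo
        }
    }
}
```

As the thread goes on to point out, a guard like this only makes sense for file-based committers; a database or web-service committer would need its own deduplication mechanism.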

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555369/MAPREDUCE-4819.2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs:

          org.apache.hadoop.mapreduce.v2.app.TestFail
          org.apache.hadoop.mapreduce.v2.app.TestKill
          org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM
          org.apache.hadoop.mapreduce.v2.app.TestMRClientService
          org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
          org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
          org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
          org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryEvents

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3081//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3081//console

          This message is automatically generated.

          Jason Lowe added a comment -

          As far as YARN-244 is concerned the comments around the code seem to suggest that it was an explicit decision to cleanup before unregistering.

          Yes, it was explicitly done as a workaround to ensure the staging directory was cleaned up before the RM shot the AM. Previously there was a race where the AM was trying to delete the staging directory while the RM was shooting the AM after it unregistered and often the AM lost and the staging directory was left around. Since then we've added a FINISHING state to allow the AM to cleanup before the RM tries to shoot it. Given that, we should move the staging directory cleanup back to after we unregister (but not in this JIRA).

          I am not quite clear why the commit would be repeated if the job does not execute any task at all?

          The job will only avoid committing if it sees the job completion event written to the history file, but that occurs after committing. Therefore if we commit then crash before we sync that completion event to disk, the second attempt will try to commit again. And we're seeing a number of cases where the AM crashed after committing but before completing job history. It should be relatively rare, but it can happen and is happening.

          the commit code seems to be user pluggable code. In that case, how can we ensure that every commit implementation can be made into a singleton operation? Can it be as simple as a committer refusing to commit if the output file already exists? Are committers allowed to delete an output file if it exists? In that case how does it differentiate between a checkpointed commit from a previous crashed run vs an old commit from a successful job?

          The committer is user-pluggable code and therefore can do arbitrary things. It doesn't have to be files. It can be a database commit, a web service transaction, a custom job-end notification mechanism, or whatever. Therefore we cannot assume the commit is recoverable – there are reasons why the job fails when the committer says it failed, because we can't retry it. In the future maybe we can extend the committer API to allow the committer to say it can attempt to recover from a job commit failure, but for now we can't tell. That's why re-running a commit is Not Good.

          On a side note, we should be encouraging projects that depend on output markers for job completion polling, to stop doing that and start using API's. Perhaps in the next version change. Continuing to support these kind of use cases could make solutions more complex and fragile than they need to be.

          The file marker thing is just one committer's way of handling things. Committers can do arbitrary things. The job doesn't even have to produce output as files, for example. It's pluggable for a reason, and we can't know or assume what it's doing. We can only give it interfaces and restrictions (hopefully as few as possible) to govern how it interoperates with the rest of the job framework.

          Bikas Saha added a comment -

          Attaching patch with Hadoop QA test failures fixed.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555379/MAPREDUCE-4819.3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3082//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3082//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          Bikas,

          I would actually like to propose an alternative fix. I am attaching a very preliminary patch. This will instead put a "lock" around the job commit by adding a few new files into the staging directory. Task commits would be required to handle the rare possibility of a double commit, just as is possible in 1.0 now. We would make it just as likely to happen as it is in 1.0 by also putting in MAPREDUCE-4832, which would help to ensure that we don't have two AMs telling tasks to do things at the same time.

          I would appreciate any feedback on this approach. I am going to be working to add in more tests and clean up the code.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562909/MR-4819-bobby-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          -1 javac. The applied patch generated 2015 javac compiler warnings (more than the trunk's current 2014 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3184//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3184//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3184//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3184//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          The findbugs warning is because the code is not complete. The javac warning is because of a new EventHandler not having the generics on it. Both of these are currently expected.

          Bikas Saha added a comment -

          It would really help if you could elaborate on the solution a bit more. I think I get the gist (i.e., try to lock the commit using atomic file operations) but I am not clear beyond that part. We can quickly discuss the utility of both approaches after that. Perhaps you have already done that in your mind.
          The only thing I would like to guard against is linking the job commit operation with job completion when they can be independent. I agree that job commit is strictly needed before job completion. But making job commit the same as job completion may not be correct, e.g. other operations post-completion that are unsafe to repeat (maybe none exist now), or perhaps committing multiple outputs.
          The patch posted earlier made sure that if a job has completed then it will be a no-op to run it again. It's a safe change. Also, it notifies the client about job success after making sure that the success state is persisted. I agree it does not handle errors in commit, which is perhaps what your patch is addressing.
          So it could be that both changes are needed.

          Robert Joseph Evans added a comment -

          Sorry, yes. I have been working very closely with Jason Lowe lately on this and MAPREDUCE-4832, so I glossed over a lot more than I should have.

          In general this patch is more formally coupling job commit to job completion because it was informally coupled previously. FileOutputCommitter optionally will mark a directory as complete with an "_SUCCESS" file when the job is committed. Oozie or other workflow systems can use this to recognize that a job has finished and start processing that output as input to another job. If we do not couple them there is a race that Oozie may lose. You are correct that we have to be careful about what processing happens after a job is committed and verify that it can be redone without any problem. The things that happen here are moving the job history over to where the history server can pick it up, job end notification, unregistering from the RM, and cleaning up the staging directory.
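The `_SUCCESS` convention mentioned here is easy to illustrate: FileOutputCommitter can drop a marker file into the output directory at job commit, and workflow systems poll for it. A minimal, hypothetical sketch of the consumer side (class and method names are assumptions, not Oozie code):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the workflow-side convention: the presence of an _SUCCESS
// marker in the output directory is read as "this job committed".
public class SuccessMarker {
    public static final String MARKER = "_SUCCESS";

    public static boolean jobLooksCommitted(Path outputDir) {
        return Files.exists(outputDir.resolve(MARKER));
    }
}
```

This is exactly why decoupling commit from completion creates a race the workflow system may lose: the marker can appear, a downstream job starts consuming the output, and a re-running AM then deletes it underneath them.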

          Looking at each of these one at a time:
          For moving job history over I do need to adopt the change that you made to make it more robust where we copy the log file and do not delete the old one until the staging directory is removed. I also need to make changes to the HistoryServer to allow it to ignore the subsequent JobHistory files for the same job.

          For job end notification: this is hitting a URL to indicate that the job has finished, and whether it finished successfully or in error. I do need to do some integration tests with Oozie to validate that it can handle being informed more than once without having any real problems. The notification is a best-effort contract, so in the short term I plan to disable notification if we think that we may double-notify (commit finished and we don't know if we notified or not). I know Oozie can handle this, but it will delay some processing. We can then explore changing that contract in a separate JIRA.

          Unregistering with the RM is by its very nature atomic. If we crash after unregistering we will not be rerun.

          Deleting the staging directory is also guarded against (the code was commented out in the first patch, but I have fixed the unit tests and will have it in an upcoming patch). If for some reason the staging directory was removed and a new AM is launched, it will exit with an error.

          The only other code that is part of this patch is the JobHistoryCopyService. This is kind of a stripped down version of the recovery service for the special case where we are not going to rerun anything, we just want the events to be put into the new history file. We could have copied the old history file over, but it would be missing the section about this new AM.

          This first patch was just to show the concepts. There is still a fair amount of work to do before it is really ready to commit, so if you have any other suggestions, or potential problems that you see with this approach please point them out.
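The "lock" around the job commit can be sketched as marker files in the staging directory: one written atomically before the commit starts and one after it ends, so a re-launched AM can classify what happened. The file names and the enum below are illustrative assumptions, not necessarily what the patch uses:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the commit "lock": marker files let a later AM attempt
// decide whether a commit already happened (or may have happened).
public class CommitLock {
    public enum PriorCommit { NONE, SUCCEEDED, FAILED, UNKNOWN }

    static final String STARTED = "COMMIT_STARTED"; // illustrative names
    static final String SUCCESS = "COMMIT_SUCCESS";
    static final String FAIL    = "COMMIT_FAIL";

    public static void markStarted(Path staging) throws IOException {
        Files.createFile(staging.resolve(STARTED)); // atomic create
    }

    public static void markDone(Path staging, boolean ok) throws IOException {
        Files.createFile(staging.resolve(ok ? SUCCESS : FAIL));
    }

    // What a re-launched AM attempt learns by inspecting the markers.
    public static PriorCommit inspect(Path staging) {
        if (Files.exists(staging.resolve(SUCCESS))) return PriorCommit.SUCCEEDED;
        if (Files.exists(staging.resolve(FAIL)))    return PriorCommit.FAILED;
        if (Files.exists(staging.resolve(STARTED))) return PriorCommit.UNKNOWN; // crashed mid-commit
        return PriorCommit.NONE;                    // no commit attempted; safe to run
    }
}
```

Under this scheme SUCCEEDED means the new attempt skips straight to post-commit recovery, while UNKNOWN (started but never finished) is the case where the job must fail rather than risk a double commit.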

          Robert Joseph Evans added a comment -

          This is an updated version of my patch. It addresses all of the outstanding tasks besides integration with the split-brain fix MAPREDUCE-4832. I still need to do a lot of manual testing to be sure that this fixes the issues, but I think it is very close to being a final patch. Please take a look at it.

          Bikas, if you have concerns about it or think that there is more from your patch that I need to pull in please let me know.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562963/MR-4819-bobby-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          -1 javac. The applied patch generated 2015 javac compiler warnings (more than the trunk's current 2014 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

          org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3186//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3186//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3186//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3186//console

          This message is automatically generated.

          Bikas Saha added a comment -

Looks like after the recent changes in JobImpl and the current alternative approach, my original fix for not rerunning the job does not really apply. I think you would want to take the changes from my patch that add the jobid to the history staging dir. Since the staging dir is not deleted during job history flushing, I had observed that if I made my AM crash (by putting an exit(1) in shutdownJob()), the history files would get orphaned and not cleaned up. Or something like that. And to fix that I had to add the jobid to the path.
          Snippet from my patch.

          +++ hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JobHistoryUtils.java
          @@ -186,10 +186,11 @@ public static PathFilter getHistoryFileFilter() {
              * @return A string representation of the prefix.
              */
             public static String
          -      getConfiguredHistoryStagingDirPrefix(Configuration conf)
          +      getConfiguredHistoryStagingDirPrefix(Configuration conf, String jobId)
                     throws IOException {
               String user = UserGroupInformation.getCurrentUser().getShortUserName();
          -    Path path = MRApps.getStagingAreaDir(conf, user);
          +    Path stagingPath = MRApps.getStagingAreaDir(conf, user);
          +    Path path = new Path(stagingPath, jobId);
               String logDir = path.toString();
               return logDir;
             }
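The effect of the snippet above could be illustrated like this. This is a standalone sketch, not the patch itself: it uses `java.nio.file.Path` in place of Hadoop's `org.apache.hadoop.fs.Path`, and the method name is made up for the example. The point is simply that appending the job id puts the history staging files under the job's own directory, so they are removed along with the rest of the staging dir.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative version of the change: append the job id to the user's
// staging area so history files live under a per-job directory and get
// cleaned up when the job's staging dir is deleted.
public class HistoryStagingPath {
  static String historyStagingDir(String userStagingArea, String jobId) {
    Path stagingPath = Paths.get(userStagingArea);
    return stagingPath.resolve(jobId).toString();
  }
}
```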
          

          For the patch itself I have a few comments

Why not end in success if the staging dir was cleaned up by the last attempt? I am guessing that this code won't be necessary after we move the unregister to the RM before the staging dir cleanup in MAPREDUCE-4841, right?

          +      if(!stagingExists) {
          +        copyHistory = false;
          +        isLastAMRetry = true;
          +        justShutDown = true;
          +        shouldNotify = false;
          +        forcedState = JobStateInternal.ERROR;
          +        shutDownMessage = "Staging dir does not exist " + stagingDir;
          +        LOG.fatal(shutDownMessage);
          

          Why are we only eating/ignoring the JobEvents in the dispatcher? So that the JobImpl state machine is not triggered?

This might be a question of personal preference. I think an explicit transition from the INIT state to the final state is cleaner than overriding the state in the getter.

             public JobStateInternal getInternalState() {
               readLock.lock();
               try {
          +      if(forcedState != null) {
          +        return forcedState;
          +      }
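The getter-override pattern under discussion can be reduced to a small sketch. The class and field names below are illustrative, not JobImpl's: an externally imposed terminal state, set once during init, short-circuits whatever state the state machine would otherwise report.

```java
// Sketch of the "forced state" getter override being debated: if init
// decides the job already reached a final state, the getter returns that
// forced state and never consults the state machine.
public class ForcedStateJob {
  private volatile String forcedState; // e.g. "ERROR", set during init
  private String machineState = "INIT"; // what the state machine would say

  void forceState(String s) { forcedState = s; }

  String getInternalState() {
    if (forcedState != null) {
      return forcedState; // override wins over the state machine
    }
    return machineState;
  }
}
```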
          

Didn't quite get this in HistoryFileManager.java. Looks like it's related to a recent change in that code.

          +      } else if (old != null && !old.isMovePending()) {
          +        //This is a duplicate so just delete it
          +        fileInfo.delete();
                 }
          

          Typo

          +        throw new Exception("No handler for regitered for " + type);
          +      }
          
          Bikas Saha added a comment -

In general, you might want to rename some of the new things like "justShutDown" or "EventEater". And I feel that the change in the MRAppMaster.init() function might benefit from some refactoring.

          Alejandro Abdelnur added a comment -

For Job End notification: this is hitting a URL to indicate that the job has finished, and whether it finished successfully or in error. I do need to do some integration tests with Oozie to validate that it can handle being informed more than once without having any real problems.

          Oozie handles duplicate notifications correctly doing a NOP.

          Siddharth Seth added a comment -

Bobby, Jason: along with trying to ensure that a commit does not happen twice, I think there is value in committing the job history file before changing the job status to SUCCESS, primarily so that the RPC behaves consistently. It can otherwise see temporary final states if the AM crashes during the history file persist, and won't be able to retrieve counters or other job status till the next AM attempt. This does have the drawback of a small performance hit, though, and also makes job history a critical part of a job.
Using separate files for marking success / failure - am guessing this is to have a smaller chance of a failing persist, as compared to persisting events via the HistoryFile, which may already have a backlog of events?

Wondering if it's possible to achieve the same checks via the CommitterEventHandler instead of checking in the MRAppMaster class, i.e. follow the regular recovery path, except the CommitHandler emits success / failed / abort events depending on the presence of these files / (history events).
Alternatively, the current implementation could be simplified by using a custom RMCommunicator which does not depend on JobImpl, i.e. the history copier and an RMCommunicator to unregister from the RM.

          Comments on the current patch

          • If the last AM attempt were to crash - data exists since the SUCCESS file exists, but RPC will not see SUCCESS.
          • While the new AM is running - it will not be able to handle status, counter etc requests. This seems a little problematic if a success has been reported over RPC from the previous AM. Since this AM is dealing with the history file - could possibly have it return information from the history file ?
            History commit before SUCCESS may help with the previous 2 points.
          • If the recovered AppMaster is not the last retry - looks like the RM unregistration will not happen. (isLastAMRetry)
          • Is a KILLED status also required - KILLED during commit should not be reported as FAILED
          • The check for commitSuccess / commitFailure in the AM - the failure check can happen before the success check (low chance but a success file could be created followed by an RPC failure)
          • CommitEventHandler.touchz could throw an exception if the file already exists - to prevent lost AMs from committing. (maybe not required after MAPREDUCE-4832 ?)
          • historyService creation - can move into the common if (copyHistory) check
          • Don't think "AMStartedEvent" can be ignored - the history server will have no info about past AMs otherwise. I think only the current AM needs to be ignored.

          Wondering if it's possible to use HDFS dirs and timestamps to co-ordinate between an active AM and lost AMs.
          Also, are HDFS dir operations cheaper than file create operations (NN only vs. NN+DN)? Not sure if mkdir / 0-length file creation are NN-only ops.

          Robert Joseph Evans added a comment -

          Wow lots of comments. Thanks for everyone looking at the patch.

I had observed that if I made my AM crash (by putting an exit(1) in shutdownJob()) then the history files would get orphaned and not cleaned up. Or something like that.

          Thanks for the heads up. I will look into that.

          Why not end in success if the staging dir was cleaned up by the last attempt?

          Because we crashed somewhere after staging was cleaned up and before we unregistered. Crashing seems like an error to me, but I suppose we could change it. As for what the client ultimately sees for success or failure, we will rely on the history server to report that.

          I am guessing that this code wont be necessary after we move the unregister to RM before the staging dir cleanup in MAPREDUCE-4841, right?

Yes and no. Once MAPREDUCE-4841 goes in there is an increased possibility of leaking staging directories. I have seen users in 1.0 blow away their staging directory to clean up, which caused jobs to fail. Granted, they are more likely to get errors from the distributed cache not finding the files it needs, but in either case I would like to be paranoid and guard against that.

          Why are we only eating/ignoring the JobEvents in the dispatcher? So that the JobImpl state machine is not triggered?

In the new code path we have not wired up everything. JobImpl is created but the JobEventDispatcher is not. I did not want to have to deal with recovering the complete state of the job, which in some cases may not even be possible. This is also why I am not bringing up the RPC server. Which, now that you mention it, means I probably also need to update the UI/client to deal with that appropriately. The typo you found was just there for debugging this situation. (I'll fix the typo, by the way.)

          This might be a question of personal preference. I think an explicit transition to from the INIT to final state is cleaner than overriding the state in the getter.

          I actually wanted to put in a stubbed out Job instead, but there are too many places that Job is cast to JobImpl just to get the state making it difficult to do so. I will look again to see if I can split the two apart, or add in a state transition.

          Oozie handles duplicate notifications correctly doing a NOP.

          Great. I will look at the javadocs for job end notification again to be sure that we can default to notify instead.

          Using separate files for marking success / failure - am guessing this is to have a smaller change of a failing persist, as compared to persisting events via the HistoryFile, which may already have a backlog of events?

          It was also a much smaller change to make. The HistoryFile would be preferable if we wanted to guarantee at most once commit of the tasks, because there are so many of them.

          Wondering if it's possible to achieve the same checks via the CommitterEventHandler instead of checking in the MRAppMaster class. i.e follow the regular recovery path - except the CommitHandler emits success / failed / abort events depending on the presence of these files / (history events).

          Alternately, the current implementation could be simplified by using a custom RMCommunicator - which does not depend on JobImpl. i.e. the history copier and an RMCommunicator to unregister from the RM.

          Both of those seem like valid things to investigate. I feel like I am close on this and want to get this working as is first and then I will look at the other approaches you suggested. I do like the first one as it seems like it would be a lot simpler to implement, but I want a backup that I know functions before making drastic changes to the design.

          If the last AM attempt were to crash - data exists since the SUCCESS file exists, RPC will not see SUCCESS.

We have a lot of problems in general if the last AM were to crash. It is possible that the history server would have no knowledge of the job whatsoever even if it finished successfully. This patch is not attempting to address those problems.

          While the new AM is running - it will not be able to handle status, counter etc requests. This seems a little problematic if a success has been reported over RPC from the previous AM. Since this AM is dealing with the history file - could possibly have it return information from the history file ? History commit before SUCCESS may help with the previous 2 points.

Yes, committing history before returning success would help with those problems. I will look into it as an alternative approach. My initial thought was to update the client/UI to wait for the AM to report a valid address so that no client tries to get counters etc. from an AM in this situation.

          If the recovered AppMaster is not the last retry - looks like the RM unregistration will not happen. (isLastAMRetry)

          isLastAMRetry is set in a number of places, including in the init method if we notice that the previous Job ended but the AM crashed.

          Is a KILLED status also required - KILLED during commit should not be reported as FAILED.

That would be nice. We would have to put it in as part of CommitterEventHandler.cancelJobCommit(). I will look into that.

          CommitEventHandler.touchz could throw an exception if the file already exists - to prevent lost AMs from committing. (maybe not required after MAPREDUCE-4832 ?)

          I think it already will. We are not opening the file for append, we are trying to create it.
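The create-rather-than-append semantics being relied on here can be shown with a small sketch. This is illustrative only: the real code would use Hadoop's `FileSystem.create(path, /*overwrite=*/false)` on HDFS, which similarly fails when the file already exists; here `java.nio.file.Files.createFile` plays that role, and the method name `touchz` follows the discussion.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of "touchz" with create-exclusive semantics: the create fails if
// the marker already exists, fencing out a lost AM that tries to record a
// commit another attempt has already recorded.
public class Touchz {
  static boolean touchz(Path p) throws IOException {
    try {
      Files.createFile(p); // atomic create; throws if p already exists
      return true;         // we created the marker, so we own it
    } catch (FileAlreadyExistsException e) {
      return false;        // another attempt got there first
    }
  }
}
```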

          historyService creation - can move into the common if (copyHistory) check

          OK

Don't think "AMStartedEvent" can be ignored - the history server will have no info about past AMs otherwise. I think only the current AM needs to be ignored.

The AMStartedEvent is ignored by the copy service but not by the MRAppMaster. The MRAppMaster will read the history file just like it did before and extract the AMStartedEvents; it will add in another one for itself, and then the copy service will read the rest of the history file.

          Wondering if it's possible to use HDFS dirs and timestamps to co-ordinate between an active AM and lost AMs.

          Also, are hdfs dir operations cheaper than file create operations (NN only / NN +DN) ? Nor sure if mkdir / 0 length file creation are NN only ops.

          I thought that they were NN only ops, but I will check with an HDFS person to know for sure.

          Robert Joseph Evans added a comment -

          This patch should be fully functional.

I have included the work by Bikas to put the job history file in a location that is deleted with the staging directory. I have fixed a few bugs in the original where we were not registering with the RM correctly, and also fixed the case where the Web App Proxy would return a 500 error if hit while recovery was happening.

          I have manually tested this by having the AM exit/halt before, during, and after job commit. I tested it with the job commit failing and succeeding. Everything appears to be working as expected.

I did not change JobImpl forcedState because adding in the transitions was more than I wanted to do right now. I am happy to file a follow-up JIRA to make those changes if we want them.

I have also not added in the kill state. Again, it looked a bit tricky because of the multithreading, and I would prefer to get something working in now and add that as part of a follow-up JIRA.

I talked with Kihwal Lee about the extra HDFS load for an empty file vs. a directory, and he said about the only extra load is the extra RPC call to close it; because it is just two files per job, I left it as is. If you feel strongly about it I can fix it in a separate JIRA.

          About the only thing that is left for this is integration with MAPREDUCE-4832.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563151/MR-4819-bobby-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          -1 javac. The applied patch generated 2015 javac compiler warnings (more than the trunk's current 2014 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

          org.apache.hadoop.mapreduce.v2.app.commit.TestCommitterEventHandler
          org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3189//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3189//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3189//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3189//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          Fixes Findbugs issue, and test failures. Both were test issues I had missed previously.

          Robert Joseph Evans added a comment -

          With the latest comments on MAPREDUCE-4832 I removed the place holder in here for code from it. Now this should be able to stand alone, and be committed if deemed acceptable.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563172/MR-4819-bobby-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          -1 javac. The applied patch generated 2015 javac compiler warnings (more than the trunk's current 2014 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

          org.apache.hadoop.mapreduce.v2.app.TestRecovery
          org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs
          org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesTasks
          org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs
          org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesTasks
          org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3190//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3190//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3190//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          I am investigating the test failures. I think they are unrelated to this patch, because they work just fine for me when I run them without up-merging to the latest trunk.

          Robert Joseph Evans added a comment -

          For some reason all of the web service tests were failing with out of memory errors that I have not yet been able to reproduce myself. I also have not been able to reproduce the TestRecovery failures, but I did not see any OOMs there.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563176/MR-4819-bobby-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          -1 javac. The applied patch generated 2015 javac compiler warnings (more than the trunk's current 2014 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3191//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3191//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3191//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          OK, looking at it, all of the failures appear to be associated with the hadoop4 machine. I will work with tgraves to see if we can figure out what is happening.

          Show
          Robert Joseph Evans added a comment - OK looking at it all of the failures appear to be associated with the hadoop4 machine. I will work with tgraves to see if we can figure out what is happening.
          Hide
          Bikas Saha added a comment -

          When the staging dir exists but the commitStarted marker does not, that means it's a retry that should continue as normal, right?
          If yes, shouldn't copyHistory be set to false for that case? It looks like copyHistory should be set to true only inside the following block. Only when commit has started do we need to copy history and end; in other cases, we should not copy history. Change the initial value of copyHistory to false and set it only when needed?

          +      } else if (commitStarted) {
          

          Typos "errorHappendShutDown" "NoopEventHanlder"

          If we change this code to create the file anew or fail, then the AM knows when it has lost its race to commit. Does this provide a simpler fix for MAPREDUCE-4832? When an AM tries to initiate commit, only the first one manages to write the commit_start file in HDFS, so racing AMs will fail after the first one succeeds. The marker still exists for the purpose of signalling the start of commit (i.e. for this JIRA). It should not matter which AM commits the result because the computation is deterministic. The AM that failed to commit could check/wait for the end-of-commit marker in order to make sure that the last retry succeeds (if that is necessary).

          +    private void touchz(Path p) throws IOException {
          +      fs.create(p).close();
          +    }
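
          A minimal sketch of that create-or-fail idea (assumptions: java.nio.Files.createFile here stands in for HDFS's atomic FileSystem.create(path, false), and the COMMIT_STARTED marker name is illustrative, not the name used in the patch):

          ```java
          import java.io.IOException;
          import java.io.UncheckedIOException;
          import java.nio.file.FileAlreadyExistsException;
          import java.nio.file.Files;
          import java.nio.file.Path;

          public class CommitMarker {
              // Returns true if this attempt created the marker and therefore "won"
              // the commit race; false if another attempt created it first.
              static boolean tryStartCommit(Path marker) {
                  try {
                      Files.createFile(marker); // atomic: fails if the file already exists
                      return true;
                  } catch (FileAlreadyExistsException e) {
                      return false;             // lost the race: some AM already began commit
                  } catch (IOException e) {
                      throw new UncheckedIOException(e);
                  }
              }

              public static void main(String[] args) throws IOException {
                  Path marker = Files.createTempDirectory("staging").resolve("COMMIT_STARTED");
                  System.out.println(tryStartCommit(marker)); // first attempt: true
                  System.out.println(tryStartCommit(marker)); // later attempt: false
              }
          }
          ```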
          
          Jason Lowe added a comment -

          If we change this code to create new file or fail then AM knows when it has lost its race to commit. Does this provide a simpler fix for MAPREDUCE-4832?

          If an app attempt sees the file, how does it even know whether there's an active race that was lost? The other AM could have simply crashed mid-commit. The losing AM could just assume that's the case and unregister from the RM with a FAILED status, on the assumption that job commit failed. (Or maybe wait for some configurable timeout "just in case".)

          However this would only cover job commit, and two racing app attempts could still commit output for tasks simultaneously. MAPREDUCE-4832 prevents two racing app attempts from committing the same task output, as at most one will be "active" and allowed to commit. That could be bad if the old attempt is re-committing output for a fetch-failure map task while the second attempt is trying to recover, for example. Task output could be lost in that case.

          Siddharth Seth added a comment -

          I think it already will. We are not opening the file for append, we are trying to create it.

          fs.create(Path) - overwrites by default, instead of throwing an exception. There's another form which does not overwrite. Don't think this is a problem once 4832 goes in.
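
          The difference between the two forms can be illustrated with a small stand-in (java.nio here rather than the Hadoop FileSystem API; the marker path is hypothetical):

          ```java
          import java.io.IOException;
          import java.nio.file.FileAlreadyExistsException;
          import java.nio.file.Files;
          import java.nio.file.Path;

          public class CreateSemantics {
              // Overwriting create: succeeds even if p already exists,
              // like FileSystem.create(Path). Files.write defaults to
              // CREATE + TRUNCATE_EXISTING + WRITE.
              static void createOverwrite(Path p) throws IOException {
                  Files.write(p, new byte[0]);
              }

              // Create-new: returns false instead of overwriting when p exists,
              // like FileSystem.create(Path, false) throwing on an existing file.
              static boolean createNew(Path p) throws IOException {
                  try {
                      Files.createFile(p);
                      return true;
                  } catch (FileAlreadyExistsException e) {
                      return false;
                  }
              }

              public static void main(String[] args) throws IOException {
                  Path p = Files.createTempDirectory("t").resolve("marker");
                  createOverwrite(p);
                  createOverwrite(p);               // no error: silently overwrites
                  System.out.println(createNew(p)); // false: the file is already there
              }
          }
          ```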

          I have also not added in the kill state. Again it looked a bit tricky because of the multithreading and I would prefer to get something working in now and add that as part of a follow up JIRA.

          OK. This seems like it will be easier if we rely on the history file as the commit log instead of the three or more individual files.

          RPC clients not being able to communicate with the AM / history (or getting alternate states) after having seen a SUCCESS state seems to be independent of this patch. Separate jira.

          This seems ok for now since it's gotten some attention and has been tried out. I think handling all of this via the CommitHandler is a cleaner approach, and we can move to that at a later point.

          Bikas Saha added a comment -

          Sid, how about creating some JIRAs so that your ideas don't get lost as comments?

          Robert Joseph Evans added a comment -

          This version addresses the final typos and makes it so the file creation throws an error. I also had to add some staging directory cleanup to another test to make it pass as well. I will file JIRAs for the remaining issues. Sid, please let me know if I missed any issues in the JIRAs.

          Robert Joseph Evans added a comment -

          I filed MAPREDUCE-4912 to investigate how to clean up the code.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563295/MR-4819-bobby-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          -1 javac. The applied patch generated 2015 javac compiler warnings (more than the trunk's current 2014 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3195//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3195//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3195//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          Now that MAPREDUCE-4832 is in, I have upmerged this patch to deal with it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563347/MR-4819-4832.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          -1 javac. The applied patch generated 2015 javac compiler warnings (more than the trunk's current 2014 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3199//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3199//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3199//console

          This message is automatically generated.

          Robert Joseph Evans added a comment -

          Thanks, Bikas and Sid, for all of the help on this. I put this into trunk, branch-2, and branch-0.23.


          Integrated in Hadoop-trunk-Commit #3179 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3179/)
          MAPREDUCE-4819. AM can rerun job after reporting final job status to the client (bobby and Bikas Saha via bobby) (Revision 1429114)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429114
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryCopyService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JobHistoryUtils.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/HistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #87 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/87/)
          MAPREDUCE-4819. AM can rerun job after reporting final job status to the client (bobby and Bikas Saha via bobby) (Revision 1429114)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429114
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryCopyService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JobHistoryUtils.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/HistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #485 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/485/)
          svn merge -c 1429114 FIXES: MAPREDUCE-4819. AM can rerun job after reporting final job status to the client (bobby and Bikas Saha via bobby) (Revision 1429120)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429120
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryCopyService.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRAppMaster.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JobHistoryUtils.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/OutputCommitter.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/OutputCommitter.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/HistoryEventHandler.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1276 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1276/)
          MAPREDUCE-4819. AM can rerun job after reporting final job status to the client (bobby and Bikas Saha via bobby) (Revision 1429114)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429114
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryCopyService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JobHistoryUtils.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/HistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1306 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1306/)
          MAPREDUCE-4819. AM can rerun job after reporting final job status to the client (bobby and Bikas Saha via bobby) (Revision 1429114)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429114
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryCopyService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRApp.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRAppMaster.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JobHistoryUtils.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/OutputCommitter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/HistoryEventHandler.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java

            People

            • Assignee: Bikas Saha
            • Reporter: Jason Lowe
            • Votes: 0
            • Watchers: 15
