Hadoop Map/Reduce / MAPREDUCE-158

mapred.userlog.retain.hours killing long running tasks

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tasktracker
    • Labels:
      None
    • Environment:

      0.19.2-dev, r753365

      Description

      One can reproduce the scenario by configuring mapred.userlog.retain.hours to 1hr, and running tasks that take more than an hour.

      More info on closed ticket HADOOP-5591.

        Issue Links

          Activity

          Billy Pearson created issue -
          Billy Pearson made changes -
          Field Original Value New Value
          Link This issue is related to HADOOP-5591 [ HADOOP-5591 ]
          Ruyue Ma added a comment -

          This is related to mapred.userlog.retain.hours.

          Currently, every task JVM tries to clean up user logs in the hadoop/logs/userlogs dir. The check used is
          return file.lastModified() < purgeTimeStamp. Here 'file' is the attempt dir, but the dir's lastModified time doesn't change as the task writes to the files inside it, so the change is:

          + File indexFile = new File(file, "log.index");
          + if (indexFile.exists()) {
          +   return indexFile.lastModified() < purgeTimeStamp;
          + } else {
          +   return file.lastModified() < purgeTimeStamp;
          + }
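
          For readers who want to try the check in isolation, here is a minimal standalone sketch of the logic in the patch fragment above. It is not the actual Hadoop source: the class name UserLogPurgeCheck, the method shouldPurge, and the helper purgeTimeStamp are hypothetical names introduced for illustration, and the way the cutoff is derived from the retention period is an assumption.

```java
import java.io.File;

/** Illustrative sketch only; names are hypothetical, not Hadoop's. */
class UserLogPurgeCheck {

    /**
     * Returns true if the attempt directory looks older than the cutoff.
     * Prefers the modification time of log.index (which a running task
     * rewrites) and falls back to the directory's own modification time.
     */
    static boolean shouldPurge(File attemptDir, long purgeTimeStamp) {
        File indexFile = new File(attemptDir, "log.index");
        if (indexFile.exists()) {
            return indexFile.lastModified() < purgeTimeStamp;
        }
        return attemptDir.lastModified() < purgeTimeStamp;
    }

    /** Assumed cutoff: current time minus the retention period in hours. */
    static long purgeTimeStamp(long nowMillis, int retainHours) {
        return nowMillis - retainHours * 60L * 60L * 1000L;
    }
}
```

          With a one-hour retention, an attempt directory created two hours ago is kept as long as its log.index was modified within the last hour, and purged otherwise.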

          Ruyue Ma made changes -
          Attachment hadoop-5600.patch [ 12405220 ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
          Key HADOOP-5600 MAPREDUCE-158
          Affects Version/s 0.19.2 [ 12313650 ]
          Component/s mapred [ 12310690 ]
          Vinod Kumar Vavilapalli added a comment -

          Correcting the summary and description.

          Vinod Kumar Vavilapalli made changes -
          Summary
            Original: mapred.jobtracker.retirejob.interval killing long running reduce task
            New: mapred.userlog.retain.hours killing long running tasks
          Description
            Original: Can verify by changing the mapred.jobtracker.retirejob.interval to < then your normal map time and watch the reduce task fail
            more info on closed ticket HADOOP-5591
            New: One can reproduce the scenario by configuring mapred.userlog.retain.hours to 1hr, and running tasks that take more than an hour.

            More info on closed ticket HADOOP-5591.
          Component/s
            New: tasktracker [ 12312906 ]
          Vinod Kumar Vavilapalli added a comment -

          This issue is circumvented after HADOOP-4374 and so is not visible beyond 0.20.

          Logs of running tasks are no longer cleaned up (which is what caused the failures) because HADOOP-4374 introduced log.tmp for atomic changes to log.index; a running task periodically creates and writes log.tmp. This periodically changes the modification time of the attempt-log directory and prevents its cleanup even after mapred.userlog.retain.hours has elapsed.
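
          The side effect described above can be reproduced outside Hadoop. The sketch below is not Hadoop's code: LogIndexRewrite and rewriteIndex are hypothetical names, and only the file names log.tmp and log.index come from the comment. It assumes a file system where creating or renaming an entry updates the parent directory's modification time, as POSIX file systems normally do.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative sketch only; names are hypothetical, not Hadoop's. */
class LogIndexRewrite {

    /**
     * Writes the new contents to log.tmp and then renames it over log.index,
     * so readers never see a half-written index. Creating and renaming the
     * entry also touches the modification time of attemptDir on typical
     * file systems, which is the side effect discussed above.
     */
    static void rewriteIndex(Path attemptDir, String contents) throws IOException {
        Path tmp = attemptDir.resolve("log.tmp");
        Path index = attemptDir.resolve("log.index");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        // Whether an existing target is replaced by ATOMIC_MOVE is
        // implementation specific; on Unix-like systems it is a rename.
        Files.move(tmp, index, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

          Calling rewriteIndex on a directory whose modification time was set two hours into the past should leave the directory with a fresh modification time, which is why a check based on file.lastModified() no longer treats a running attempt as expired.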

          So what should be done here? Close this issue? Or make the check for running tasks explicit during cleanup?

          Amareshwari Sriramadasu added a comment -

          As per http://issues.apache.org/jira/browse/MAPREDUCE-927?focusedCommentId=12766412&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12766412, TaskTracker should delete the userlogs only once mapred.userlog.retain.hours have passed after job completion. It then becomes a TaskTracker config parameter, and there will be no permission issues for deletion.

          Amareshwari Sriramadasu added a comment -

          This issue doesn't exist any more, because MAPREDUCE-927 solves it by changing "mapred.userlog.retain.hours" to specify the time (in hours) for which the user logs are retained after job completion.
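
          To illustrate the changed semantics, here is a rough sketch of retention measured from job completion rather than from file modification times. It is an assumption-laden illustration, not the MAPREDUCE-927 implementation: JobLogRetention, jobCompleted, and jobsToPurge are hypothetical names, and job IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; names are hypothetical, not Hadoop's. */
class JobLogRetention {
    private final long retainMillis;
    // Completion time per job; running jobs have no entry here.
    private final Map<String, Long> completionTimes = new HashMap<>();

    JobLogRetention(int retainHours) {
        this.retainMillis = TimeUnit.HOURS.toMillis(retainHours);
    }

    /** Records when a job finished; the retention clock starts here. */
    void jobCompleted(String jobId, long completionMillis) {
        completionTimes.put(jobId, completionMillis);
    }

    /** Returns jobs whose logs have been kept for the full retention period. */
    List<String> jobsToPurge(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : completionTimes.entrySet()) {
            if (nowMillis - e.getValue() >= retainMillis) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

          Because only completed jobs are ever recorded, logs of a task that is still running can never be selected for deletion in this scheme, however long the task takes.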

          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Transition: Open → Resolved
          Time In Source Status: 351d 12h 39m
          Execution Times: 1
          Last Executer: Amareshwari Sriramadasu
          Last Execution Date: 18/Mar/10 05:42

            People

            • Assignee: Unassigned
            • Reporter: Billy Pearson
            • Votes: 0
            • Watchers: 0
