  1. Hadoop Map/Reduce
  2. MAPREDUCE-7328

Job.monitorAndPrintJob can sleep for up to 596 hours when jobclient.progress.monitor.poll.interval is misconfigured, causing the job to hang


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: client
    • Labels: None

      Description

      How long each iteration of the monitoring loop sleeps depends on a configurable value, and there is little sanity checking on that value. When jobclient.progress.monitor.poll.interval is misconfigured to Integer.MAX_VALUE (2,147,483,647 ms), each Thread.sleep call can last up to about 596 hours. The monitoring thread is then stuck and never reports progress to the user, even while the job itself moves forward. We suggest capping the value or logging a warning.
       

       public boolean monitorAndPrintJob()
            throws IOException, InterruptedException {
          ...
          while (!isComplete() || !reportedAfterCompletion) {
            if (isComplete()) {
              reportedAfterCompletion = true;
            } else {
              // Sleeps for the configured interval; little sanity checking is applied to it.
              Thread.sleep(progMonitorPollIntervalMillis);
            }
            ...
          }
          ...
       }
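
      The suggested cap and warning could look roughly like the sketch below. It is not taken from the Hadoop code base: the class PollIntervalGuard, the method sanitizePollInterval, and both constants (a 1000 ms fallback and a 60000 ms ceiling) are hypothetical names and values chosen for illustration. The main method also shows where the 596-hour figure comes from.

```java
import java.util.concurrent.TimeUnit;

public class PollIntervalGuard {
    // Hypothetical bounds for illustration only; these are not Hadoop constants.
    static final int FALLBACK_POLL_INTERVAL_MS = 1000;
    static final int MAX_POLL_INTERVAL_MS = 60_000;

    /**
     * Returns a poll interval that is safe to pass to Thread.sleep,
     * printing a warning whenever the configured value is out of range.
     */
    static int sanitizePollInterval(int configuredMs) {
        if (configuredMs < 1) {
            System.err.println("WARN: poll interval " + configuredMs
                + " ms is not positive; using " + FALLBACK_POLL_INTERVAL_MS + " ms");
            return FALLBACK_POLL_INTERVAL_MS;
        }
        if (configuredMs > MAX_POLL_INTERVAL_MS) {
            System.err.println("WARN: poll interval " + configuredMs
                + " ms exceeds the cap; using " + MAX_POLL_INTERVAL_MS + " ms");
            return MAX_POLL_INTERVAL_MS;
        }
        return configuredMs;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE milliseconds, truncated to whole hours.
        System.out.println(TimeUnit.MILLISECONDS.toHours(Integer.MAX_VALUE)); // prints 596
        // The misconfigured value is clamped to the hypothetical cap.
        System.out.println(sanitizePollInterval(Integer.MAX_VALUE)); // prints 60000
    }
}
```

      With a guard like this applied where the interval is read from the configuration, the value handed to Thread.sleep stays bounded, so a misconfiguration delays progress reports by at most the cap instead of stalling them for weeks.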
       

      This is similar to the bug reported in MAPREDUCE-7327.

            People

            • Assignee: Unassigned
            • Reporter: Tina Shan (tshan)
            • Votes: 1
            • Watchers: 2
