Hadoop Common / HADOOP-1191

MapTask should wait for the status reporting thread to die before invoking the TaskUmbilicalProtocol.done(taskid)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.2
    • Fix Version/s: 0.12.3
    • Component/s: None
    • Labels:
      None

      Description

      Currently, the status reporting thread is sent an interrupt, and TaskUmbilicalProtocol.done() is invoked immediately afterwards. It would be better to wait for the thread to die before invoking done(). Otherwise, a late status message can still make it through, after which the Task is put in the RUNNING state at the TaskTracker. This leaves the TaskTracker with an inconsistent view of the task's run state.
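      The ordering proposed above can be sketched as follows. This is only an illustration, not the attached patch: the Umbilical interface, RecordingUmbilical, the task id, and the sleep intervals are all hypothetical stand-ins for the real MapTask and TaskUmbilicalProtocol code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReporterShutdownSketch {
    /** Hypothetical stand-in for the task-to-TaskTracker channel (not the real TaskUmbilicalProtocol). */
    interface Umbilical {
        void progress(String taskId) throws InterruptedException;
        void done(String taskId);
    }

    /** Records calls in arrival order so the ordering can be inspected afterwards. */
    static class RecordingUmbilical implements Umbilical {
        final List<String> calls = Collections.synchronizedList(new ArrayList<String>());

        public void progress(String taskId) throws InterruptedException {
            Thread.sleep(5);          // simulated call latency; interruptible
            calls.add("progress");
        }

        public void done(String taskId) {
            calls.add("done");
        }
    }

    static List<String> runTask() {
        final RecordingUmbilical umbilical = new RecordingUmbilical();
        Thread reporter = new Thread(() -> {
            try {
                while (true) {
                    umbilical.progress("task_0001");   // hypothetical task id
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                // Interrupted: stop reporting and let the thread exit.
            }
        });
        reporter.start();
        try {
            Thread.sleep(50);         // stands in for the actual map work
            reporter.interrupt();     // ask the reporter to stop
            reporter.join();          // wait until it has really exited
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Only now is it safe to report completion: no status message can follow.
        umbilical.done("task_0001");
        return new ArrayList<String>(umbilical.calls);
    }

    public static void main(String[] args) {
        System.out.println(runTask());
    }
}
```

      Because done() is called only after join() returns, "done" is always the last recorded call. Without the join(), a "progress" entry could land after it.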

      1. 1191.patch
        0.4 kB
        Devaraj Das
      2. 1191-2.patch
        3 kB
        Doug Cutting

          Activity

          Hadoop QA added a comment -

          Integrated in Hadoop-Nightly #46 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/46/ )

          Doug Cutting added a comment -

          I committed this.

          Hadoop QA added a comment -

          -1, because the patch command could not apply the latest attachment http://issues.apache.org/jira/secure/attachment/12354785/1191-2.patch as a patch to trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524929. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

          Owen O'Malley added a comment -

          +1 to the improved fix.

          Doug Cutting added a comment -

          This changes Client.call()'s signature to throw InterruptedException, changing a public API in a patch for a point release. But I don't think anyone actually uses this public API, instead using the RPC API layered on top of it, so I doubt this will break anyone. Now, if a protocol method declares that it throws InterruptedException, then an RPC can be interrupted, as expected.

          TaskUmbilicalProtocol.progress() is also changed to throw InterruptedException. This change only affects client-side code, and so is back-compatible.
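          As a general Java illustration of why the throws declaration on the protocol method matters, here is a minimal sketch using java.lang.reflect.Proxy. It assumes a proxy-style call layer, which this issue does not confirm for Hadoop's RPC code. DemoProtocol and its methods are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

class DeclaredInterruptSketch {
    /** Hypothetical protocol; the names are illustrative and not taken from Hadoop. */
    interface DemoProtocol {
        void declared() throws InterruptedException;
        void undeclared();
    }

    /** Builds a proxy whose every call behaves as if it was interrupted mid-call. */
    static DemoProtocol interruptedProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            throw new InterruptedException("call interrupted");
        };
        return (DemoProtocol) Proxy.newProxyInstance(
                DemoProtocol.class.getClassLoader(),
                new Class<?>[] { DemoProtocol.class },
                handler);
    }

    /** Reports what the caller observes for each kind of method. */
    static String outcome(boolean useDeclared) {
        DemoProtocol p = interruptedProxy();
        try {
            if (useDeclared) {
                p.declared();
            } else {
                p.undeclared();
            }
            return "returned";
        } catch (InterruptedException e) {
            // The method declares the exception, so the caller sees it directly.
            return "InterruptedException";
        } catch (UndeclaredThrowableException e) {
            // The method does not declare it, so the proxy wraps the checked exception.
            return "UndeclaredThrowableException wrapping "
                    + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(true));
        System.out.println(outcome(false));
    }
}
```

          In this sketch, only the method that declares InterruptedException lets the caller catch the interrupt as such. The undeclared method surfaces it wrapped in an unchecked exception.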

          Doug Cutting added a comment -

          Argh. I committed this too soon. It causes unit tests to hang. I talked to Owen & he saw the underlying problem: Client.java traps InterruptedException, so Thread.join() won't return if the thread is in an RPC when interrupted. I have a patch I will submit soon.
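          The hang described above can be reproduced in miniature. The sketch below is not the Hadoop Client code: callTrapping() is a hypothetical blocking call that swallows InterruptedException, callPropagating() lets it through, and a bounded join(500) is used so the demonstration itself cannot hang.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SwallowedInterruptSketch {
    /** Hypothetical blocking call that traps the interrupt, so its caller never sees it. */
    static void callTrapping() {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            // Swallowed: the interrupt status is cleared and the caller's loop carries on.
        }
    }

    /** Hypothetical blocking call that lets the interrupt propagate to its caller. */
    static void callPropagating() throws InterruptedException {
        Thread.sleep(20);
    }

    /** Returns true if the worker thread exits within the join timeout after being interrupted. */
    static boolean exitsAfterInterrupt(final boolean trap) {
        final AtomicBoolean stop = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                while (!stop.get()) {
                    if (trap) {
                        callTrapping();
                    } else {
                        callPropagating();
                    }
                }
            } catch (InterruptedException e) {
                // Interrupt reached the loop: exit the thread.
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            Thread.sleep(30);      // let the worker get into its loop
            worker.interrupt();
            worker.join(500);      // bounded wait; an unbounded join() would hang when trap is true
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean exited = !worker.isAlive();
        stop.set(true);            // let the trapping worker wind down after the check
        return exited;
    }

    public static void main(String[] args) {
        System.out.println("trapping exits: " + exitsAfterInterrupt(true));
        System.out.println("propagating exits: " + exitsAfterInterrupt(false));
    }
}
```

          With the trapping call, the worker keeps looping after interrupt(), which is why an unbounded Thread.join() on it would never return.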

          Doug Cutting added a comment -

          I just committed this. Thanks, Devaraj!

          Devaraj Das added a comment -

          This patch addresses the issue.


            People

            • Assignee: Doug Cutting
            • Reporter: Devaraj Das
            • Votes: 0
            • Watchers: 0
