Hadoop Common / HADOOP-9655

Connection object in IPC Client can not run concurrently during connection time out

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: None
    • Component/s: ipc
    • Labels:
      None

      Description

      When a machine powers off while a job is running, MRAppMaster finds that the tasks on that host have timed out and then calls stopContainer for each container concurrently.
      But the IPC layer handles these calls serially. For each call, the connection timeout exception takes a few minutes to be raised, after 45 retries. As a result, the AM hangs for many hours waiting for the stopContainer calls to finish.
      The jstack output file shows that most threads are stuck at Connection.addCall, waiting for a lock held by Connection.setupIOstreams.
      (The setupIOstreams method runs slowly because of the connection timeout during setupConnection.)
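
      The contention described above can be reproduced in miniature. The following is a minimal sketch, not Hadoop code: the Connection class here is a hypothetical stand-in, the sleep stands in for the connect timeouts and retries, and the names ConnectionLockDemo, setupStarted and measureBlockedMillis are invented for illustration. It assumes, as the jstack output suggests, that addCall and setupIOstreams contend for the same monitor.

```java
import java.util.concurrent.CountDownLatch;

public class ConnectionLockDemo {
    /** Hypothetical stand-in for the IPC client's Connection; not Hadoop code. */
    static class Connection {
        private final CountDownLatch setupStarted = new CountDownLatch(1);

        // Holds this object's monitor for the whole (slow) connection attempt.
        synchronized void setupIOstreams(long simulatedTimeoutMs) throws InterruptedException {
            setupStarted.countDown();
            // Stands in for connect timeouts and retries; sleep keeps the monitor held.
            Thread.sleep(simulatedTimeoutMs);
        }

        // Needs the same monitor, so it cannot proceed while setup is running.
        synchronized void addCall() {
            // Bookkeeping for the new call would go here.
        }
    }

    /** Returns how long addCall() was blocked, in milliseconds. */
    static long measureBlockedMillis(long simulatedTimeoutMs) throws InterruptedException {
        Connection conn = new Connection();
        Thread setup = new Thread(() -> {
            try {
                conn.setupIOstreams(simulatedTimeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        setup.start();
        conn.setupStarted.await(); // the setup thread now holds the monitor

        long start = System.nanoTime();
        conn.addCall();            // blocks until setupIOstreams returns
        long blockedMs = (System.nanoTime() - start) / 1_000_000;

        setup.join();
        return blockedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("addCall blocked for about " + measureBlockedMillis(500) + " ms");
    }
}
```

      With a simulated timeout of 500 ms, the caller of addCall waits for roughly the full duration. In the reported scenario the wait is minutes per call rather than milliseconds, and every queued caller pays it in turn.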

        Activity

        Nemon Lou added a comment -

        This patch has been tested on my cluster and has solved the problem.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12591789/HADOOP-9655.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2767//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2767//console

        This message is automatically generated.

        Nemon Lou added a comment -

        This patch uses a different object for wait and notify, so a thread invoking the addCall method won't be blocked by another thread calling the setupConnection method.
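
        The idea in this comment can be sketched as follows. This is only an illustration under stated assumptions: the contents of the attached patch are not shown here, so the class below is a hypothetical stand-in rather than the patched Hadoop code, and callsLock, pendingCalls and measureAddCallMillis are invented names. The slow setup still holds the Connection monitor, while call bookkeeping and wait/notify use a dedicated lock object.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;

public class SeparateLockDemo {
    /** Hypothetical stand-in for the IPC client's Connection; not the actual patch. */
    static class Connection {
        private final Object callsLock = new Object(); // dedicated lock for calls and wait/notify
        private final Queue<Integer> calls = new ArrayDeque<>();
        private final CountDownLatch setupStarted = new CountDownLatch(1);

        // Still holds the Connection monitor during the slow connection attempt.
        synchronized void setupIOstreams(long simulatedTimeoutMs) throws InterruptedException {
            setupStarted.countDown();
            Thread.sleep(simulatedTimeoutMs); // stands in for connect timeouts and retries
        }

        // Uses callsLock instead of the Connection monitor, so setup cannot block it.
        void addCall(int callId) {
            synchronized (callsLock) {
                calls.add(callId);
                callsLock.notify(); // wake a thread waiting for work, if any
            }
        }

        int pendingCalls() {
            synchronized (callsLock) {
                return calls.size();
            }
        }
    }

    /** Returns how long addCall() took, in milliseconds, while setup was in progress. */
    static long measureAddCallMillis(long simulatedTimeoutMs) throws InterruptedException {
        Connection conn = new Connection();
        Thread setup = new Thread(() -> {
            try {
                conn.setupIOstreams(simulatedTimeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        setup.start();
        conn.setupStarted.await(); // the setup thread now holds the Connection monitor

        long start = System.nanoTime();
        conn.addCall(1);           // returns without waiting for setup to finish
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        setup.join();
        if (conn.pendingCalls() != 1) {
            throw new IllegalStateException("call was not queued");
        }
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("addCall took about " + measureAddCallMillis(500) + " ms");
    }
}
```

        In this sketch addCall returns almost immediately even though setupIOstreams is still sleeping. A real change would also need to make sure that any state shared between the two code paths stays consistent under the new locking scheme.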


          People

          • Assignee:
            Unassigned
            Reporter:
            Nemon Lou
          • Votes:
            0
            Watchers:
            8
