Hadoop Common / HADOOP-9655

Connection object in IPC Client can not run concurrently during connection time out

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: None
    • Component/s: ipc
    • Labels:

      Description

      When a machine powered off while a job was running, MRAppMaster found that the tasks on that host had timed out and then called stopContainer for each container concurrently.
      The IPC layer, however, handled these calls serially. For each call, the connection timeout exception took a few minutes to be raised, after 45 retries, so the AM hung for many hours waiting for stopContainer to finish.
      The jstack output shows that most threads are stuck at Connection.addCall, waiting for a lock object held by Connection.setupIOstreams.
      (The setupIOstreams method runs slowly because of the connection timeout during setupConnection.)
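
      The contention described above can be reproduced with a small sketch. This is not the Hadoop code: ToyConnection, measureAddCallWaitMillis, and the 300 ms delay are hypothetical stand-ins. It assumes only what the report states, namely that addCall and setupIOstreams compete for the same lock and that setupIOstreams holds it for the duration of a slow connection attempt.

```java
import java.util.concurrent.CountDownLatch;

public class ConnectionLockDemo {

    /** Toy stand-in for the IPC client's Connection; not the Hadoop class. */
    static class ToyConnection {
        private final CountDownLatch setupStarted = new CountDownLatch(1);
        private int queuedCalls = 0;

        // Holds this object's monitor for the whole (slow) connection attempt.
        synchronized void setupIOstreams(long connectMillis) throws InterruptedException {
            setupStarted.countDown();
            Thread.sleep(connectMillis); // simulates waiting for a connect timeout
        }

        // Needs the same monitor, so it cannot proceed while setup is running.
        synchronized boolean addCall() {
            queuedCalls++;
            return true;
        }

        void awaitSetupStarted() throws InterruptedException {
            setupStarted.await();
        }
    }

    /** Returns how long addCall() was blocked, in milliseconds. */
    static long measureAddCallWaitMillis(long connectMillis) throws InterruptedException {
        ToyConnection conn = new ToyConnection();
        Thread setup = new Thread(() -> {
            try {
                conn.setupIOstreams(connectMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        setup.start();
        conn.awaitSetupStarted();   // the setup thread now owns the monitor
        long start = System.nanoTime();
        conn.addCall();             // blocks until setupIOstreams returns
        long waited = (System.nanoTime() - start) / 1_000_000;
        setup.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("addCall waited ~" + measureAddCallWaitMillis(300) + " ms");
    }
}
```

      In this sketch the caller of addCall is delayed for roughly as long as the simulated connection attempt lasts. With several callers queued on the same connection object and a timeout of minutes rather than milliseconds, the waits would add up in the way the report describes.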

        Activity

        Allen Wittenauer made changes -
        Labels: BB2015-05-TBR
        Nemon Lou made changes -
        Summary:
          Original Value: IPC Client call to the same host with multi thread takes very long time to report connection time out for many times
          New Value: Connection object in IPC Client can not run concurrently during connection time out
        Nemon Lou made changes -
        Status:
          Original Value: Open [ 1 ]
          New Value: Patch Available [ 10002 ]
        Nemon Lou made changes -
        Attachment: HADOOP-9655.patch [ 12591789 ]
        Nemon Lou created issue -

          People

          • Assignee: Unassigned
          • Reporter: Nemon Lou
          • Votes: 0
          • Watchers: 9
