Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.4
    • Fix Version/s: 2.0.3-alpha, 0.23.5
    • Component/s: resourcemanager
    • Labels:
      None

      Description

      We ran into a deadlock in the RM.

      =============================
      "1128743461@qtp-1252749669-5201":
      waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
      which is held by "AsyncDispatcher event handler"
      "AsyncDispatcher event handler":
      waiting to lock monitor 0x00002ab0bba3a370 (object 0x00002aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
      which is held by "IPC Server handler 36 on 8030"
      "IPC Server handler 36 on 8030":
      waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
      which is held by "AsyncDispatcher event handler"
      Java stack information for the threads listed above:
      ===================================================
      "1128743461@qtp-1252749669-5201":
      at sun.misc.Unsafe.park(Native Method)
      • parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
        at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
        ...
        ...
        ..

      "AsyncDispatcher event handler":
      at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)

      • waiting to lock <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
      • locked <0x00002aabbb673090> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
        at java.lang.Thread.run(Thread.java:619)

      "IPC Server handler 36 on 8030":
      at sun.misc.Unsafe.park(Native Method)
      • parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
      • locked <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
        at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
        at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
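The cycle in the dump above is a lock-order inversion: the IPC handler holds the AMResponse monitor and waits for the attempt's write lock, while the dispatcher holds that write lock and waits for the AMResponse monitor. The web thread is only a bystander queued on the read lock. The following is a minimal, self-contained sketch of that pattern, not RM code. The class name `LockCycleSketch`, the thread names, and the plain `Object` standing in for the response are all hypothetical, and the threads are daemons so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockCycleSketch {

    /** Builds the two-lock cycle and returns how many threads the JVM reports as deadlocked. */
    static int reproduce() {
        // Stand-in for the AMResponse object that allocate() synchronizes on.
        final Object lastResponse = new Object();
        // Stand-in for the read/write lock inside the application attempt.
        final ReentrantReadWriteLock attemptLock = new ReentrantReadWriteLock();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors the IPC handler: monitor first, then the write lock.
        Thread ipcHandler = new Thread(() -> {
            synchronized (lastResponse) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                attemptLock.writeLock().lock(); // blocks: the other thread owns it
            }
        }, "ipc-handler-sketch");

        // Mirrors the dispatcher: write lock first, then the monitor.
        Thread dispatcher = new Thread(() -> {
            attemptLock.writeLock().lock();
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            synchronized (lastResponse) {
                // never reached: the other thread owns the monitor
            }
        }, "dispatcher-sketch");

        ipcHandler.setDaemon(true);
        dispatcher.setDaemon(true);
        ipcHandler.start();
        dispatcher.start();

        // Poll the JVM's own deadlock detector for up to about five seconds.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = threads.findDeadlockedThreads();
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling `LockCycleSketch.reproduce()` should report the two sketch threads on a JVM that supports monitoring of ownable synchronizers. Taking the two locks in the same order on both paths, or not holding the monitor on one of them, breaks the cycle.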
      1. YARN-189.patch
        3 kB
        Thomas Graves
      2. YARN-189.patch
        3 kB
        Thomas Graves

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1243 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1243/)
          YARN-189. Fixed a deadlock between RM's ApplicationMasterService and the dispatcher. Contributed by Thomas Graves. (Revision 1404431)

          Result = ABORTED
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404431
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1213 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1213/)
          YARN-189. Fixed a deadlock between RM's ApplicationMasterService and the dispatcher. Contributed by Thomas Graves. (Revision 1404431)

          Result = FAILURE
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404431
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #422 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/422/)
          YARN-189. Fixed a deadlock between RM's ApplicationMasterService and the dispatcher. Contributed by Thomas Graves.
          svn merge --ignore-ancestry -c 1404431 ../../trunk/ (Revision 1404433)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404433
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #23 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/23/)
          YARN-189. Fixed a deadlock between RM's ApplicationMasterService and the dispatcher. Contributed by Thomas Graves. (Revision 1404431)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404431
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2947 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2947/)
          YARN-189. Fixed a deadlock between RM's ApplicationMasterService and the dispatcher. Contributed by Thomas Graves. (Revision 1404431)

          Result = SUCCESS
          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404431
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
          Vinod Kumar Vavilapalli added a comment -

          Filed MAPREDUCE-4761 for the MR AM issues on RM exceptions.

          Vinod Kumar Vavilapalli added a comment -

          Just committed this to trunk, branch-2 and branch-0.23. Thanks Thomas!

          Vinod Kumar Vavilapalli added a comment -

          +1 for the patch. Pushing in.

          We need to fix the other AM issues immediately. sigh

          Robert Joseph Evans added a comment -

          The new patch looks good too. I am +1 on this also.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12551540/YARN-189.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/130//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/130//console

          This message is automatically generated.

          Thomas Graves added a comment -

          Thanks for the review, Vinod. I was not aware of the exception-handling bug. I've updated the patch to do a reboot.

          Also, I should have mentioned earlier that we manually tested this patch. We reproduced the issue by introducing a sleep in allocate, then verified that with this patch it did not deadlock when the AM was killed or finished.

          Vinod Kumar Vavilapalli added a comment -

          If you throw an exception when the previous response isn't there, the AM might ignore it. The MR AM currently ignores exceptions up to a certain count (that's a separate bug), irrespective of the failure type. Sending a reboot command is better.

          Exception handling on the MR AM side is totally broken: it doesn't seem to shut down the AM on getting critical exceptions or reboot commands from the RM. That is another bug.
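To make the difference concrete, here is a rough sketch of a heartbeat loop that tolerates a bounded number of exceptions but stops at once on an explicit reboot signal. It is an illustration only and is not the MR AM's code. `AmHeartbeatSketch`, `Response`, `Rm`, and the `toleratedFailures` parameter are hypothetical names.

```java
class AmHeartbeatSketch {

    /** Hypothetical response type carrying only a reboot flag. */
    static final class Response {
        final boolean reboot;

        Response(boolean reboot) {
            this.reboot = reboot;
        }
    }

    /** Hypothetical stand-in for the AM-to-RM allocate call. */
    interface Rm {
        Response allocate() throws Exception;
    }

    /** Returns the number of heartbeats sent before the loop stopped. */
    static int run(Rm rm, int maxHeartbeats, int toleratedFailures) {
        int consecutiveFailures = 0;
        int sent = 0;
        while (sent < maxHeartbeats) {
            sent++;
            try {
                Response response = rm.allocate();
                if (response.reboot) {
                    // An explicit command cannot be mistaken for a transient error.
                    return sent;
                }
                consecutiveFailures = 0;
            } catch (Exception e) {
                // Exceptions are retried until the tolerance is exhausted.
                consecutiveFailures++;
                if (consecutiveFailures > toleratedFailures) {
                    return sent;
                }
            }
        }
        return sent;
    }
}
```

With a tolerance of three, an RM that always throws is contacted four times before the loop gives up, whereas a reboot response ends the loop on the first heartbeat.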

          Vinod Kumar Vavilapalli added a comment -

          Thomas Graves wrote: "Thanks for the info Vinod. In an effort to keep changes to a minimum for now, we can remove the synchronized from the unregisterAttempt and in allocate just check to see if it was null when putting it."

          +1 for the idea. Clearly I don't want us to hold up this blocker to fix all of these; let's create a separate ticket.

          Will quickly look at the patch.

          Robert Joseph Evans added a comment -

          The change looks good to me. I don't see any real issues with it. I am a +1 for it. I am not going to check it in to give Vinod and others some time to comment if they want to.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12551348/YARN-189.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/126//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/126//console

          This message is automatically generated.

          Thomas Graves added a comment -

          Thanks for the info Vinod. In an effort to keep changes to a minimum for now, we can remove the synchronized from the unregisterAttempt and in allocate just check to see if it was null when putting it. It appears the synchronized in unregisterAttempt was just trying to prevent the memory leak of adding the response back to the responseMap after it had been unregistered.
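A minimal sketch of that idea, assuming a concurrent map keyed by attempt ID. This is not the attached patch. `ResponseMapSketch` and its method names are hypothetical, and `ConcurrentHashMap.replace` is my choice for the "check for null when putting" step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ResponseMapSketch {

    // Last response per attempt; plain Object stands in for the response type.
    private final ConcurrentMap<String, Object> responseMap = new ConcurrentHashMap<>();

    void register(String attemptId, Object initialResponse) {
        responseMap.put(attemptId, initialResponse);
    }

    /** Takes no monitor on the response, so it cannot wait on a thread inside allocate. */
    void unregister(String attemptId) {
        responseMap.remove(attemptId);
    }

    /** Returns false if the attempt is unknown or was unregistered while allocate ran. */
    boolean allocate(String attemptId, Object newResponse) {
        Object lastResponse = responseMap.get(attemptId);
        if (lastResponse == null) {
            return false;
        }
        // Requests from the same attempt are still serialized on the last response.
        synchronized (lastResponse) {
            // ... scheduler work would happen here ...
            // replace() only stores the value if the key is still mapped, so a
            // concurrent unregister cannot be undone by this write (no leak).
            return responseMap.replace(attemptId, newResponse) != null;
        }
    }

    boolean isTracked(String attemptId) {
        return responseMap.containsKey(attemptId);
    }
}
```

The trade-off is that an allocate call racing with unregister may still do its work once, but it can no longer re-add the entry, and the caller can turn the false result into a reboot response.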

          Vinod Kumar Vavilapalli added a comment -

          I think we should do a scrub of all the TODOs that I left in the code.

          Thanks for taking this up, Thomas.

          I believe we were trying to do too many things using a single underlying data structure. My objectives were:

          • Only let a single connection from the AM make requests at a time (in other words, requests from multiple threads in an AM are forced to be serial)
          • If and whenever possible, don't let illegal AMs go through to the scheduler and other components in the RM
          • Keep track of last responses in order to account for missing messages. (This is legacy baggage; I am not sure whether this can still happen with the current RPC)

          If we can separate these functions logically, we will be good.
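One way to picture that separation is a sketch with one structure per objective. This is an assumption about what "separate logically" could look like, not a design proposed in this issue. `SeparatedAmTrackingSketch` and its fields are hypothetical, and an integer response ID stands in for the full last response.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class SeparatedAmTrackingSketch {

    // Objective 1: serialize requests from one attempt.
    private final Map<String, ReentrantLock> requestLocks = new ConcurrentHashMap<>();
    // Objective 2: reject attempts that are not registered.
    private final Set<String> registered = ConcurrentHashMap.newKeySet();
    // Objective 3: remember the last response ID to detect missed messages.
    private final Map<String, Integer> lastResponseIds = new ConcurrentHashMap<>();

    void register(String attemptId) {
        requestLocks.putIfAbsent(attemptId, new ReentrantLock());
        lastResponseIds.put(attemptId, 0);
        registered.add(attemptId);
    }

    /** Never takes the per-attempt request lock, so it cannot wait on an in-flight allocate. */
    void unregister(String attemptId) {
        registered.remove(attemptId);
        requestLocks.remove(attemptId);
        lastResponseIds.remove(attemptId);
    }

    /** Returns the new response ID, or -1 if the attempt is unknown. */
    int allocate(String attemptId) {
        if (!registered.contains(attemptId)) {
            return -1;
        }
        ReentrantLock lock = requestLocks.get(attemptId);
        if (lock == null) {
            return -1; // unregistered between the two lookups
        }
        lock.lock();
        try {
            // Only bumps the ID if the attempt is still tracked.
            Integer next = lastResponseIds.computeIfPresent(attemptId, (id, last) -> last + 1);
            return next == null ? -1 : next;
        } finally {
            lock.unlock();
        }
    }
}
```

Because each concern has its own structure, the serialization lock is never the same object that the unregister path needs to touch.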


            People

            • Assignee:
              Thomas Graves
              Reporter:
              Thomas Graves
            • Votes:
              0
              Watchers:
              12
