  1. Tajo
  2. TAJO-1399

TajoResourceAllocator might hang on network error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: RPC
    • Labels:
      None

      Description

      CallFuture<WorkerResourceAllocationResponse> callBack = new CallFuture<WorkerResourceAllocationResponse>();
      
      ...
      
      RpcConnectionPool connPool = RpcConnectionPool.getPool();
      NettyClientBase tmClient = null;
      try {
        ServiceTracker serviceTracker = queryTaskContext.getQueryMasterContext().getWorkerContext().getServiceTracker();
        tmClient = connPool.getConnection(serviceTracker.getUmbilicalAddress(), QueryCoordinatorProtocol.class, true);
        QueryCoordinatorProtocolService masterClientService = tmClient.getStub();
        masterClientService.allocateWorkerResources(null, request, callBack);
      } catch (Throwable e) {
        LOG.error(e.getMessage(), e);
      } finally {
        connPool.releaseConnection(tmClient);
      }
      
      WorkerResourceAllocationResponse response = null;
      while(!stopped.get()) {
        try {
          response = callBack.get(3, TimeUnit.SECONDS);
          ...
      

      If "callBack" is never registered with Netty (for example, because the connection failed), nothing will ever complete it, so the allocator thread keeps waiting on an empty future indefinitely and may be leaked.
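
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the change made in the pull request or in TAJO-1563/TAJO-1584. It uses the JDK's CompletableFuture as a stand-in for Tajo's CallFuture, and the names Sender and requestWithDeadline are hypothetical. The idea is to skip the wait loop when the send itself fails, and to bound the total wait with a deadline even when the request was sent.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

final class BoundedAllocationWait {

  /** Hypothetical stand-in for the RPC stub call; throws when the connection fails. */
  interface Sender<T> {
    void send(CompletableFuture<T> callback) throws Exception;
  }

  /**
   * Sends the request and waits for the reply. Returns null if the send fails,
   * the call fails, the overall deadline passes, or the caller is stopped.
   */
  static <T> T requestWithDeadline(Sender<T> sender, AtomicBoolean stopped,
                                   long pollMillis, long deadlineMillis) {
    CompletableFuture<T> callback = new CompletableFuture<>();
    try {
      sender.send(callback);
    } catch (Exception e) {
      // The callback was never registered, so nothing will ever complete it.
      // Fail fast instead of entering the wait loop.
      return null;
    }

    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
    while (!stopped.get() && System.nanoTime() - deadline < 0) {
      try {
        return callback.get(pollMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        // No reply yet; poll again until the overall deadline expires.
      } catch (ExecutionException e) {
        return null; // the call completed with an error
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return null;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    AtomicBoolean stopped = new AtomicBoolean(false);

    // Connection failure: returns immediately instead of hanging.
    String failed = requestWithDeadline(
        cb -> { throw new IOException("connection refused"); }, stopped, 50, 200);
    System.out.println("failed send -> " + failed);

    // Request sent but never answered: gives up after the deadline.
    String lost = requestWithDeadline(cb -> { }, stopped, 50, 200);
    System.out.println("no reply -> " + lost);

    // Normal case: the callback is completed with a response.
    String ok = requestWithDeadline(cb -> cb.complete("allocated"), stopped, 50, 200);
    System.out.println("successful send -> " + ok);
  }
}
```

In the snippet from the description, the catch block only logs the error and execution falls through to the wait loop. The sketch returns at that point instead, so the loop is never entered for a future that cannot complete.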

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user navis opened a pull request:

          https://github.com/apache/tajo/pull/420

          TAJO-1399 TajoResourceAllocator might hang on network error

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/navis/tajo TAJO-1399

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/420.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #420


          commit 65c5c4e5ca079bddc238b0eeacae34726f696957
          Author: navis.ryu <navis@apache.org>
          Date: 2015-03-13T04:47:27Z

          TAJO-1399 TajoResourceAllocator might hang on network error


          githubbot ASF GitHub Bot added a comment -

          Github user dongjoon-hyun commented on the pull request:

          https://github.com/apache/tajo/pull/420#issuecomment-78805177

          +1

          githubbot ASF GitHub Bot added a comment -

          Github user navis commented on the pull request:

          https://github.com/apache/tajo/pull/420#issuecomment-82167304

          I had misunderstood part of the code and have fixed that. I also fixed a resource leak that occurred when allocating the query master resource timed out.

          githubbot ASF GitHub Bot added a comment -

          Github user navis commented on the pull request:

          https://github.com/apache/tajo/pull/420#issuecomment-82173119

          Confirmed that all tests pass in my local environment.

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/420#issuecomment-97003045

          This issue was solved by TAJO-1563.
          Could you test whether it still hangs on a network error?

          jhkim Jinho Kim added a comment -

          Solved by TAJO-1563, TAJO-1584

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/420#issuecomment-99730357

          Solved by TAJO-1563 and TAJO-1584.
          Please close this PR.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/420


            People

            • Assignee: Navis
            • Reporter: Navis
            • Votes: 0
            • Watchers: 3
