Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

      Description

      When I run a simple query ("select count(1) from lineitem;") in one tsql session and then run a second query in another tsql session, the second one hangs.
      Even after the first query is done, the second one is still stuck at 0%.

      After that, running the same query again in the first tsql session, the third query also hangs.

      Attachments

      1. pstack (596 kB, attached by Min Zhou)
      2. TAJO-525.patch (6 kB, attached by Hyoungjun Kim)


          Activity

          coderplay Min Zhou added a comment -

          After a while, I closed those two sessions and launched another one, and got another exception:

          tajo> select count(1) from lineitem ;
          26.103: [GC 26.103: [ParNew: 17981K->951K(19136K), 0.0034090 secs] 19432K->2408K(128960K), 0.0035000 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 
          Progress: 0%, response time: 1.033 sec
          Progress: 0%, response time: 2.064 sec
          Progress: 0%, response time: 3.09 sec
          Progress: 0%, response time: 4.115 sec
          Progress: 0%, response time: 5.197 sec
          Progress: 0%, response time: 6.221 sec
          Progress: 0%, response time: 7.245 sec
          Progress: 0%, response time: 8.269 sec
          Progress: 0%, response time: 9.293 sec
          Progress: 0%, response time: 10.319 sec
          36.458: [GC 36.458: [ParNew: 17975K->972K(19136K), 0.0029600 secs] 19432K->2489K(128960K), 0.0030540 secs] [Times: user=0.02 sys=0.01, real=0.00 secs] 
          Progress: 1%, response time: 11.349 sec
          Progress: 1%, response time: 12.374 sec
          Progress: 1%, response time: 13.398 sec
          Progress: 1%, response time: 14.422 sec
          Progress: 2%, response time: 15.446 sec
          Progress: 2%, response time: 16.469 sec
          Progress: 2%, response time: 17.492 sec
          Progress: 2%, response time: 18.516 sec
          Progress: 2%, response time: 19.536 sec
          Progress: 3%, response time: 20.56 sec
          Progress: 3%, response time: 21.583 sec
          Internal error!
          
          coderplay Min Zhou added a comment -

          The internal error is caused by a failure of an execution block. I changed some code and found something similar to TAJO-522:

          2014-01-20 01:53:04,959 ERROR master.ContainerProxy (TajoContainerProxy.java:assignExecutionBlock(88)) - java.lang.OutOfMemoryError: unable to create new native thread
          java.io.IOException: java.lang.OutOfMemoryError: unable to create new native thread
                  at org.apache.tajo.rpc.NettyClientBase.init(NettyClientBase.java:76)
                  at org.apache.tajo.rpc.AsyncRpcClient.<init>(AsyncRpcClient.java:67)
                  at org.apache.tajo.master.TajoContainerProxy.assignExecutionBlock(TajoContainerProxy.java:72)
                  at org.apache.tajo.master.TajoContainerProxy.launch(TajoContainerProxy.java:59)
                  at org.apache.tajo.worker.TajoResourceAllocator$LaunchRunner.run(TajoResourceAllocator.java:189)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:662)
          Caused by: java.lang.OutOfMemoryError: unable to create new native thread
                  at java.lang.Thread.start0(Native Method)
                  at java.lang.Thread.start(Thread.java:640)
                  at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
                  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
                  at org.jboss.netty.util.internal.DeadLockProofWorker.start(DeadLockProofWorker.java:38)
                  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:343)
                  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:95)
                  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:53)
                  at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
                  at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
                  at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
                  at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:99)
                  at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:69)
                  at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
                  at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33)
                  at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:151)
                  at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:116)
                  at org.apache.tajo.rpc.NettyClientBase.init(NettyClientBase.java:54)
                  ... 10 more
          

          This process ate a lot of memory for threads; I can't even launch jstack on that machine:

          $ jps
          #
          # There is insufficient memory for the Java Runtime Environment to continue.
          # Cannot create GC thread. Out of system resources.
          # An error report file with more information is saved as:
          # /export/home/yarn/hs_err_pid19677.log
          

          pstack shows the process started over 1,000 threads; I will submit the pstack result soon.

          coderplay Min Zhou added a comment - edited

          It's terrible that each execution block creates a new AsyncRpcClient, which in turn creates a number of threads per execution block.
          By default, Netty creates 1 boss thread and cores*2 I/O worker threads, so in my case each execution block could create up to 1 + 24 * 2 = 49 threads!
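          To make the arithmetic concrete, here is a minimal, hypothetical sketch (plain Java, not Tajo code; the class name and the block count are made up) of why one Netty 3 NioClientSocketChannelFactory per execution block multiplies the thread count. The worker pool is initialized eagerly, as the stack trace above shows.

          import java.util.concurrent.Executors;

          import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;

          // Sketch only: one factory per execution block is roughly what AsyncRpcClient did.
          // Each factory costs about 1 boss thread + availableProcessors() * 2 I/O workers.
          public class PerBlockClientThreads {
            public static void main(String[] args) {
              int cores = Runtime.getRuntime().availableProcessors();
              int threadsPerFactory = 1 + cores * 2;      // on a 24-core box: 1 + 48 = 49
              int executionBlocks = 25;                   // hypothetical number of blocks in flight
              System.out.println("threads per factory: " + threadsPerFactory);
              System.out.println("worst-case total   : " + threadsPerFactory * executionBlocks);

              // Nothing is shared and nothing is reused, so the thread count grows
              // linearly with the number of execution blocks.
              NioClientSocketChannelFactory perBlockFactory = new NioClientSocketChannelFactory(
                  Executors.newCachedThreadPool(),        // boss threads
                  Executors.newCachedThreadPool());       // I/O worker threads (cores * 2 by default)
              perBlockFactory.releaseExternalResources(); // released here only to keep the sketch clean
            }
          }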

          hyunsik Hyunsik Choi added a comment -

          Thank you for the nice finding. It is a very significant clue for finding and fixing the bugs.

          hjkim Hyoungjun Kim added a comment - edited

          Created a review request against branch master in Review Board:
          https://reviews.apache.org/r/17114/

          This bug is caused by the QueryMaster opening too many connections to the Worker.
          So I changed the following:

          • Use RpcConnectionPool when the QueryMaster requests the Worker to run a container.
          • Make the number of Netty's client workers configurable ("tajo.rpc.client.socket-io-threads"). If this value is zero, the number of workers per socket connection is Runtime.getRuntime().availableProcessors() * 2, which is Netty's default.

          This is an urgent patch. There are other issues still to be resolved, such as creating the ClientSocketChannelFactory only once per JVM; a sketch of that idea follows below.
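          The following is a hedged sketch of the idea, not the actual TAJO-525/TAJO-522 code: the property name "tajo.rpc.client.socket-io-threads" is the one quoted above, while the SharedChannelFactory holder is hypothetical and only illustrates "one ClientSocketChannelFactory per JVM" with a configurable worker count. RpcConnectionPool limits how many connections are opened; a per-JVM factory limits how many threads those connections cost.

          import java.util.concurrent.Executors;

          import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;

          // Hypothetical holder: every RPC client in the JVM reuses one factory and its threads.
          public final class SharedChannelFactory {
            private static volatile NioClientSocketChannelFactory instance;

            private SharedChannelFactory() {}

            public static NioClientSocketChannelFactory get(int configuredIoThreads) {
              if (instance == null) {
                synchronized (SharedChannelFactory.class) {
                  if (instance == null) {
                    // 0 (or less) falls back to Netty's default of availableProcessors() * 2,
                    // mirroring the "tajo.rpc.client.socket-io-threads" behaviour described above.
                    int workers = configuredIoThreads > 0
                        ? configuredIoThreads
                        : Runtime.getRuntime().availableProcessors() * 2;
                    instance = new NioClientSocketChannelFactory(
                        Executors.newCachedThreadPool(),   // boss threads
                        Executors.newCachedThreadPool(),   // I/O worker threads
                        workers);
                  }
                }
              }
              return instance;
            }
          }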

          hyunsik Hyunsik Choi added a comment -

          I think this patch is more appropriate for TAJO-522, so I moved it to TAJO-522.

          coderplay Min Zhou added a comment -

          I agree with Hyunsik.
          Thanks for the help, Hyoungjun.

          jihoonson Jihoon Son added a comment -

          Resolved in TAJO-522.


            People

             • Assignee: Unassigned
             • Reporter: coderplay Min Zhou
             • Votes: 0
             • Watchers: 3

