IMPALA-2813

Query speed is very slow in Impala; I am using CDH 5.5.0


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version: Impala 2.3.0
    • Environment: CentOS 6.0, CDH 5.5.0

    Description

      Queries are slow in Impala (I am using CDH 5.5.0). For example, this SQL statement:

       select domain,
              sum(domain_request_count) domain_request_count,
              sum(domain_response_count) domain_response_count
       from dfdsdb.request_response_domain_sc
       where cast(CONCAT(year,month,day) as int)
             between cast("20151214" as int) and cast("20151231" as int)
       group by domain
       order by domain_request_count desc
       limit 10
      

      It usually takes about 30 seconds and sometimes more than 50 seconds; the fastest run took 15 seconds.
      The table dfdsdb.request_response_domain_sc has three partition columns: (date), (month), and (year). The amount of data is around one hundred million.
      By rights, this statement should take under 10 seconds. I monitored the Impala log in the background and found abnormal entries whenever a query takes a long time, as follows:

      Tuple(id=0 size=40 slots=[Slot(id=0 type=STRING col_path=[4] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=1 type=BIGINT col_path=[5] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=2 type=BIGINT col_path=[6] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1), Slot(id=3 type=STRING col_path=[0] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=4 type=STRING col_path=[1] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=5 type=STRING col_path=[2] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1)] tuple_path=[])
      Tuple(id=1 size=40 slots=[Slot(id=6 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=7 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=8 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[])
      Tuple(id=2 size=40 slots=[Slot(id=9 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=10 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=11 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[])
      I0106 09:46:59.656497 19278 plan-fragment-executor.cc:303] Open(): instance_id=794f58dadaa44cb8:1f24c33dda8d00a2
      I0106 09:47:20.070286 6805 RetryInvocationHandler.java:144] Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over namenode1:8020. Trying to fail over immediately.
      Java exception follows:
      org.apache.hadoop.net.ConnectTimeoutException: Call From datanode to namenode1:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending namenode1:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
      at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
      at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
      at org.apache.hadoop.ipc.Client.call(Client.java:1476)
      at org.apache.hadoop.ipc.Client.call(Client.java:1403)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
      at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
      at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
      at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258)
      at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245)
      at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233)
      at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302)
      at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268)
      at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260)
      at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564)
      at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308)
      at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304)
      Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=namenode2:8020]
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
      at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
      at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
      at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
      at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
      at org.apache.hadoop.ipc.Client.call(Client.java:1442)
      ... 21 more
      I0106 09:47:20.077205 6805 RetryInvocationHandler.java:144] Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over namenode2:8020 after 1 fail over attempts. Trying to fail over after sleeping for 1300ms.
      Java exception follows:

      When a query runs quickly, these log entries do not appear. I suspect the connection timeout is what makes the Impala query slow, but how can I solve this problem? Thanks.
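
      As a rough check of that suspicion, the gap between the Open() line and the first failover message in the log excerpt can be compared with the 20000 ms connect timeout reported in the exception. The sketch below only does that arithmetic on the two timestamps shown above. The helper name elapsed_seconds is hypothetical, and because the two log lines come from different threads (19278 and 6805), the result is suggestive rather than proof.

```python
from datetime import datetime

# Timestamps copied from the log excerpt (glog time format HH:MM:SS.ffffff).
OPEN_TS = "09:46:59.656497"      # plan-fragment-executor.cc: Open()
FAILOVER_TS = "09:47:20.070286"  # RetryInvocationHandler: first getBlockLocations failure

# "20000 millis timeout" taken from the ConnectTimeoutException message.
CONNECT_TIMEOUT_S = 20.0


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two same-day glog timestamps (hypothetical helper)."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


gap = elapsed_seconds(OPEN_TS, FAILOVER_TS)
print(f"gap between Open() and the failover message: {gap:.3f} s")
print(f"part not covered by the connect timeout:     {gap - CONNECT_TIMEOUT_S:.3f} s")
```

      The gap comes out at roughly 20.4 seconds, so a single connect timeout to the first NameNode would account for most of the difference between the fastest run and a typical one.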

          People

            Assignee: Unassigned
            Reporter: liulichao (lichao.liu)