IMPALA-3246

Failures loading TPC-DS in Jenkins runs on EC2 machines.

Details

    Description

      See these jobs:
      http://sandbox.jenkins.cloudera.com/view/Impala/view/Evergreen-cdh5-trunk/job/impala-CDH5-nightly-data-load/865/
      http://sandbox.jenkins.cloudera.com/view/Impala/view/Evergreen-cdh5-trunk/job/impala-CDH5-nightly-data-load/866/

      There are several other failures, and I dug into a few of them. While the Hive logs contain common symptoms of out-of-space conditions, the disks appeared to have enough space on the instances I checked.

      From a hive.log (looks similar in all instances):

      2016-03-25 10:42:00,706 WARN  hdfs.DFSClient (DFSOutputStream.java:run(790)) - DataStreamer Exception
      org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-warehouse/tpcds.store_sales/.hive-staging_hive_2016-03-25_10-41-36_555_7363988594936564481-728/_task_tmp.-ext-10000/ss_sold_date_sk=2452076/_tmp.000000_0 could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1595)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3287)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
      	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
      ...
      2016-03-25 10:42:01,371 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(970)) - Failed to close inode 41253
      org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /test-warehouse/tpcds.store_sales/.hive-staging_hive_2016-03-25_10-41-36_555_7363988594936564481-728/_task_tmp.-ext-10000/ss_sold_date_sk=2451545/_tmp.000000_0 (inode 41253): File does not exist. Holder DFSClient_NONMAPREDUCE_-2097479428_43899 does not have any open files.
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3597)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3683)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3653)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:739)
      	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:244)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:529)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
      
      	at org.apache.hadoop.ipc.Client.call(Client.java:1471)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1408)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
      	at com.sun.proxy.$Proxy12.complete(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:448)
      	at sun.reflect.GeneratedMethodAccessor119.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
      	at com.sun.proxy.$Proxy13.complete(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2510)
      	at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2492)
      	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2455)
      	at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:967)
      	at org.apache.hadoop.hdfs.DFSClient.closeOutputStreams(DFSClient.java:999)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:986)
      	at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2832)
      	at org.apache.hadoop.fs.FileSystem.closeAllForUGI(FileSystem.java:469)
      	at org.apache.hive.service.cli.session.HiveSessionImplwithUGI.close(HiveSessionImplwithUGI.java:97)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
      	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
      	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
      	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
      	at com.sun.proxy.$Proxy17.close(Unknown Source)
      	at org.apache.hive.service.cli.session.SessionManager.closeSession(SessionManager.java:320)
      	at org.apache.hive.service.cli.CLIService.closeSession(CLIService.java:221)
      	at org.apache.hive.service.cli.thrift.ThriftCLIService.CloseSession(ThriftCLIService.java:467)
      	at org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1273)
      	at org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1258)
      	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
      	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
      	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
      	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      2016-03-25 10:42:01,409 INFO  thrift.ThriftCLIService (ThriftCLIService.java:CloseSession(468)) - Closed a session, current sessions: 0
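
Since the report says the log looks similar in all instances, a small script can help confirm how many block placements failed and which tables were affected. The following is a minimal sketch, not part of the original report: it assumes the NameNode message has the same wording as the excerpt above, and the function names (`find_replication_failures`, `failures_per_table`) are hypothetical helpers introduced here for illustration.

```python
import re
from collections import Counter

# Matches the NameNode message quoted in the hive.log excerpt above.
# Assumption: the wording is identical across the failing runs.
REPLICATION_FAILURE = re.compile(
    r"File (?P<path>\S+) could only be replicated to (?P<got>\d+) nodes "
    r"instead of minReplication \(=(?P<want>\d+)\)\.\s+"
    r"There are (?P<live>\d+) datanode\(s\) running"
)


def find_replication_failures(lines):
    """Return one dict per log line that reports a failed block placement."""
    failures = []
    for line in lines:
        match = REPLICATION_FAILURE.search(line)
        if match:
            failures.append({
                "path": match.group("path"),
                "replicated": int(match.group("got")),
                "min_replication": int(match.group("want")),
                "live_datanodes": int(match.group("live")),
            })
    return failures


def failures_per_table(failures):
    """Count failures by the table directory directly under the warehouse root."""
    counts = Counter()
    for failure in failures:
        # A path splits into ['', 'test-warehouse', 'tpcds.store_sales', ...]
        parts = failure["path"].split("/")
        table = parts[2] if len(parts) > 2 else "<unknown>"
        counts[table] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log location; point this at the hive.log of a failed run.
    with open("hive.log", encoding="utf-8", errors="replace") as log:
        found = find_replication_failures(log)
    for table, count in failures_per_table(found).most_common():
        print(table, count)
```

If every match reports live datanodes but zero replicas, as in the excerpt, that is consistent with the NameNode rejecting all targets rather than the datanodes being down, which is why comparing against the attached dfsadmin report and disk usage logs is the natural next step.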
      

      Attachments

        1. dfsadmin-report-before-failure.txt
          2 kB
          Alexander Behm
        2. disk-usage.log
          1.71 MB
          casey
        3. full-data-load-disk-usage.log
          1.15 MB
          casey

          People

            alex.behm Alexander Behm