  1. IMPALA
  2. IMPALA-9806

Multiple data load failures on HDFS errors for erasure coding builds


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: None
    • Component/s: Infrastructure
    • Labels: None

    Description

      The erasure coding build shows data load failures for the TPC-H, TPC-DS and functional-query data sets, all caused by HDFS errors. The errors are triggered from both Hive and Impala. Pasting the failure log section for TPC-H because it is much shorter; the Java backtrace for functional-query (which breaks in Hive/Tez) eventually runs into the same HDFS error pattern:

      INSERT OVERWRITE TABLE tpch_parquet.region SELECT * FROM tpch.region
      Summary: Inserted 5 rows
      Success: True
      Took: 0.264951944351(s)
      Data:
      : 5
      
      ERROR: INSERT OVERWRITE TABLE tpch_parquet.orders SELECT * FROM tpch.orders
      Traceback (most recent call last):
        File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
          result = impala_client.execute(query)
        File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
          handle = self.__execute_query(query_string.strip(), user=user)
        File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", line 365, in __execute_query
          self.wait_for_finished(handle)
        File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", line 386, in wait_for_finished
          raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
      ImpalaBeeswaxException: ImpalaBeeswaxException:
       Query aborted:Failed to write data (length: 159515) to Hdfs file: hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq 
      Error(255): Unknown error 255
      Root cause: RemoteException: File /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There are 5 datanode(s) running and 5 node(s) are excluded in this operation.
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2266)
      	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
      	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
      	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
      
      
      Failed to close HDFS file: hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq
      Error(255): Unknown error 255
      Root cause: RemoteException: File /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There are 5 datanode(s) running and 5 node(s) are excluded in this operation.
      	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2266)
      	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
      	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
      	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
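
      For triaging other failing loads against the same pattern, the NameNode message above can be matched mechanically. The following is a minimal sketch, not part of the Impala or Hadoop code base: the names PLACEMENT_ERROR, PlacementFailure, parse_placement_failure and parse_rs_policy are hypothetical, and the regular expression assumes the exact wording shown in this log. The policy name is read as RS-<data units>-<parity units>-<cell size>, which is how HDFS names its built-in Reed-Solomon policies.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical pattern, written against the exact message text quoted in this report.
PLACEMENT_ERROR = re.compile(
    r"could only be written to (?P<written>\d+) of the (?P<required>\d+) "
    r"required nodes for (?P<policy>[\w-]+)\. "
    r"There are (?P<running>\d+) datanode\(s\) running and "
    r"(?P<excluded>\d+) node\(s\) are excluded in this operation\."
)


class PlacementFailure(NamedTuple):
    written: int    # nodes the block could be written to
    required: int   # nodes the NameNode required
    policy: str     # erasure coding policy name, e.g. "RS-3-2-1024k"
    running: int    # datanodes reported as running
    excluded: int   # datanodes excluded for this operation

    @property
    def all_nodes_excluded(self) -> bool:
        # True when every running datanode was excluded, as in this report.
        return self.running > 0 and self.excluded >= self.running


def parse_placement_failure(text: str) -> Optional[PlacementFailure]:
    """Return the parsed failure, or None if the pattern is not present."""
    m = PLACEMENT_ERROR.search(text)
    if m is None:
        return None
    return PlacementFailure(
        written=int(m["written"]),
        required=int(m["required"]),
        policy=m["policy"],
        running=int(m["running"]),
        excluded=int(m["excluded"]),
    )


def parse_rs_policy(policy: str) -> Optional[Tuple[int, int, str]]:
    """Split a Reed-Solomon policy name into (data units, parity units, cell size)."""
    m = re.fullmatch(r"RS-(\d+)-(\d+)-(\w+)", policy)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)
```

      Applied to the message in this report, the sketch yields a policy with 3 data units and 2 parity units, and flags that all 5 running datanodes were excluded. It only classifies the log line; it says nothing about why the nodes were excluded.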
      

            People

              Assignee: Unassigned
              Reporter: Laszlo Gaal (laszlog)
              Votes: 0
              Watchers: 2
