IMPALA-6316

impalad crashes after hadoopZeroCopyRead failure

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.11.0
    • Fix Version/s: Not Applicable
    • Component/s: None
    • Labels: None

    Description

      End-to-end tests fail
      ---------------------------
      20:00:40 [gw0] PASSED query_test/test_join_queries.py::TestJoinQueries::test_single_node_joins_with_limits_exhaustive[batch_size: 1 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
      20:04:17 query_test/test_join_queries.py::TestJoinQueries::test_single_node_joins_with_limits_exhaustive[batch_size: 1 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
      20:04:17 [gw1] FAILED query_test/test_queries.py::TestQueries::test_union[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: rc/snap/block]
      20:04:17 query_test/test_queries.py::TestQueries::test_union[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block]
      20:04:17 [gw2] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record]
      20:04:17 [gw3] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/def/block]
      20:04:17 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record]
      20:04:17 [gw0] FAILED query_test/test_join_queries.py::TestJoinQueries::test_single_node_joins_with_limits_exhaustive[batch_size: 1 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]

      #0 0x00000031bea328e5 in raise () from /lib64/libc.so.6
      #1 0x00000031bea340c5 in abort () from /lib64/libc.so.6
      #2 0x0000000003be91a4 in google::DumpStackTraceAndExit() ()
      #3 0x0000000003bdfc1d in google::LogMessage::Fail() ()
      #4 0x0000000003be14c2 in google::LogMessage::SendToLog() ()
      #5 0x0000000003bdf5f7 in google::LogMessage::Flush() ()
      #6 0x0000000003be2bbe in google::LogMessageFatal::~LogMessageFatal() ()
      #7 0x000000000189390a in impala::FragmentInstanceState::Close (this=0xc188ee0) at repos/Impala/be/src/runtime/fragment-instance-state.cc:315
      #8 0x0000000001890a12 in impala::FragmentInstanceState::Exec (this=0xc188ee0) at repos/Impala/be/src/runtime/fragment-instance-state.cc:95
      #9 0x00000000018797b8 in impala::QueryState::ExecFInstance (this=0x20584000, fis=0xc188ee0) at repos/Impala/be/src/runtime/query-state.cc:382
      #10 0x000000000187807a in impala::QueryState::<lambda()>::operator()(void) const (__closure=0x7fc1fafd9bc8) at repos/Impala/be/src/runtime/query-state.cc:325
      #11 0x000000000187a3f7 in boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::<lambda()>, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
      #12 0x00000000017c6ed4 in boost::function0<void>::operator() (this=0x7fc1fafd9bc0) at Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
      #13 0x0000000001abdbc9 in impala::Thread::SuperviseThread (name=..., category=..., functor=..., thread_started=0x7fc0cc476ab0) at repos/Impala/be/src/util/thread.cc:352
      #14 0x0000000001ac6754 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>> >::operator()<void (const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0x1eec8f7c0, f=@0x1eec8f7b8, a=...) at workspace/impala-cdh5-trunk-exhaustive/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:457
      #15 0x0000000001ac6697 in boost::_bi::bind_t<void, void (const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>> > >::operator()(void) (this=0x1eec8f7b8) at workspace/impala-cdh5-trunk-exhaustive/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20
      #16 0x0000000001ac665a in boost::detail::thread_data<boost::_bi::bind_t<void, void (const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>> > > >::run(void) (this=0x1eec8f600) at workspace/impala-cdh5-trunk-exhaustive/Impala-Toolchain/boost-1.57.0-p3/include/boost/thread/detail/thread.hpp:116
      #17 0x0000000002d6966a in thread_proxy ()
      #18 0x00000031bee07851 in start_thread () from /lib64/libpthread.so.0
      #19 0x00000031beae894d in clone () from /lib64/libc.so.6

      Log output from impalad.INFO leading up to the crash
      --------------------------------------------------------------------
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      E1208 15:03:55.125169 2169 Analyzer.java:2375] Failed to load metadata for table: alltypes
      Failed to load metadata for table: functional.alltypes. Running 'invalidate metadata functional.alltypes' may resolve this problem.
      CAUSED BY: MetaException: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
      at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
      at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472)
      at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.reconnect(HiveMetaStoreClient.java:337)
      at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:98)
      at com.sun.proxy.$Proxy5.getTable(Unknown Source)
      at org.apache.impala.catalog.TableLoader.load(TableLoader.java:65)
      at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:241)
      at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:238)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.net.ConnectException: Connection refused
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
      at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
      at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
      at java.net.Socket.connect(Socket.java:579)
      at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
      ... 11 more
      Picked up JAVA_TOOL_OPTIONS: -agentlib:jdwp=transport=dt_socket,address=30000,server=y,suspend=n
      hdfsOpenFile(hdfs://localhost:20500/test-warehouse/file_open_fail/564e4332cbb6e8de-c0c5101c00000000_2005391775_data.0.): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream error:
      RemoteException: File does not exist: /test-warehouse/file_open_fail/564e4332cbb6e8de-c0c5101c00000000_2005391775_data.0.
      at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
      at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1983)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:579)
      at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:92)
      .
      .
      .
      FSDataOutputStream#close error:
      RemoteException: No lease on /test-warehouse/tpch_parquet.db/ctas_cancel/_impala_insert_staging/a14b1ee198cd7327_a46f833a00000000/.a14b1ee198cd7327-a46f833a00000002_567821133_dir/a14b1ee198cd7327-a46f833a00000002_1272243926_data.0.parq (inode 37350): File does not exist. Holder DFSClient_NONMAPREDUCE_307426671_1 does not have any open files.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3760)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3561)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3417)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
      at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
      org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /test-warehouse/tpch_parquet.db/ctas_cancel/_impala_insert_staging/a14b1ee198cd7327_a46f833a00000000/.a14b1ee198cd7327-a46f833a00000002_567821133_dir/a14b1ee198cd7327-a46f833a00000002_1272243926_data.0.parq (inode 37350): File does not exist. Holder DFSClient_NONMAPREDUCE_307426671_1 does not have any open files.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3760)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3561)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3417)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
      at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGr.
      .
      .
      FSDataOutputStream#close error:
      RemoteException: No lease on /test-warehouse/functional_parquet.db/alltypesinsert/_impala_insert_staging/3949d68930d0228e_c655177500000000/.3949d68930d0228e-c655177500000008_357133657_dir/year=2009/month=0/3949d68930d0228e-c655177500000008_24906809_data.0.parq (inode 88180): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_307426671_1, pending creates: 1]
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3760)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3561)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3417)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
      at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
      E1208 17:14:10.009800 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
      tcmalloc: large alloc 2147483648 bytes == 0x2708a6000 @ 0x3d039c6 0x7fc28438ac49
      tcmalloc: large alloc 4294967296 bytes == 0x7fc0dd294000 @ 0x3d039c6 0x7fc28438ac49
      E1208 17:17:29.387645 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
      E1208 17:17:30.425915 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
      E1208 17:18:41.971148 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
      tcmalloc: large alloc 4294967296 bytes == 0x7fc0dd294000 @ 0x3d039c6 0x7fc28438ac49
      E1208 17:21:30.161092 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
      E1208 17:21:30.913319 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
      E1208 17:25:00.198657 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_978e0f35.memtest(10485760)'
      E1208 17:25:00.199533 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_978e0f35.memtest(10485760)'
      E1208 17:25:00.200562 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_978e0f35.memtest(10485760)'
      E1208 17:25:08.581363 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_ae6bd38e.memtest(10485760)'
      .
      .
      E1208 19:13:11.192692 7603 LiteralExpr.java:186] Failed to evaluate expr 'TIMESTAMP '1400-01-01 21:00:00' - INTERVAL 1 DAYS'
      E1208 19:13:11.224931 7603 LiteralExpr.java:186] Failed to evaluate expr 'TIMESTAMP '1400-01-01 21:00:00' - INTERVAL 1 DAYS'
      .
      .
      hadoopZeroCopyRead: ZeroCopyCursor#read failed error:
      ReadOnlyBufferException: java.nio.ReadOnlyBufferException
      at java.nio.DirectByteBufferR.put(DirectByteBufferR.java:344)
      at org.apache.hadoop.crypto.CryptoInputStream.decrypt(CryptoInputStream.java:53
      F1208 20:00:45.213917 25256 fragment-instance-state.cc:315] Check failed: other_time <= total_time + 1 (481986958 vs. 481986956)

      *** Check failure stack trace: ***
            @ 0x3bdfc1d google::LogMessage::Fail()
            @ 0x3be14c2 google::LogMessage::SendToLog()
            @ 0x3bdf5f7 google::LogMessage::Flush()
            @ 0x3be2bbe google::LogMessageFatal::~LogMessageFatal()
            @ 0x189390a impala::FragmentInstanceState::Close()
            @ 0x1890a12 impala::FragmentInstanceState::Exec()
            @ 0x18797b8 impala::QueryState::ExecFInstance()
            @ 0x187807a _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
            @ 0x187a3f7 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
            @ 0x17c6ed4 boost::function0<>::operator()()
            @ 0x1abdbc9 impala::Thread::SuperviseThread()
            @ 0x1ac6754 boost::_bi::list4<>::operator()<>()
            @ 0x1ac6697 boost::_bi::bind_t<>::operator()()
            @ 0x1ac665a boost::detail::thread_data<>::run()
            @ 0x2d6966a thread_proxy
            @ 0x31bee07851 (unknown)
            @ 0x31beae894d (unknown)
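The hadoopZeroCopyRead failure logged above ends in `java.nio.ReadOnlyBufferException` raised by `DirectByteBufferR.put` from inside `CryptoInputStream.decrypt`. Read at face value, the trace suggests that the decrypt step tried to write into a read-only direct buffer handed over by the zero-copy read path; that interpretation is an assumption, not a confirmed diagnosis. The sketch below reproduces only the underlying JDK behavior with plain `java.nio`, without any Hadoop or Impala code. `ReadOnlyBufferDemo`, `zeroCopyBuffer`, `inPlaceWriteFails` and `copyThenTransform` are hypothetical names, and a byte-wise XOR stands in for decryption.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

public class ReadOnlyBufferDemo {
    /** Hypothetical stand-in for the read-only direct buffer a zero-copy read is assumed to return. */
    public static ByteBuffer zeroCopyBuffer() {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        direct.put(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        direct.flip();
        // asReadOnlyBuffer() on a direct buffer yields a read-only view (DirectByteBufferR).
        return direct.asReadOnlyBuffer();
    }

    /** Mimics an in-place "decrypt" (XOR as a placeholder) that writes back into the same buffer. */
    public static boolean inPlaceWriteFails(ByteBuffer buf) {
        try {
            for (int i = buf.position(); i < buf.limit(); i++) {
                buf.put(i, (byte) (buf.get(i) ^ 0x5a));
            }
            return false;
        } catch (ReadOnlyBufferException e) {
            // Any put on a read-only buffer is rejected, as in the impalad.INFO trace.
            return true;
        }
    }

    /** Copies the read-only bytes into a writable direct buffer, then transforms the copy. */
    public static ByteBuffer copyThenTransform(ByteBuffer readOnly) {
        ByteBuffer writable = ByteBuffer.allocateDirect(readOnly.remaining());
        // duplicate() keeps the caller's position untouched while the bytes are copied.
        writable.put(readOnly.duplicate());
        writable.flip();
        for (int i = writable.position(); i < writable.limit(); i++) {
            writable.put(i, (byte) (writable.get(i) ^ 0x5a));
        }
        return writable;
    }

    public static void main(String[] args) {
        ByteBuffer buf = zeroCopyBuffer();
        System.out.println("isReadOnly = " + buf.isReadOnly());
        System.out.println("in-place write rejected = " + inPlaceWriteFails(buf));
        System.out.println("first byte of transformed copy = " + copyThenTransform(buf).get(0));
    }
}
```

Copying into a writable buffer before transforming is only one generic way to avoid the exception; it gives up the zero-copy benefit and is not necessarily how this issue was addressed (it was resolved as a duplicate).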

            People

              Assignee: Unassigned
              Reporter: Pranay Singh
              Votes: 0
              Watchers: 4