Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
None
-
None
-
None
-
ghx-label-9
Description
Saw a possible haning in test_spilling_large_rows in a downstream exhaustive release build. The query didn't finish in 12mins:
query_test/test_spilling.py:83: in test_spilling_large_rows self.run_test_case('QueryTest/spilling-large-rows', vector, unique_database) common/impala_test_suite.py:665: in run_test_case result = exec_fn(query, user=test_section.get('USER', '').strip() or None) common/impala_test_suite.py:603: in __exec_in_impala result = self.__execute_query(target_impalad_client, query, user=user) common/impala_test_suite.py:923: in __execute_query return impalad_client.execute(query, user=user) common/impala_connection.py:205: in execute return self.__beeswax_client.execute(sql_stmt, user=user) beeswax/impala_beeswax.py:205: in execute result = self.fetch_results(query_string, handle) beeswax/impala_beeswax.py:452: in fetch_results exec_result = self.__fetch_results(query_handle, max_rows) beeswax/impala_beeswax.py:463: in __fetch_results results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False, fetch_rows)) beeswax/impala_beeswax.py:520: in __do_rpc raise ImpalaBeeswaxException(self.__build_error_message(b), b) E ImpalaBeeswaxException: ImpalaBeeswaxException: E INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> E MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 720000ms)
impalad.INFO indicates it's waiting for the coordinator fragment to finish:
I0325 23:54:16.146754 16521 Frontend.java:1600] b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr db: test_spilling_large_rows_6df9f183 I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms I0325 23:54:16.147109 16521 Frontend.java:1643] b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished. I0325 23:54:16.148593 20605 admission-controller.cc:1747] b94d5cd2894cbe68:68acfd6300000000] Trying to admit id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=309.98 MB dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200 max_mem=-1.00 B I0325 23:54:16.148609 20605 admission-controller.cc:1755] b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=1.00 GB, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_level_stats: num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0) I0325 23:54:16.148628 20605 admission-controller.cc:1215] b94d5cd2894cbe68:68acfd6300000000] Admitting query id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.148648 20605 impala-server.cc:2150] b94d5cd2894cbe68:68acfd6300000000] Registering query locations I0325 23:54:16.148655 20605 coordinator.cc:150] b94d5cd2894cbe68:68acfd6300000000] Exec() query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr I0325 23:54:16.148877 20605 coordinator.cc:474] b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.149272 27701 control-service.cc:148] b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 #instances=3 I0325 23:54:16.149561 20605 coordinator.cc:533] b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.149663 20609 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000002] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1 I0325 23:54:16.149741 20611 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000000] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2 I0325 23:54:16.149821 20613 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000004] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3 I0325 23:54:16.795727 20609 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000002] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2 I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 I0325 23:54:16.968387 20613 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000004] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4 I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query id=b94d5cd2894cbe68:68acfd6300000000 execution completed I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for backends to finish, 1 remaining. query_id=b94d5cd2894cbe68:68acfd6300000000
After 12mins, it reaches the timeout limit:
I0326 00:06:54.688612 28694 status.cc:82] Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 720000ms)
@ 0xc93371
@ 0x1398edc
@ 0x138fd14
@ 0x1596d91
@ 0x159820a
@ 0x1e20f11
@ 0x7f57d79ede24
@ 0x7f57d443734c
I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool(): cancelling query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: Sending CancelQueryFInstances rpc
I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState: query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4
I0326 00:06:54.688993 27708 query-state.cc:974] Cancel: query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done:
I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends() query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends
I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control resources for query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: Not cancelling because the backend is already done: Cancelled
I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974] query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done:
I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends() query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends
This seems similar to IMPALA-10339.
Attachments
Issue Links
- relates to
-
IMPALA-10339 Apparent hang or crash in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action
- Resolved