Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version: Impala 2.3.0
- Fix Version: None
- Component: None
- Environment: impalad version 2.3.0-cdh5-INTERNAL DEBUG (build 67eb28b2ea414cda8675c0f890fb5a2709744e5e)
  Built on Sun, 13 Sep 2015 22:53:40 PST
Description
In a stress test run, 1 out of 10k queries hung, seemingly while trying to process a cancellation request.
The coordinator log shows that the cancellation request was received:
I0916 06:14:44.441148 29393 coordinator.cc:315] Exec() query_id=894fbd2d1018f843:5abd801993f82586
I0916 06:14:44.441329 29393 plan-fragment-executor.cc:85] Prepare(): query_id=894fbd2d1018f843:5abd801993f82586 instance_id=894fbd2d1018f843:5abd801993f82587
I0916 06:14:44.442201 29393 coordinator.cc:391] starting 1 backends for query 894fbd2d1018f843:5abd801993f82586
I0916 06:14:48.553042 29393 impala-hs2-server.cc:980] GetResultSetMetadata(): query_id=894fbd2d1018f843:5abd801993f82586
I0916 06:14:48.664563 4758 coordinator.cc:1329] Backend 0 completed, 0 remaining: query_id=894fbd2d1018f843:5abd801993f82586
I0916 06:19:01.414476 29393 impala-hs2-server.cc:934] CancelOperation(): query_id=894fbd2d1018f843:5abd801993f82586
I0916 06:19:01.414510 29393 impala-server.cc:943] Cancel(): query_id=894fbd2d1018f843:5abd801993f82586
I0916 06:19:01.414530 29393 coordinator.cc:1153] Cancel() query_id=894fbd2d1018f843:5abd801993f82586
Here is what a successful cancellation looks like:
I0916 03:49:53.439390 29409 coordinator.cc:315] Exec() query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 03:49:53.439645 29409 plan-fragment-executor.cc:85] Prepare(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a instance_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9b
I0916 03:49:53.441067 29409 coordinator.cc:391] starting 1 backends for query 6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 03:50:23.228940 29409 impala-hs2-server.cc:980] GetResultSetMetadata(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 03:50:24.388921 3272 coordinator.cc:1329] Backend 0 completed, 0 remaining: query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 04:06:02.247277 29409 impala-hs2-server.cc:934] CancelOperation(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 04:06:02.247361 29409 impala-server.cc:943] Cancel(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 04:06:02.247380 29409 coordinator.cc:1153] Cancel() query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 04:06:02.249423 29409 impala-hs2-server.cc:955] CloseOperation(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 04:06:02.249439 29409 impala-server.cc:862] UnregisterQuery(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
I0916 04:06:02.249454 29409 impala-server.cc:943] Cancel(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
So for the hung query, "CloseOperation()" never happens.
The client-side log shows that the server never responded to the cancellation request; the client is still waiting for a response:
06:14:44 10931 140112503502592 DEBUG:concurrent_select[674]:Query id is 894fbd2d1018f843:5abd801993f82586
06:14:48 10931 140112757188352 DEBUG:concurrent_select[757]:Fetching result for query with id 894fbd2d1018f843:5abd801993f82586
...
06:19:01 10931 140112757188352 DEBUG:concurrent_select[757]:Fetching result for query with id 894fbd2d1018f843:5abd801993f82586
06:19:01 10931 140112503502592 DEBUG:concurrent_select[710]:Attempting cancellation of query with id 894fbd2d1018f843:5abd801993f82586
The client code is:

    def _cancel(self, cursor, report):
        report.timed_out = True
        if cursor._last_operation_handle:
            LOG.debug("Attempting cancellation of query with id %s",
                      op_handle_to_query_id(cursor._last_operation_handle))
            cursor.cancel_operation()
            LOG.debug("Sent cancellation request for query with id %s",
                      op_handle_to_query_id(cursor._last_operation_handle))
The client log never shows a "Sent cancellation..." message or any network error.
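Since cursor.cancel_operation() can apparently block forever when the server drops the request, one possible client-side mitigation (not part of the stress-test harness above; cancel_with_timeout and timeout_s are hypothetical names introduced here) is to issue the cancel from a worker thread and time it out, so a hung CancelOperation() RPC is at least detected:

```python
import threading

def cancel_with_timeout(cursor, timeout_s=60.0):
    """Run cursor.cancel_operation() in a worker thread so a hung
    cancellation RPC can be detected instead of blocking the caller
    indefinitely.

    Returns True if the cancel call returned within timeout_s seconds,
    False if it is still outstanding (i.e., the server never responded).
    """
    done = threading.Event()

    def _do_cancel():
        cursor.cancel_operation()  # blocks until the server responds
        done.set()

    # Daemon thread so a permanently hung RPC does not prevent exit.
    worker = threading.Thread(target=_do_cancel, daemon=True)
    worker.start()
    return done.wait(timeout_s)
```

With this wrapper, the harness could log a "cancellation timed out" message for queries like the one above instead of silently waiting on the RPC.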
The query/plan is:

SELECT COALESCE(tinyint_col_5, tinyint_col_5, tinyint_col_5) AS int_col,
       COALESCE(811, COALESCE(-164, 571,
           LAG(-429, 89) OVER (ORDER BY COALESCE(tinyint_col_5, tinyint_col_5, tinyint_col_5)))) AS int_col_1
FROM table_41

Coordinator: impala-stress-random-4.vpc.cloudera.com:22000
Plan:
----------------
Estimated Per-Host Requirements: Memory=168.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
randomness.table_41

F01:PLAN FRAGMENT [UNPARTITIONED]
  02:ANALYTIC
  |  functions: lag(-429, 89, NULL)
  |  order by: coalesce(tinyint_col_5, tinyint_col_5, tinyint_col_5) ASC
  |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND 89 PRECEDING
  |  hosts=1 per-host-mem=unavailable
  |  tuple-ids=3,2 row-size=3B cardinality=538443
  |
  03:MERGING-EXCHANGE [UNPARTITIONED]
     order by: coalesce(tinyint_col_5, tinyint_col_5, tinyint_col_5) ASC
     hosts=1 per-host-mem=unavailable
     tuple-ids=3 row-size=1B cardinality=538443

F00:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=03, UNPARTITIONED]
  01:SORT
  |  order by: coalesce(tinyint_col_5, tinyint_col_5, tinyint_col_5) ASC
  |  hosts=1 per-host-mem=8.00MB
  |  tuple-ids=3 row-size=1B cardinality=538443
  |
  00:SCAN HDFS [randomness.table_41, RANDOM]
     partitions=1/1 files=1 size=65.08MB
     table stats: 538443 rows total
     column stats: unavailable
     hosts=1 per-host-mem=160.00MB
     tuple-ids=0 row-size=1B cardinality=538443
The logs don't show any query errors.