Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2353

Cancel request over HS2 may hang

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • Impala 2.3.0
    • None
    • None
    • impalad version 2.3.0-cdh5-INTERNAL DEBUG (build 67eb28b2ea414cda8675c0f890fb5a2709744e5e)
      Built on Sun, 13 Sep 2015 22:53:40 PST

    Description

      In a stress test run, 1 out of 10k queries was hung seemingly trying to process a cancellation request.

      The coordinator log shows that the cancelation request was received:

      I0916 06:14:44.441148 29393 coordinator.cc:315] Exec() query_id=894fbd2d1018f843:5abd801993f82586
      I0916 06:14:44.441329 29393 plan-fragment-executor.cc:85] Prepare(): query_id=894fbd2d1018f843:5abd801993f82586 instance_id=894fbd2d1018f843:5abd801993f82587
      I0916 06:14:44.442201 29393 coordinator.cc:391] starting 1 backends for query 894fbd2d1018f843:5abd801993f82586
      I0916 06:14:48.553042 29393 impala-hs2-server.cc:980] GetResultSetMetadata(): query_id=894fbd2d1018f843:5abd801993f82586
      I0916 06:14:48.664563  4758 coordinator.cc:1329] Backend 0 completed, 0 remaining: query_id=894fbd2d1018f843:5abd801993f82586
      I0916 06:19:01.414476 29393 impala-hs2-server.cc:934] CancelOperation(): query_id=894fbd2d1018f843:5abd801993f82586
      I0916 06:19:01.414510 29393 impala-server.cc:943] Cancel(): query_id=894fbd2d1018f843:5abd801993f82586
      I0916 06:19:01.414530 29393 coordinator.cc:1153] Cancel() query_id=894fbd2d1018f843:5abd801993f82586
      

      Here is what a successful cancelation looks like

      I0916 03:49:53.439390 29409 coordinator.cc:315] Exec() query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 03:49:53.439645 29409 plan-fragment-executor.cc:85] Prepare(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a instance_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9b
      I0916 03:49:53.441067 29409 coordinator.cc:391] starting 1 backends for query 6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 03:50:23.228940 29409 impala-hs2-server.cc:980] GetResultSetMetadata(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 03:50:24.388921  3272 coordinator.cc:1329] Backend 0 completed, 0 remaining: query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 04:06:02.247277 29409 impala-hs2-server.cc:934] CancelOperation(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 04:06:02.247361 29409 impala-server.cc:943] Cancel(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 04:06:02.247380 29409 coordinator.cc:1153] Cancel() query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 04:06:02.249423 29409 impala-hs2-server.cc:955] CloseOperation(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 04:06:02.249439 29409 impala-server.cc:862] UnregisterQuery(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      I0916 04:06:02.249454 29409 impala-server.cc:943] Cancel(): query_id=6a4bbd2c2a6aec84:3cd1dd159ad1ed9a
      

      So "CloseOperation()" never happens.

      The client log shows that the server never responds to the cancelation request. The client is still waiting for a response.

      06:14:44 10931 140112503502592 DEBUG:concurrent_select[674]:Query id is 894fbd2d1018f843:5abd801993f82586
      06:14:48 10931 140112757188352 DEBUG:concurrent_select[757]:Fetching result for query with id 894fbd2d1018f843:5abd801993f82586
      ...
      06:19:01 10931 140112757188352 DEBUG:concurrent_select[757]:Fetching result for query with id 894fbd2d1018f843:5abd801993f82586
      06:19:01 10931 140112503502592 DEBUG:concurrent_select[710]:Attempting cancellation of query with id 894fbd2d1018f843:5abd801993f82586
      

      The client code is

        def _cancel(self, cursor, report):
          report.timed_out = True
          if cursor._last_operation_handle:
            LOG.debug("Attempting cancellation of query with id %s",
                op_handle_to_query_id(cursor._last_operation_handle))
            cursor.cancel_operation()
            LOG.debug("Sent cancellation request for query with id %s",
                op_handle_to_query_id(cursor._last_operation_handle))
      

      The client log never shows a "Sent cancellation..." message or any network error.

      The query/plan is

      SELECT
      COALESCE(tinyint_col_5, tinyint_col_5, tinyint_col_5) AS int_col,
      COALESCE(811, COALESCE(-164, 571, LAG(-429, 89) OVER (ORDER BY COALESCE(tinyint_col_5, tinyint_col_5, tinyint_col_5)))) AS int_col_1
      FROM table_41
          Coordinator: impala-stress-random-4.vpc.cloudera.com:22000
          Plan: 
      ----------------
      Estimated Per-Host Requirements: Memory=168.00MB VCores=1
      WARNING: The following tables are missing relevant table and/or column statistics.
      randomness.table_41
      
      F01:PLAN FRAGMENT [UNPARTITIONED]
        02:ANALYTIC
        |  functions: lag(-429, 89, NULL)
        |  order by: coalesce(tinyint_col_5, tinyint_col_5, tinyint_col_5) ASC
        |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND 89 PRECEDING
        |  hosts=1 per-host-mem=unavailable
        |  tuple-ids=3,2 row-size=3B cardinality=538443
        |
        03:MERGING-EXCHANGE [UNPARTITIONED]
           order by: coalesce(tinyint_col_5, tinyint_col_5, tinyint_col_5) ASC
           hosts=1 per-host-mem=unavailable
           tuple-ids=3 row-size=1B cardinality=538443
      
      F00:PLAN FRAGMENT [RANDOM]
        DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=03, UNPARTITIONED]
        01:SORT
        |  order by: coalesce(tinyint_col_5, tinyint_col_5, tinyint_col_5) ASC
        |  hosts=1 per-host-mem=8.00MB
        |  tuple-ids=3 row-size=1B cardinality=538443
        |
        00:SCAN HDFS [randomness.table_41, RANDOM]
           partitions=1/1 files=1 size=65.08MB
           table stats: 538443 rows total
           column stats: unavailable
           hosts=1 per-host-mem=160.00MB
           tuple-ids=0 row-size=1B cardinality=538443
      

      The logs don't show any query errors.

      Attachments

        Activity

          People

            kwho Michael Ho
            caseyc casey
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: