IMPALA-4654: Kudu scanner threads take a long time to close

      Description

      After a recent Kudu change (somewhere between a70c90 and e018a83), we started noticing that Kudu scanner threads were taking longer to Close(). This led to some test failures (see IMPALA-4642 and IMPALA-4645), which are going to be addressed by waiting longer for fragments to complete.

      However, we need to understand why the scanner threads are taking so long.

      Attachments

        perf.hist.txt (580 kB, Matthew Jacobs)

        Activity

        mjacobs Matthew Jacobs added a comment -

        It looks like the scanner threads aren't exiting as soon as they should be. Looking into it now.

        mjacobs Matthew Jacobs added a comment -

        I suspect there may be lingering scanner threads when scanning a large tablet as part of a query with a limit, i.e. the limit results in the query completing early but the scan doesn't get cancelled. I suspect this because it is observed when running "select * limit 90" against a large table with only 3 (large) partitions, and the perf histogram attached above shows stacks still in the scanner that appear to be processing rows. I still have no idea how this could be related to a Kudu version change, but the correlation was there.

        tlipcon Todd Lipcon added a comment -

        Between those two hashes there are no client changes, so I'm not quite sure why the client behavior would be substantially different. The server side did get the last "safe time" patch in, which could cause scanner RPCs to block waiting for safe time if the scan was using SCAN_AT_SNAPSHOT. Just to confirm, this is without the Impala-side option to enable snapshot scans, right?

        Also, can you clarify what "long time" means here? E.g. 0.5 sec, or many seconds?
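
        For context on the read modes mentioned above, a minimal Kudu C++ client sketch of where the two modes diverge; the master address and table name are placeholders, and this is only meant to illustrate the READ_LATEST vs. READ_AT_SNAPSHOT distinction, not Impala's actual scanner setup:

            #include <string>

            #include "kudu/client/client.h"

            using kudu::Status;
            using kudu::client::KuduClient;
            using kudu::client::KuduClientBuilder;
            using kudu::client::KuduScanner;
            using kudu::client::KuduTable;
            using kudu::client::sp::shared_ptr;

            // Sketch only: open a scanner in one of the two read modes discussed
            // above. "master-host:7051" and "tpch_kudu.lineitem" are placeholders.
            Status OpenScanner(bool snapshot_scan) {
              shared_ptr<KuduClient> client;
              Status s = KuduClientBuilder()
                  .add_master_server_addr("master-host:7051")
                  .Build(&client);
              if (!s.ok()) return s;

              shared_ptr<KuduTable> table;
              s = client->OpenTable("tpch_kudu.lineitem", &table);
              if (!s.ok()) return s;

              KuduScanner scanner(table.get());
              // READ_LATEST (the default) does not wait on safe time;
              // READ_AT_SNAPSHOT is the mode the server-side "safe time" change
              // could make block.
              s = scanner.SetReadMode(snapshot_scan ? KuduScanner::READ_AT_SNAPSHOT
                                                    : KuduScanner::READ_LATEST);
              if (!s.ok()) return s;
              return scanner.Open();
            }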

        mjacobs Matthew Jacobs added a comment -

        It's taking about a minute, but it turns out it's not just Close(): the scanner thread is still processing rows. The issue occurs on a query with a limit, so after the limit is reached all fragments should be cancelled. In this case, the KuduScanners are not receiving the cancellation, which I'm still debugging. The issue was almost certainly there before the Kudu version was bumped, though for some reason it only became visible after the bump, and I have no idea why. The bug is probably just in the cancellation path in Impala, which tends to be a source of bugs for us.

        You're right, this isn't running with ReadMode=SCAN_AT_SNAPSHOT, i.e. it is still READ_LATEST.

        mjacobs Matthew Jacobs added a comment -

        From debugging, I found that the KuduScanner isn't exiting when a limit is reached. It does check

            RETURN_IF_CANCELLED(state_);
        

        which makes sense, but that checks state_->is_cancelled(), which returns false. The query status is CANCELLED, but it seems the status and is_cancelled_ aren't being set properly on the RuntimeState.
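
        For reference, a sketch of roughly what that macro expands to, assuming it follows the usual Impala pattern of checking RuntimeState::is_cancelled() (the exact definition may differ in detail):

            // Rough sketch of the cancellation check being discussed; it returns
            // Status::CANCELLED from the calling function only once the
            // RuntimeState has actually been marked cancelled.
            #define RETURN_IF_CANCELLED(state)                                      \
              do {                                                                  \
                if (UNLIKELY((state)->is_cancelled())) return Status::CANCELLED;    \
              } while (false)

        This is consistent with the behavior above: until is_cancelled_ is set on the RuntimeState, the macro is a no-op and the scanner keeps decoding rows.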

        mjacobs Matthew Jacobs added a comment -

        Pretty sure I found the bug and have a fix. The problem was that KuduScanner wasn't respecting the scan node's ReachedLimit(). I'll post a fix soon.
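
        A rough sketch of the shape of that fix, assuming a GetNext(RowBatch*, bool* eos) signature and the scan_node_/state_ members referenced elsewhere in this issue (names approximate, not the literal patch):

            // Approximate shape of the fix: report end-of-stream as soon as the
            // scan node's limit has been reached, instead of relying on
            // cancellation alone to stop the scanner thread.
            Status KuduScanner::GetNext(RowBatch* row_batch, bool* eos) {
              *eos = false;
              while (true) {
                RETURN_IF_CANCELLED(state_);
                // New check: the limit may already have been hit (possibly by
                // another scanner thread); stop scanning so the thread can close
                // promptly.
                if (scan_node_->ReachedLimit()) {
                  *eos = true;
                  return Status::OK();
                }
                // ... fetch and decode the next block of Kudu rows into row_batch,
                // setting *eos = true when the scan range is exhausted ...
                if (*eos || row_batch->AtCapacity()) return Status::OK();
              }
            }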

        mjacobs Matthew Jacobs added a comment -

        https://gerrit.cloudera.org/#/c/5493/

        mjacobs Matthew Jacobs added a comment -

        commit 652e7d56d9ac52a8c3d36ca4b04298d4b89897aa
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Tue Dec 13 14:57:01 2016 -0800

        IMPALA-4654: KuduScanner must return when ReachedLimit()

        Fixes a bug in the KuduScanner where the scan node's limit
        was not respected and thus the scanner thread would
        continue executing until the scan range was fully consumed.
        This could result in completed queries leaving fragments
        running and those threads could be using significant CPU and
        memory.

        For example, the query 'select * from tpch_kudu.lineitem
        limit 90' when running in the minicluster and lineitem is
        partitioned into 3 hash partitions would end up leaving a
        scanner thread running for ~60 seconds. In real world
        scenarios this can cause unexpected resource consumption.
        This could build up over time leading to query failures if
        these queries are submitted frequently.

        The fix is to ensure KuduScanner::GetNext() returns with
        eos=true when it finds ReachedLimit=true. An unnecessary and
        somewhat confusing flag 'batch_done' was being returned by a
        helper function DecodeRowsIntoRowBatch, which isn't
        necessary and was removed in order to make it more clear how
        the code in GetNext() should behave.

        Change-Id: Iaddd51111a1b2647995d68e6d37d0500b3a322de
        Reviewed-on: http://gerrit.cloudera.org:8080/5493
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
        Reviewed-by: Dan Hecht <dhecht@cloudera.com>
        Tested-by: Internal Jenkins

        jbapple Jim Apple added a comment -

        This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09.

        2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that:

        https://github.com/apache/incubator-impala/commits/2.8.0

        Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0?

        Thank you!


          People

            Assignee: mjacobs Matthew Jacobs
            Reporter: mjacobs Matthew Jacobs