IMPALA-3874

Predicates are not always pushed to Kudu

      Description

      In some cases predicates are not pushed down to Kudu and are evaluated at the Impala scan node instead. Example from TPC-H Q6:

      # Q6 - Forecasting Revenue Change Query
      select
       round(sum(l_extendedprice * l_discount), 2) as revenue
      from
        tpch_kudu.lineitem
      where
        l_shipdate >= '1994-01-01'
        and l_shipdate < '1995-01-01'
        and l_discount between 0.05 and 0.07
        and l_quantity < 24
      ---- PLAN
      01:AGGREGATE [FINALIZE]
      |  output: sum(l_extendedprice * l_discount)
      |
      00:SCAN KUDU [tpch_kudu.lineitem]
         *predicates: l_quantity < 24, l_shipdate < '1995-01-01'*
         kudu predicates: l_discount >= 0.05, l_discount <= 0.07, l_shipdate >= '1994-01-01'
      

          Activity

          mmokhtar Mostafa Mokhtar added a comment -

          I believe inequality predicates are not yet supported in Kudu.

          alex.behm Alexander Behm added a comment -

          I believe Kudu only supports inclusive ranges. We hack the predicates for exclusive ranges on discrete types like integers.
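To make the rewrite mentioned in this comment concrete, here is a minimal Python sketch of turning an exclusive bound on an integer column into an equivalent inclusive one. It is not Impala's planner code: the `Predicate` class, the `to_inclusive` helper, and the column types assumed for the Q6 conjuncts (none of them integers) are all hypothetical and only illustrate why `l_quantity < 24` and `l_shipdate < '1995-01-01'` stay at the Impala scan node when only inclusive operators can be pushed.

```python
from dataclasses import dataclass
from typing import Optional

# Operators assumed to be accepted by the storage layer (inclusive only).
INCLUSIVE_OPS = {">=", "<=", "="}


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str
    value: object


def to_inclusive(pred: Predicate, column_is_integer: bool) -> Optional[Predicate]:
    """Return an inclusive predicate equivalent to `pred`, or None if none exists."""
    if pred.op in INCLUSIVE_OPS:
        return pred
    if not column_is_integer:
        # No "next smaller" value exists for strings or floating-point columns.
        return None
    # Note: a real implementation must also guard the type's min/max values.
    if pred.op == "<":
        return Predicate(pred.column, "<=", pred.value - 1)
    if pred.op == ">":
        return Predicate(pred.column, ">=", pred.value + 1)
    return None


# The Q6 conjuncts, paired with an ASSUMED "is integer column" flag.
q6 = [
    (Predicate("l_shipdate", ">=", "1994-01-01"), False),
    (Predicate("l_shipdate", "<", "1995-01-01"), False),
    (Predicate("l_discount", ">=", 0.05), False),
    (Predicate("l_discount", "<=", 0.07), False),
    (Predicate("l_quantity", "<", 24), False),
]
pushed = [p for p, is_int in q6 if to_inclusive(p, is_int) is not None]
kept = [p for p, is_int in q6 if to_inclusive(p, is_int) is None]
```

Under these assumptions, `kept` holds the same two conjuncts that the plan above lists under `predicates:`, while an integer column would allow `x < 24` to be pushed as `x <= 23`.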

          dtsirogiannis Dimitris Tsirogiannis added a comment -

          Kudu seems to support only inclusive predicates for some reason.

          dtsirogiannis Dimitris Tsirogiannis added a comment -

          Reopening this. After talking to the Kudu team, it seems that there is a mismatch between the Java and C++ APIs with respect to the comparison operators supported, i.e. Java supports GREATER and LESS. Using the Scan token API will enable Impala to push more predicates to Kudu, including non-inclusive predicates.

          dtsirogiannis Dimitris Tsirogiannis added a comment -

          The Kudu team just added support in the C++ client for LESS and GREATER (see https://github.com/apache/incubator-kudu/commit/7e2783c673d4559f74e205a797ae42b67ffb689c)

          tlipcon Todd Lipcon added a comment -

          Can we bump priority up a bit here? Should be easy to fix and pushing predicates makes a big difference in many cases.

          mjacobs Matthew Jacobs added a comment -

          Todd Lipcon: yes, we can do this soon.

          mjacobs Matthew Jacobs added a comment -

          Marking it as a blocker. Technically I don't think we'd hold the release for it, but this at least keeps it as a priority on our radar.

          mjacobs Matthew Jacobs added a comment -

          This will be fixed by the following patch, currently in review: https://gerrit.cloudera.org/#/c/4120/

          mjacobs Matthew Jacobs added a comment -

          commit 157c80056c62c89193a04d147d8c94fcb58610c4
          Author: Matthew Jacobs <mj@cloudera.com>
          Date: Fri Aug 19 08:49:25 2016 -0700

          IMPALA-3481: Use Kudu ScanToken API for scan ranges

          Switches the planner and KuduScanNode to use Kudu's new
          ScanToken API instead of explicitly constructing scan ranges
          for all tablets of a table, regardless of whether they were
          needed. The ScanToken API allows Impala to specify the
          projected columns and predicates during planning, and Kudu
          returns a set of 'scan tokens' that represent a scanner for
          each tablet that needs to be scanned. The scan tokens can
          be serialized and distributed to the scan nodes, which can
          then deserialize them into Kudu scanner objects. Upon
          deserialization, the scan token has all scan parameters
          already, including the 'pushed down' predicates. Impala no
          longer needs to send the Kudu predicates to the BE and
          convert them at the scan node.

          This change also fixes:
          1) IMPALA-4016: Avoid materializing slots only referenced
          by Kudu conjuncts
          2) IMPALA-3874: Predicates are not always pushed to Kudu

          TODO: Consider additional planning improvements.

          Testing: Updated the existing tests, verified everything
          works as expected. Some BE tests no longer make sense and
          they were removed.

          TODO: When KUDU-1065 is resolved, add tests that demonstrate pruning.

          Change-Id: I160e5849d372755748ff5ba3c90a4651c804b220
          Reviewed-on: http://gerrit.cloudera.org:8080/4120
          Reviewed-by: Matthew Jacobs <mj@cloudera.com>
          Tested-by: Internal Jenkins
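The flow described in the commit message (attach projection and predicates during planning, serialize one token per tablet, rebuild the scanner from the token at the scan node) can be sketched with a toy Python model. This is not the Kudu ScanToken API: `ScanToken`, `build_tokens`, `scan`, the JSON wire format, and the sample rows are all hypothetical, and the tablet pruning the commit mentions is not modeled.

```python
import json
import operator
from dataclasses import asdict, dataclass
from typing import List, Tuple

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}


@dataclass
class ScanToken:
    """Toy stand-in for a scan token: everything a scanner needs for one tablet."""
    tablet_id: str
    columns: List[str]                        # projected columns
    predicates: List[Tuple[str, str, float]]  # (column, op, value), already "pushed down"

    def serialize(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def deserialize(data: str) -> "ScanToken":
        raw = json.loads(data)
        return ScanToken(raw["tablet_id"], raw["columns"],
                         [tuple(p) for p in raw["predicates"]])


def build_tokens(tablet_ids, columns, predicates):
    """Planning step: one serialized token per tablet, each carrying the full scan spec."""
    return [ScanToken(t, list(columns), list(predicates)).serialize()
            for t in tablet_ids]


def scan(serialized_token, tablet_rows):
    """Scan-node step: rebuild the scan from the token alone and apply its predicates."""
    token = ScanToken.deserialize(serialized_token)
    for row in tablet_rows[token.tablet_id]:
        if all(OPS[op](row[col], val) for col, op, val in token.predicates):
            yield {c: row[c] for c in token.columns}


# Made-up sample data: two tablets of a lineitem-like table.
rows = {
    "tablet-0": [{"l_quantity": 10, "l_discount": 0.06},
                 {"l_quantity": 30, "l_discount": 0.06}],
    "tablet-1": [{"l_quantity": 5, "l_discount": 0.01}],
}
tokens = build_tokens(rows.keys(), ["l_discount"],
                      [("l_quantity", "<", 24), ("l_discount", ">=", 0.05)])
results = [r for t in tokens for r in scan(t, rows)]
```

In this model the scan node receives nothing but the serialized token, so no separate predicate list has to be shipped and converted there. The example also filters on `l_quantity` without returning it, which loosely mirrors the IMPALA-4016 point about not materializing columns that only the pushed-down predicates reference.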


            People

            • Assignee: mjacobs Matthew Jacobs
            • Reporter: dtsirogiannis Dimitris Tsirogiannis