IMPALA-4728: Materialize expressions for window sorts vs. lazy expression evaluation

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Frontend

      Description

      Impala currently evaluates expressions lazily. This adds overhead when an expression is used or reused in places such as the ORDER BY clause of a window function, compared with materializing the expression once as a projection from the underlying relation. The overhead is largest when the same expression appears in multiple places.
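
To illustrate the overhead described above, here is a minimal Python sketch, not Impala code. It assumes that "lazy" means the ordering expression is re-evaluated on every comparison, while "materialized" means it is evaluated once per row and the stored value is sorted on. The function name `count_evaluations` and the sample expression are hypothetical.

```python
import functools


def count_evaluations(rows, expr):
    """Sort rows by expr two ways and count how often expr is evaluated."""
    calls = {"lazy": 0, "materialized": 0}

    def lazy_expr(row):
        calls["lazy"] += 1
        return expr(row)

    def materialized_expr(row):
        calls["materialized"] += 1
        return expr(row)

    # Lazy: the comparator re-evaluates the expression for both operands
    # of every comparison the sort performs.
    def compare(a, b):
        ka, kb = lazy_expr(a), lazy_expr(b)
        return (ka > kb) - (ka < kb)

    lazy_sorted = sorted(rows, key=functools.cmp_to_key(compare))

    # Materialized: evaluate the expression once per row, then sort on
    # the stored value only.
    keyed = [(materialized_expr(row), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])
    materialized_sorted = [row for _, row in keyed]

    return lazy_sorted, materialized_sorted, calls


if __name__ == "__main__":
    rows = list(range(100, 0, -1))
    # Stand-in for an expensive ordering expression.
    _, _, calls = count_evaluations(rows, lambda r: (r * r) % 97)
    print(calls)
```

Both sorts return the same order for a deterministic expression; only the number of evaluations differs, and the gap widens as the expression gets more expensive or the input grows.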

          Activity

          twmarshall Thomas Tauber-Marshall added a comment -

          commit 6cddb952cefedd373b2a1ce71a1b3cff2e774d70
          Author: Thomas Tauber-Marshall <tmarshall@cloudera.com>
          Date: Tue Jan 31 10:33:07 2017 -0800

          IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

          Previously, exprs used in sorts were evaluated lazily. This can
          potentially be bad for performance if the exprs are expensive to
          evaluate, and it can lead to crashes if the exprs are
          non-deterministic, as this violates assumptions of our sorting
          algorithm.

          This patch addresses these issues by materializing ordering exprs.
          It does so when the expr is non-deterministic (including when it
          contains a UDF, since we cannot currently tell whether a UDF is
          non-deterministic), or when its cost exceeds a threshold (or the
          cost is unknown).

          Testing:

          • Added e2e tests in test_sort.py.
          • Updated planner tests.

          Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
          Reviewed-on: http://gerrit.cloudera.org:8080/6322
          Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
          Tested-by: Impala Public Jenkins
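
The materialization rule stated in the commit message can be sketched as a small predicate. This is a Python illustration under stated assumptions, not the planner's actual code: the `OrderingExpr` fields, the `should_materialize` name, and the threshold value are all hypothetical, since the commit message names neither the cost model nor the threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value; the commit message does not state the real threshold.
COST_THRESHOLD = 10.0


@dataclass
class OrderingExpr:
    sql: str
    is_non_deterministic: bool = False
    contains_udf: bool = False
    cost: Optional[float] = None  # None models an unknown cost


def should_materialize(expr: OrderingExpr,
                       threshold: float = COST_THRESHOLD) -> bool:
    """Return True if the ordering expr should be materialized before sorting."""
    # Non-deterministic exprs must be materialized so the sort sees a stable
    # value. UDFs are treated the same way because their determinism is unknown.
    if expr.is_non_deterministic or expr.contains_udf:
        return True
    # An unknown cost is handled conservatively.
    if expr.cost is None:
        return True
    # Otherwise materialize only when the expr is expensive enough.
    return expr.cost > threshold


if __name__ == "__main__":
    for e in (
        OrderingExpr("rand()", is_non_deterministic=True, cost=1.0),
        OrderingExpr("my_udf(c1)", contains_udf=True, cost=1.0),
        OrderingExpr("c1 + 1", cost=1.0),
        OrderingExpr("regexp_replace(c2, 'a', 'b')", cost=50.0),
    ):
        print(e.sql, should_materialize(e))
```

Checking determinism first keeps the correctness case (avoiding crashes in the sort) independent of the cost estimate, which only governs the performance case.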


            People

            • Assignee: twmarshall Thomas Tauber-Marshall
            • Reporter: grahn Greg Rahn
