IMPALA-8834

Investigate enabling safe version of OPTIMIZE_PARTITION_KEY_SCANS by default

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend

    Description

      Just an idea I had while updating the docs. We already have the logic in the planner to determine when a partition key scan has "distinct" semantics - i.e. the query returns correct results so long as the scan returns zero rows for a partition that contains no rows, and at least one row for a partition that contains rows (the exact number of rows returned doesn't affect correctness).

      We could push this knowledge down into the scan nodes and have them terminate early. This would be quite efficient if file handles and footers were already cached and greatly reduce the number of rows flowing through the rest of the plan.

          Activity

            Commit e5777f0eb8b42a582afdf048e1728f3676665bb0 in impala's branch refs/heads/master from Tim Armstrong
            [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e5777f0 ]

            IMPALA-8834: Short-circuit partition key scan

            This adds a new version of the pre-existing partition
            key scan optimization that always returns correct
            results, even when files have zero rows. The new
            version is enabled by default. The old metadata-only
            optimization is still available behind the
            OPTIMIZE_PARTITION_KEY_SCANS query option.

            The new version of the optimization must scan the files
            to see if they are non-empty. Instead of using metadata
            only, the planner instructs the backend to short-circuit HDFS
            scans after a single row has been returned from each
            file. This gives results equivalent to returning all
            the rows from each file, because all rows in a file
            belong to the same partition and therefore have identical
            values for the partition key columns.
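            (Illustrative sketch, not code from this patch: the snippet below
            shows why capping the scan at one row per file preserves results
            for queries that only reference partition key columns, such as a
            SELECT DISTINCT over partition columns. The FileDesc type and the
            DistinctKeys helper are hypothetical names.)

              // Illustrative only: every row in a file carries the partition key
              // values of the file's partition, so the set of distinct keys is
              // unchanged whether one row or all rows are read per file, and
              // empty files correctly contribute nothing.
              #include <algorithm>
              #include <cassert>
              #include <cstdint>
              #include <set>
              #include <string>
              #include <vector>

              struct FileDesc {
                std::string partition_key;  // shared by every row in the file
                int64_t num_rows;           // may be zero
              };

              // Distinct partition keys if at most 'rows_per_file' rows are read
              // from each file.
              std::set<std::string> DistinctKeys(const std::vector<FileDesc>& files,
                                                 int64_t rows_per_file) {
                std::set<std::string> keys;
                for (const FileDesc& f : files) {
                  if (std::min(f.num_rows, rows_per_file) > 0) keys.insert(f.partition_key);
                }
                return keys;
              }

              int main() {
                std::vector<FileDesc> files = {{"p=1", 1000}, {"p=1", 0}, {"p=2", 3}};
                // Reading one row per file yields the same distinct set as reading all rows.
                assert(DistinctKeys(files, 1) == DistinctKeys(files, 1000000));
                return 0;
              }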

            Planner cardinality estimates are adjusted accordingly
            to enable potentially better plans and other optimizations
            like disabling codegen.

            We make some effort to avoid generating extra scan ranges
            for remote scans by only generating one range per remote
            file.

            The backend optimization is implemented by constructing a
            row batch with capacity for a single row and then
            terminating each scan range once a single row has been
            produced. Both Parquet and ORC have optimized code paths
            for zero-slot table scans, which means this will only result
            in a footer read. (Other file formats still need to read
            some portion of the file, but can terminate early once
            one row has been produced.)
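            (Hedged sketch of the mechanism described above, using hypothetical
            types rather than Impala's real RowBatch/scanner classes: the row
            batch is created with capacity 1, so the loop driving the scan
            range stops as soon as one row has been produced.)

              // Hypothetical names; not Impala's actual backend API.
              #include <cstdint>

              struct RowBatch {
                explicit RowBatch(int capacity) : capacity(capacity) {}
                bool AtCapacity() const { return num_rows >= capacity; }
                int capacity;
                int num_rows = 0;
              };

              // Drives one scan range. For a partition key scan the batch has
              // capacity 1, so the range terminates after the first row.
              int64_t ProcessScanRange(int64_t rows_in_range, bool is_partition_key_scan) {
                RowBatch batch(is_partition_key_scan ? 1 : 1024);
                int64_t rows_returned = 0;
                for (int64_t i = 0; i < rows_in_range; ++i) {
                  ++batch.num_rows;
                  ++rows_returned;
                  if (batch.AtCapacity()) {
                    if (is_partition_key_scan) return rows_returned;  // short-circuit the range
                    batch.num_rows = 0;  // a real scanner would hand off the full batch here
                  }
                }
                return rows_returned;
              }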

            This should be quite efficient in practice with file handle
            caching and data caching enabled, because it then only
            requires reading the footer from the cache for each file.

            The partition key scan optimization is also slightly
            generalized to apply to scans of unpartitioned tables
            where no slots are materialized.

            A limitation where the optimization did not apply to
            multiple grouping classes was also fixed.

            Limitations:

            • This still scans every file in the partition. I.e. there is
              no short-circuiting if a row has already been found in the
              partition by the current scan node.
            • Resource reservations and estimates for the scan node do
              not fully take this optimization into account, so they are
              conservative: they assume the whole file is scanned.

            Testing:

            • Added end-to-end tests that execute the query on all
              HDFS file formats and verify that the correct number of rows
              flows through the plan.
            • Added planner test based on the existing partition key
              scan planner test.
            • Added test to make sure the single node optimization kicks in
              when expected.
            • Added test for cardinality estimates with and without stats.
            • Added test for unpartitioned tables.
            • Added planner test that checks that the optimization is enabled
              for multiple aggregation classes.
            • Added a targeted perf test.

            Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c
            Reviewed-on: http://gerrit.cloudera.org:8080/13993
            Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
            Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>


            Commit dd36732b790574bb94ac4a4be7c819034b6b6a37 in impala's branch refs/heads/master from Tim Armstrong
            [ https://gitbox.apache.org/repos/asf?p=impala.git;h=dd36732 ]

            IMPALA-9776: Fix test failure introduced by IMPALA-8834

            Testing:

            impala-py.test tests/query_test/test_queries.py \
            --workload_exploration_strategy=functional-query:exhaustive \
            -k HdfsQueries

            Change-Id: Icff4ad909223241b2d6844aa6cc5096aa3487ecb
            Reviewed-on: http://gerrit.cloudera.org:8080/15979
            Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
            Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
            Reviewed-by: Sahil Takiar <stakiar@cloudera.com>


            Commit ff7b5db6002ccb047262cd7118e2e11ab09ef40a in impala's branch refs/heads/master from zhangyifan27
            [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ff7b5db60 ]

            IMPALA-11081: Fix incorrect results in partition key scan

            This patch fixes incorrect results caused by short-circuit partition
            key scan in the case where a Parquet/ORC file contains multiple
            blocks.

            IMPALA-8834 introduced an optimization that generates only one
            scan range per file, corresponding to the file's first block.
            Backends only issue footer ranges for Parquet/ORC files for
            file-metadata-only queries (see HdfsScanner::IssueFooterRanges()),
            which leads to incorrect results if the first block doesn't
            include the file footer. This bug is fixed by returning a scan
            range corresponding to the last block for Parquet/ORC files, to
            make sure it contains the file footer.
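            (Rough sketch of the idea behind the fix, with hypothetical types;
            the real change is in the planner's scan range generation, e.g.
            HdfsScanNode.java. For Parquet/ORC the single range issued per file
            must contain the file footer, so it is taken from the last block
            rather than the first.)

              // Hypothetical names; not the actual planner code.
              #include <cstdint>
              #include <vector>

              struct BlockDesc {
                int64_t offset;
                int64_t length;
              };

              // Picks the single block to turn into a scan range for a
              // partition key scan. Precondition: 'blocks' is non-empty.
              BlockDesc ChooseSingleRange(const std::vector<BlockDesc>& blocks,
                                          bool is_parquet_or_orc) {
                BlockDesc chosen = blocks.front();  // pre-fix behaviour: first block
                if (is_parquet_or_orc) {
                  // The issued range must contain the file footer (see
                  // HdfsScanner::IssueFooterRanges()), so pick the last block.
                  for (const BlockDesc& b : blocks) {
                    if (b.offset > chosen.offset) chosen = b;
                  }
                }
                return chosen;
              }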

            Testing:

            • Added e2e tests to verify the fix.

            Change-Id: I17331ed6c26a747e0509dcbaf427cd52808943b1
            Reviewed-on: http://gerrit.cloudera.org:8080/19471
            Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
            Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>


            Commit 794eb1ba4a6d459379dee91c4274be3f40bd16ac in impala's branch refs/heads/branch-4.1.2 from zhangyifan27
            [ https://gitbox.apache.org/repos/asf?p=impala.git;h=794eb1ba4 ]

            IMPALA-11081: Fix incorrect results in partition key scan

            This patch fixes incorrect results caused by short-circuit partition
            key scan in the case where a Parquet/ORC file contains multiple
            blocks.

            IMPALA-8834 introduced an optimization that generates only one
            scan range per file, corresponding to the file's first block.
            Backends only issue footer ranges for Parquet/ORC files for
            file-metadata-only queries (see HdfsScanner::IssueFooterRanges()),
            which leads to incorrect results if the first block doesn't
            include the file footer. This bug is fixed by returning a scan
            range corresponding to the last block for Parquet/ORC files, to
            make sure it contains the file footer.

            Testing:

            • Added e2e tests to verify the fix.

            Backport Notes:

            • Trivial conflicts in HdfsScanNode.java and test_partition_metadata.py

            Change-Id: I17331ed6c26a747e0509dcbaf427cd52808943b1
            Reviewed-on: http://gerrit.cloudera.org:8080/19471
            Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
            Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>


            People

              Assignee: Tim Armstrong
              Reporter: Tim Armstrong
              Votes: 1
              Watchers: 6
