IMPALA-4854

COMPUTE INCREMENTAL STATS should ignore missing stats on complex columns

    Details

      Description

      After "compute incremental stats" has run on a table, later runs are designed to compute stats only for partitions that have none. However, when stats are found to be missing for a column (for example, because the column was added to the schema after the last run), incremental stats are recomputed for all partitions. Impala does not currently compute stats for columns with complex types, such as arrays and structs. As a result, stats for these columns are always found to be missing, and every run incorrectly recomputes stats for all partitions. Missing stats for complex columns on partitions that already have stats should be ignored when deciding whether recomputation is necessary, because recomputing will never fill them in.
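The partition-selection behaviour described above can be illustrated with a small model. This is a minimal sketch and not Impala's actual frontend code: the `Partition` class, the `partitions_to_compute` function, and the `ignore_complex` flag are hypothetical names introduced only for illustration, and the schema is assumed to be a plain mapping from column name to type string.

```python
from dataclasses import dataclass, field

# Complex base types for which no column stats are computed (assumed set).
COMPLEX_TYPES = {"ARRAY", "MAP", "STRUCT"}


@dataclass
class Partition:
    """Hypothetical stand-in for a table partition."""
    name: str
    # Columns that already have incremental stats; empty means "no stats yet".
    cols_with_stats: set = field(default_factory=set)


def is_complex(type_name):
    """True for types such as ARRAY<INT> or STRUCT<a:INT>."""
    base = type_name.upper().split("<", 1)[0].strip()
    return base in COMPLEX_TYPES


def partitions_to_compute(schema, partitions, ignore_complex=True):
    """Return the names of partitions whose stats must be (re)computed.

    ignore_complex=False models the reported bug: complex columns count
    as "missing stats", so every run selects all partitions.
    """
    expected = {
        col for col, type_name in schema.items()
        if not (ignore_complex and is_complex(type_name))
    }
    # Only partitions that already have stats can reveal a schema change.
    computed = [p for p in partitions if p.cols_with_stats]
    schema_changed = any(expected - p.cols_with_stats for p in computed)
    if schema_changed:
        return [p.name for p in partitions]
    return [p.name for p in partitions if not p.cols_with_stats]


schema = {"id": "INT", "tags": "ARRAY<STRING>"}
parts = [Partition("p=1", {"id"}), Partition("p=2")]

print(partitions_to_compute(schema, parts))                        # ['p=2']
print(partitions_to_compute(schema, parts, ignore_complex=False))  # ['p=1', 'p=2']
```

With complex columns excluded from the expected set, only the partition without stats is selected. A genuinely new scalar column would still trigger a full recomputation.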

        Activity

        alex.behm Alexander Behm added a comment -

        commit d845413ab8fb0c92fc2d8d0c2a54d0de4dbd7429
        Author: Alex Behm <alex.behm@cloudera.com>
        Date: Wed Feb 15 19:03:47 2017 -0800

        IMPALA-4854: Fix incremental stats with complex types.

        The bug: Compute incremental stats used to always do a
        full stats recomputation for tables with complex types.
        The logic for detecting schema changes (e.g. an added
        column) did not take into consideration that columns
        with complex types are ignored in the stats computation,
        and should therefore not be recognized as a new column
        that does not yet have stats.

        Testing:

        • Added a new regression test
        • Locally ran test_compute_stats.py and the FE tests

        Change-Id: I6e0335048d688ee25ff55c6628d0f6f8ecc1dd8a
        Reviewed-on: http://gerrit.cloudera.org:8080/6033
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Impala Public Jenkins


          People

          • Assignee:
            alex.behm Alexander Behm
            Reporter:
            ngsalmon Nathan Salmon