Alter table add column drops all the column stats

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Catalog

      Description

      Adding a column after running COMPUTE STATS drops the existing per-column stats. Repro below:

      [vd0340.halxg.cloudera.com:21000] > compute stats single_node_insert.shipdates;
      Query: compute stats single_node_insert.shipdates
      +-----------------------------------------+
      | summary                                 |
      +-----------------------------------------+
      | Updated 1 partition(s) and 4 column(s). |
      +-----------------------------------------+
      Fetched 1 row(s) in 0.72s
      [vd0340.halxg.cloudera.com:21000] > show column stats single_node_insert.shipdates;
      Query: show column stats single_node_insert.shipdates
      +--------------+--------+------------------+--------+----------+-------------------+
      | Column       | Type   | #Distinct Values | #Nulls | Max Size | Avg Size          |
      +--------------+--------+------------------+--------+----------+-------------------+
      | l_shipdate   | STRING | 2629             | -1     | 10       | 9.992899894714355 |
      | l_orderkey   | BIGINT | 1                | -1     | 8        | 8                 |
      | l_linenumber | BIGINT | 1                | -1     | 8        | 8                 |
      | l_partkey    | BIGINT | 1                | -1     | 8        | 8                 |
      +--------------+--------+------------------+--------+----------+-------------------+
      Fetched 4 row(s) in 0.01s
      [vd0340.halxg.cloudera.com:21000] > alter table shipdates add columns (l_custkey bigint);
      Query: alter table shipdates add columns (l_custkey bigint)
      [vd0340.halxg.cloudera.com:21000] > show column stats single_node_insert.shipdates;
      Query: show column stats single_node_insert.shipdates
      +--------------+--------+------------------+--------+----------+----------+
      | Column       | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
      +--------------+--------+------------------+--------+----------+----------+
      | l_shipdate   | STRING | -1               | -1     | -1       | -1       |
      | l_orderkey   | BIGINT | -1               | -1     | 8        | 8        |
      | l_linenumber | BIGINT | -1               | -1     | 8        | 8        |
      | l_partkey    | BIGINT | -1               | -1     | 8        | 8        |
      | l_custkey    | BIGINT | -1               | -1     | 8        | 8        |
      +--------------+--------+------------------+--------+----------+----------+
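A regression like the one in the repro above can be spotted mechanically by scanning `show column stats` output for columns whose `#Distinct Values` is -1. The following is only a minimal sketch: `parse_shell_table` and `columns_missing_ndv` are hypothetical helper names, not part of Impala or impala-shell, and the sample tables are copied from the transcript above.

```python
# Hypothetical helpers for illustration only; not part of Impala.

BEFORE = """
+--------------+--------+------------------+--------+----------+-------------------+
| Column       | Type   | #Distinct Values | #Nulls | Max Size | Avg Size          |
+--------------+--------+------------------+--------+----------+-------------------+
| l_shipdate   | STRING | 2629             | -1     | 10       | 9.992899894714355 |
| l_orderkey   | BIGINT | 1                | -1     | 8        | 8                 |
| l_linenumber | BIGINT | 1                | -1     | 8        | 8                 |
| l_partkey    | BIGINT | 1                | -1     | 8        | 8                 |
+--------------+--------+------------------+--------+----------+-------------------+
"""

AFTER = """
+--------------+--------+------------------+--------+----------+----------+
| Column       | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
+--------------+--------+------------------+--------+----------+----------+
| l_shipdate   | STRING | -1               | -1     | -1       | -1       |
| l_orderkey   | BIGINT | -1               | -1     | 8        | 8        |
| l_linenumber | BIGINT | -1               | -1     | 8        | 8        |
| l_partkey    | BIGINT | -1               | -1     | 8        | 8        |
| l_custkey    | BIGINT | -1               | -1     | 8        | 8        |
+--------------+--------+------------------+--------+----------+----------+
"""


def parse_shell_table(text):
    """Parse an ASCII table (as printed in the transcript) into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def columns_missing_ndv(text):
    """Return the names of columns whose #Distinct Values is -1 (unknown)."""
    return [
        row["Column"]
        for row in parse_shell_table(text)
        if row["#Distinct Values"] == "-1"
    ]


if __name__ == "__main__":
    print(columns_missing_ndv(BEFORE))
    print(columns_missing_ndv(AFTER))
```

Note that the newly added `l_custkey` column is flagged as well. That is expected, since its stats were never computed; the bug is that the four pre-existing columns lose their values.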
      

          Activity

          twmarshall Thomas Tauber-Marshall added a comment:

          commit 5cc133947fe26a9ae39b0e7afb4678d251206b6b
          Author: Thomas Tauber-Marshall <tmarshall@cloudera.com>
          Date: Mon Oct 24 15:37:22 2016 -0700

          IMPALA-4260: Alter table add column drops all the column stats

          Hive expects types for column stats to be specified as all lower
          case. For some reason, it doesn't check this when the stats are
          first written, but it does check when performing an 'alter table'.
          This causes it to drop stats that Impala wrote because we specify
          type names in upper case.

          This patch converts the types that Impala sends to Hive for the
          column stats to all lower case and adds a regression test.

          I also filed HIVE-15061 to track the issue from the Hive end.

          Change-Id: Ia373ec917efa7ab9f2a59b8a870b7ebc30175dda
          Reviewed-on: http://gerrit.cloudera.org:8080/4845
          Reviewed-by: Matthew Jacobs <mj@cloudera.com>
          Tested-by: Internal Jenkins

          The corresponding fix on the Hive side: https://issues.apache.org/jira/browse/HIVE-15061
          (only one of these two patches is required to fix the problem)
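The mechanism described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual patch (which lives in Impala's catalog code): `hive_stats_type_name` and `stats_survive_alter` are hypothetical names, and modelling Hive's check as an exact string comparison is an assumption made here for illustration.

```python
# Toy model for illustration only; names and the exact-match check are assumptions.

def hive_stats_type_name(impala_type_name):
    """Return the type name in the all-lower-case form the commit says Hive expects."""
    return impala_type_name.lower()


def stats_survive_alter(stored_stats_type, schema_type):
    """Assumed model of the check done on 'alter table': keep stats only on an exact match."""
    return stored_stats_type == schema_type


if __name__ == "__main__":
    # Before the patch: the upper-case name written by Impala fails the check.
    print(stats_survive_alter("BIGINT", "bigint"))
    # After the patch: the name is lower-cased before it is sent to Hive.
    print(stats_survive_alter(hive_stats_type_name("BIGINT"), "bigint"))
```

Normalizing on the writer's side means the stats survive regardless of whether the Hive-side fix is deployed, which is consistent with the note that only one of the two patches is required.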


            People

            • Assignee: twmarshall Thomas Tauber-Marshall
            • Reporter: mmokhtar Mostafa Mokhtar