  1. Hive
  2. HIVE-23796

Multiple insert overwrite into a partitioned table doesn't gather column statistics for all partitions

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Statistics
    • None
    • Hive 3.1

    Description

      The simplified sample below illustrates the issue.
      When a statement contains multiple insert overwrite clauses, only the partitions written by the last clause get column statistics. In this sample, only the partition (ws_sold_date_sk=__HIVE_DEFAULT_PARTITION__), which is written by the last insert clause, has column statistics.

      Since "hive.stats.column.autogather" is true by default, we expect column statistics to be computed for all partitions.

      -- Target table, partitioned by sale date.
      create table web_sales
      (
          ws_sold_time_sk bigint,
          ws_ship_date_sk bigint,
          ws_item_sk      bigint
      )
      partitioned by (ws_sold_date_sk bigint)
      stored as orc;

      -- Multi-insert: both clauses read the same source and write to the same table.
      from anotherdb.web_sales ws
      -- First clause: rows with a non-null sale date.
      insert overwrite table web_sales partition (ws_sold_date_sk)
      select
          ws.ws_sold_time_sk,
          ws.ws_ship_date_sk,
          ws.ws_item_sk,
          ws.ws_sold_date_sk
      where ws.ws_sold_date_sk is not null
      -- Second clause: rows with a null sale date, which land in the default partition.
      insert overwrite table web_sales partition (ws_sold_date_sk)
      select
          ws.ws_sold_time_sk,
          ws.ws_ship_date_sk,
          ws.ws_item_sk,
          ws.ws_sold_date_sk
      where ws.ws_sold_date_sk is null
      sort by ws.ws_sold_date_sk
      ;
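
      One way to observe the behavior described above is to inspect the column statistics of individual partitions after running the multi-insert. The following is a minimal sketch, not part of the original report: the partition value 2451000 is hypothetical and should be replaced with a value returned by "show partitions", and the final "analyze table" statement is offered only as a possible workaround that this report does not confirm.

      ```sql
      -- Print the current value of the setting the report relies on.
      set hive.stats.column.autogather;

      -- List the partitions produced by the multi-insert.
      show partitions web_sales;

      -- Inspect column statistics for one column in one partition.
      -- 2451000 is a hypothetical partition value; substitute one listed above.
      describe formatted web_sales partition (ws_sold_date_sk=2451000) ws_item_sk;

      -- Possible workaround (an assumption, not confirmed in this report):
      -- explicitly recompute column statistics for every partition.
      analyze table web_sales partition (ws_sold_date_sk) compute statistics for columns;
      ```

      If the issue reproduces, the "describe formatted" output for a partition written by the first clause would be expected to show empty column statistics (for example min, max, and distinct count) until statistics are recomputed.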
      
      

          People

            Assignee: Unassigned
            Reporter: Yu-Wen Lai (hsnusonic)