Hive / HIVE-12661

StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.2.1
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

    Description

      PROBLEM:
      Hive statistics are auto-gathered correctly until 'analyze table [tablename] compute statistics for columns' is run. After that, the statistics are not auto-updated until the command is run again, so queries answered from statistics return stale results. Repro:

      set hive.stats.autogather=true;
      set hive.stats.atomic=false;
      set hive.stats.collect.rawdatasize=true;
      set hive.stats.collect.scancols=false;
      set hive.stats.collect.tablekeys=false;
      set hive.stats.fetch.column.stats=true;
      set hive.stats.fetch.partition.stats=true;
      set hive.stats.reliable=false;
      set hive.compute.query.using.stats=true;
      
      CREATE TABLE `default`.`calendar` (`year` int)
        ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
        STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
                  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
        TBLPROPERTIES ('orc.compress'='NONE');
      
      insert into calendar values (2010), (2011), (2012); 
      select * from calendar; 
      +----------------+--+ 
      | calendar.year | 
      +----------------+--+ 
      | 2010 | 
      | 2011 | 
      | 2012 | 
      +----------------+--+ 
      
      select max(year) from calendar; 
      | 2012 | 
      
      insert into calendar values (2013); 
      select * from calendar; 
      +----------------+--+ 
      | calendar.year | 
      +----------------+--+ 
      | 2010 | 
      | 2011 | 
      | 2012 | 
      | 2013 | 
      +----------------+--+ 
      
      select max(year) from calendar; 
      | 2013 | 
      
      insert into calendar values (2014); 
      select max(year) from calendar; 
      | 2014 |
      
      analyze table calendar compute statistics for columns;
      
      insert into calendar values (2015);
      select max(year) from calendar;
      | 2014 |   (stale; expected 2015)
      
      insert into calendar values (2016), (2017), (2018);
      select max(year) from calendar;
      | 2014 |   (stale; expected 2018)
      
      analyze table calendar compute statistics for columns;
      select max(year) from calendar;
      | 2018  |
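
      A minimal sketch for inspecting the state behind the stale results, assuming the `calendar` table from the repro above. `describe formatted` lists the table parameters, where the COLUMN_STATS_ACCURATE flag named in the title is expected to appear, and the column form shows the stored min/max for `year`. Turning off hive.compute.query.using.stats for the session is an assumed workaround for illustration, not something stated in the original report:

      ```sql
      -- Table-level parameters; look for COLUMN_STATS_ACCURATE in the output.
      describe formatted calendar;

      -- Stored column statistics (min, max, ...) for the `year` column.
      describe formatted calendar year;

      -- Assumed workaround: make the query scan the data instead of
      -- answering max() from possibly stale metastore statistics.
      set hive.compute.query.using.stats=false;
      select max(year) from calendar;
      ```

      Comparing the stored max for `year` with the result of the scanning query should show whether the metastore statistics have fallen behind the data.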
      

      Attachments

        1. HIVE-12661.final.patch (1.63 MB, Pengcheng Xiong)
        2. HIVE-12661.12.patch (1.62 MB, Pengcheng Xiong)
        3. HIVE-12661.11.patch (1.58 MB, Pengcheng Xiong)
        4. HIVE-12661.10.patch (1.55 MB, Pengcheng Xiong)
        5. HIVE-12661.09.patch (1.55 MB, Pengcheng Xiong)
        6. HIVE-12661.08.patch (1.54 MB, Pengcheng Xiong)
        7. HIVE-12661.07.patch (1.56 MB, Pengcheng Xiong)
        8. HIVE-12661.06.patch (1.38 MB, Pengcheng Xiong)
        9. HIVE-12661.05.patch (1.36 MB, Pengcheng Xiong)
        10. HIVE-12661.04.patch (1.11 MB, Pengcheng Xiong)
        11. HIVE-12661.03.patch (1.08 MB, Pengcheng Xiong)
        12. HIVE-12661.02.patch (92 kB, Pengcheng Xiong)
        13. HIVE-12661.01.patch (49 kB, Pengcheng Xiong)

            People

              Assignee: Pengcheng Xiong (pxiong)
              Reporter: Pengcheng Xiong (pxiong)
              Votes: 0
              Watchers: 1