Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 1.0.0, 1.2.1
- None
- None
Description
PROBLEM:
Hive stats are autogathered properly until 'analyze table [tablename] compute statistics for columns' is run. After that, the stats are not auto-updated until the command is run again. Repro:
set hive.stats.autogather=true;
set hive.stats.atomic=false;
set hive.stats.collect.rawdatasize=true;
set hive.stats.collect.scancols=false;
set hive.stats.collect.tablekeys=false;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.stats.reliable=false;
set hive.compute.query.using.stats=true;

CREATE TABLE `default`.`calendar` (`year` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES ('orc.compress'='NONE');

insert into calendar values (2010), (2011), (2012);
select * from calendar;
+----------------+--+
| calendar.year  |
+----------------+--+
| 2010           |
| 2011           |
| 2012           |
+----------------+--+
select max(year) from calendar;
| 2012 |

insert into calendar values (2013);
select * from calendar;
+----------------+--+
| calendar.year  |
+----------------+--+
| 2010           |
| 2011           |
| 2012           |
| 2013           |
+----------------+--+
select max(year) from calendar;
| 2013 |

insert into calendar values (2014);
select max(year) from calendar;
| 2014 |

analyze table calendar compute statistics for columns;

insert into calendar values (2015);
select max(year) from calendar;
| 2014 |

insert into calendar values (2016), (2017), (2018);
select max(year) from calendar;
| 2014 |

analyze table calendar compute statistics for columns;
select max(year) from calendar;
| 2018 |
Attachments
Issue Links
- duplicates: HIVE-13147 COLUMN_STATS_ACCURATE is not accurate (Resolved)
- is related to: HIVE-3917 Support noscan operation for analyze command (Closed)
- links to