Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
While improving Impala column stats to compute min/max values for columns, we found that the unset state of the low or high value in LongColumnStatsData cannot be retrieved. This is illustrated by the following Impala test case, added to MetastoreEventsProcessorTest.
@Test
public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
  try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
    List<String> colNames = new ArrayList<String>();
    colNames.add("id");
    colNames.add("int_col");
    colNames.add("bigint_col");

    // First fetch: unset the low and high values on each returned stats object.
    List<ColumnStatisticsObj> colStatsObjs =
        msClient.getHiveClient().getTableColumnStatistics(
            "unique_database", "alltypes", colNames, "impala");
    for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
      ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
      LongColumnStatsData longColStatsData = colStatsData.getLongStats();
      longColStatsData.unsetLowValue();
      longColStatsData.unsetHighValue();
      colStatsData.setLongStats(longColStatsData);
    }
    assertTrue("All good!", true);

    // Second fetch: check whether the low and high values are reported as unset.
    colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
        "unique_database", "alltypes", colNames, "impala");
    for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
      ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
      LongColumnStatsData longColStatsData = colStatsData.getLongStats();
      assertFalse("isSetLowValue() should be false",
          longColStatsData.isSetLowValue());
      assertFalse("isSetHighValue() should be false",
          longColStatsData.isSetHighValue());
    }
    assertTrue("All good!", true);
  } catch (NoSuchObjectException e) {
    assertFalse(String.format("No such object exception: %s", e), false);
  } catch (MetaException e) {
    assertFalse(String.format("Metadata exception: %s", e), false);
  } catch (TException e) {
    assertFalse(String.format("TException: %s", e), false);
  }
}
Both isSetLowValue() and isSetHighValue() should return false in the second loop, since longColStatsData.unsetLowValue() and longColStatsData.unsetHighValue() are called in the first loop.
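For readers less familiar with how set/unset state is tracked for optional primitive fields, the following is a minimal sketch of that bookkeeping. LongStatsSketch is a hypothetical stand-in written for illustration, not the real Thrift-generated LongColumnStatsData; only the method names setLowValue, unsetLowValue, and isSetLowValue (and their high-value counterparts) mirror the calls used in the test above. The copy() method is also an assumption, used here to imitate a second, independently obtained object.

```java
// Hypothetical stand-in for a struct with two optional long fields.
// It only mimics the "is set" bookkeeping; it is not the Hive class.
final class LongStatsSketch {
    private static final int LOW_BIT = 0;
    private static final int HIGH_BIT = 1;

    private long lowValue;
    private long highValue;
    private byte issetBits; // one bit per optional primitive field

    long getLowValue() { return lowValue; }
    long getHighValue() { return highValue; }

    void setLowValue(long v) { lowValue = v; issetBits |= (1 << LOW_BIT); }
    void setHighValue(long v) { highValue = v; issetBits |= (1 << HIGH_BIT); }

    // Unsetting clears only the flag; the stored long keeps its last value.
    void unsetLowValue() { issetBits &= ~(1 << LOW_BIT); }
    void unsetHighValue() { issetBits &= ~(1 << HIGH_BIT); }

    boolean isSetLowValue() { return (issetBits & (1 << LOW_BIT)) != 0; }
    boolean isSetHighValue() { return (issetBits & (1 << HIGH_BIT)) != 0; }

    // Assumption for illustration: an independent object with the same state.
    LongStatsSketch copy() {
        LongStatsSketch c = new LongStatsSketch();
        c.lowValue = lowValue;
        c.highValue = highValue;
        c.issetBits = issetBits;
        return c;
    }

    public static void main(String[] args) {
        LongStatsSketch stored = new LongStatsSketch();
        stored.setLowValue(0L);
        stored.setHighValue(7299L);

        LongStatsSketch first = stored.copy();
        first.unsetLowValue();
        first.unsetHighValue();

        LongStatsSketch second = stored.copy();
        System.out.println("first.isSetLowValue()  = " + first.isSetLowValue());
        System.out.println("second.isSetLowValue() = " + second.isSetLowValue());
    }
}
```

In this sketch, unsetting a field changes only the flag on the instance it is called on, so the value and the set/unset flag are separate pieces of state. Whether the unset flag survives a round trip through the metastore is exactly what the test above is probing; the sketch makes no claim about that.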
To build and run the test:
mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff -Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue
Table unique_database.alltypes is defined as follows.
CREATE EXTERNAL TABLE unique_database.alltypes (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_string_col STRING,
  string_col STRING,
  timestamp_col TIMESTAMP,
  year INT
)
PARTITIONED BY (
  month INT
)
STORED AS PARQUET
LOCATION 'hdfs://localhost:20500/test-warehouse/unique_database.db/alltypes'
TBLPROPERTIES (
  'DO_NOT_UPDATE_STATS'='true',
  'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
  'STATS_GENERATED'='TASK',
  'external.table.purge'='TRUE',
  'impala.lastComputeStatsTime'='1615492819',
  'numRows'='0',
  'totalSize'='0'
)
The table can be created with the following statements in an Impala environment.
create database if not exists unique_database;
use unique_database;
drop table if exists alltypes;
CREATE TABLE alltypes
  partitioned by (month)
  STORED AS PARQUET
  as select * from functional_parquet.alltypes;