Description
Run the following Hive queries:
set datanucleus.cache.collections=false;
set hive.stats.autogather=true;
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
set hive.map.aggr=true;
create table tmptable(key string, value string);
INSERT OVERWRITE TABLE tmptable
SELECT unionsrc.key, unionsrc.value
FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
      UNION ALL
      SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
DESCRIBE FORMATTED tmptable;
Hive on Spark prints the following table parameters:
COLUMN_STATS_ACCURATE  true
numFiles               2
numRows                0
rawDataSize            0
totalSize              225
Hive on MR prints the following table parameters:
Table Parameters:
COLUMN_STATS_ACCURATE  true
numFiles               2
numRows                26
rawDataSize            199
totalSize              225
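As shown above, Hive on Spark does not collect the numRows and rawDataSize statistics. Both are reported as 0, while Hive on MR reports numRows 26 and rawDataSize 199 for the same data (numFiles and totalSize match).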
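When checking this across more queries, it can help to diff the two parameter dumps automatically. The following is a minimal Python sketch. It assumes each dump is copied as flat, whitespace-separated key/value text, as in the outputs above. The helper names parse_table_params and diff_params are hypothetical and are not part of Hive.

```python
def parse_table_params(text: str) -> dict[str, str]:
    """Parse a flat 'key value key value ...' dump of Hive table parameters.

    Assumes keys and values contain no spaces (true for the dumps above).
    """
    tokens = text.replace("Table Parameters:", "").split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of tokens (key/value pairs)")
    # Pair even-indexed tokens (keys) with odd-indexed tokens (values).
    return dict(zip(tokens[0::2], tokens[1::2]))


def diff_params(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(a.keys() | b.keys())
        if a.get(k) != b.get(k)
    }


spark = parse_table_params(
    "COLUMN_STATS_ACCURATE true numFiles 2 numRows 0 rawDataSize 0 totalSize 225"
)
mr = parse_table_params(
    "Table Parameters: COLUMN_STATS_ACCURATE true numFiles 2 "
    "numRows 26 rawDataSize 199 totalSize 225"
)

for key, (s, m) in diff_params(spark, mr).items():
    print(f"{key}: spark={s} mr={m}")
```

Run on the two outputs from this report, the script prints only numRows and rawDataSize. This confirms that the file-level stats agree and only the row-level stats are missing on Spark.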