Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 2.0.0
- None
Description
Run the following queries and you will see that rawDataSize for the table is incorrectly reported as 4, which is the number of fields rather than a data size. We need to populate the correct data size so that the data can be split properly.
SET hive.stats.autogather=true;
CREATE TABLE parquet_stats (id int, str string) STORED AS PARQUET;
INSERT INTO parquet_stats VALUES (0, 'this is string 0'), (1, 'string 1');
DESC FORMATTED parquet_stats;
Table Parameters:
  COLUMN_STATS_ACCURATE  true
  numFiles               1
  numRows                2
  rawDataSize            4
  totalSize              373
  transient_lastDdlTime  1530660523
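To make the mismatch concrete, here is a minimal Python sketch that compares the field count for the two inserted rows with a rough byte estimate. The byte estimate is purely an assumption for illustration (4 bytes per int plus the UTF-8 length of each string); it is not the formula Hive uses, and the helper name `estimated_raw_bytes` is hypothetical.

```python
# The two rows inserted by the repro queries in this report.
rows = [(0, "this is string 0"), (1, "string 1")]

# Number of individual field values: rows x columns.
# This matches the rawDataSize of 4 shown in the table parameters.
field_count = sum(len(row) for row in rows)

# Assumed size of an int column value in bytes (illustrative only,
# not taken from Hive).
INT_BYTES = 4


def estimated_raw_bytes(rows):
    """Hypothetical estimate: fixed int width plus UTF-8 string length."""
    total = 0
    for _id_value, str_value in rows:
        total += INT_BYTES + len(str_value.encode("utf-8"))
    return total


byte_estimate = estimated_raw_bytes(rows)

print("field count:", field_count)
print("estimated raw bytes:", byte_estimate)
```

Even under this crude assumption the estimate is several times larger than the field count, which is why a rawDataSize of 4 is not usable for sizing splits.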
Attachments
Issue Links
- is duplicated by
  - HIVE-20523 Improve table statistics for Parquet format (Resolved)
- is related to
  - HIVE-21284 StatsWork should use footer scan for Parquet (Closed)
  - HIVE-16887 Parquet rawDataSize Under Reported (Open)