Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Hadoop Flags: Reviewed
This patch added getSerDeStats() methods to the Serializer and Deserializer interfaces. Consequently, any SerDes that were compiled against the old interfaces must be recompiled against the new interfaces in order to work with Hive 0.8.0.
Description
Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table. Other statistics (e.g., total table/partition size) are derived from the file system.
Here, we also want to collect the size of the uncompressed data, so that the efficiency of compression can be determined.
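As a rough illustration of how the uncompressed size could be used once it is collected, the sketch below compares it with the on-disk size taken from the file system. The class name, method names, and byte counts are hypothetical and are not part of the patch.

```java
// Illustrative sketch only: CompressionStats and its methods are hypothetical
// helpers, not Hive classes.
final class CompressionStats {
    private CompressionStats() {}

    /** Uncompressed size divided by on-disk size; a larger value means better compression. */
    static double compressionRatio(long rawDataSize, long totalSize) {
        if (rawDataSize <= 0 || totalSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (double) rawDataSize / totalSize;
    }

    /** Fraction of the uncompressed size that compression saved (negative if the data grew). */
    static double spaceSaved(long rawDataSize, long totalSize) {
        return 1.0 - 1.0 / compressionRatio(rawDataSize, totalSize);
    }

    public static void main(String[] args) {
        long rawDataSize = 4_000_000L; // assumed uncompressed bytes reported by the statistic
        long totalSize = 1_000_000L;   // assumed on-disk bytes derived from the file system
        System.out.println("ratio = " + compressionRatio(rawDataSize, totalSize));
        System.out.println("saved = " + spaceSaved(rawDataSize, totalSize));
    }
}
```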
Currently, a large part of the statistics collection mechanism is hardcoded and not easily extensible to other statistics.
In addition to collecting the new statistic, it would be desirable to extend the collection mechanism so that new statistics can be added easily.
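One way to picture an extensible mechanism is a collector keyed by statistic name, so that adding a statistic only requires publishing a new key. This is a minimal sketch under that assumption, not the design the patch implements; StatsRegistry and the statistic names used here are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: StatsRegistry is a hypothetical class, not Hive code.
final class StatsRegistry {
    // Assumed statistic names, chosen for this example.
    static final String ROW_COUNT = "numRows";
    static final String RAW_DATA_SIZE = "rawDataSize";

    // Running totals keyed by statistic name; insertion order is preserved.
    private final Map<String, Long> totals = new LinkedHashMap<>();

    /** Adds delta to the named statistic, creating the entry on first use. */
    void add(String statName, long delta) {
        totals.merge(statName, delta, Long::sum);
    }

    /** Returns the current total, or 0 if the statistic was never published. */
    long get(String statName) {
        return totals.getOrDefault(statName, 0L);
    }

    /** Returns a copy of all totals collected so far. */
    Map<String, Long> snapshot() {
        return new LinkedHashMap<>(totals);
    }

    public static void main(String[] args) {
        StatsRegistry stats = new StatsRegistry();
        String[] rows = {"alice,30", "bob,41"}; // made-up rows for the example
        for (String row : rows) {
            stats.add(ROW_COUNT, 1);
            stats.add(RAW_DATA_SIZE, row.getBytes(StandardCharsets.UTF_8).length);
        }
        System.out.println(stats.snapshot());
    }
}
```

With this shape, the collector itself does not change when a new statistic is introduced; only the code that publishes values needs to know the new name.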