Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
0.13.0
None
Description
Currently, the stats task tries to update the statistics of the table/partition
being written after the table/partition has been loaded. If this update fails
(for any reason), the operation either fails, if hive.stats.reliable is set to
true, or succeeds while writing inaccurate stats, if it is set to false. Failing
is bad for applications that do not always need reliable stats, since the query
may have run for a long time before failing at this final step.
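The current flow can be sketched as follows. This is a minimal, hypothetical model, not Hive's actual code: the class, enum, and method names are invented for illustration, and the value of hive.stats.reliable and the outcome of the stats update are passed in as plain booleans.

```java
// Hypothetical model of the current behavior: the table/partition has already
// been loaded, the stats task then tries to write statistics, and
// hive.stats.reliable decides what happens when that write fails.
public class CurrentStatsBehavior {

    /** Outcome of the whole query as seen by the application. */
    public enum QueryOutcome { SUCCEEDED_ACCURATE_STATS, SUCCEEDED_INACCURATE_STATS, FAILED }

    /**
     * @param statsReliable value of hive.stats.reliable
     * @param statsUpdateOk whether the stats update succeeded (assumed input)
     */
    public static QueryOutcome finishQuery(boolean statsReliable, boolean statsUpdateOk) {
        if (statsUpdateOk) {
            return QueryOutcome.SUCCEEDED_ACCURATE_STATS;
        }
        // The data is already loaded at this point, so failing here throws
        // away the work of a possibly long-running query.
        return statsReliable ? QueryOutcome.FAILED : QueryOutcome.SUCCEEDED_INACCURATE_STATS;
    }

    public static void main(String[] args) {
        System.out.println(finishQuery(true, false));
        System.out.println(finishQuery(false, false));
    }
}
```

Note that nothing in the second case tells a later reader of the metadata that the stats are inaccurate; that missing signal is what the proposal below adds.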
A new property, areStatsAccurate, should be added to the partition. If
hive.stats.reliable is set to false and the stats could not be computed
correctly, the operation would still succeed and update the stats, but would
set areStatsAccurate to false.
An application that needs accurate stats can then recompute them in the
background.
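A minimal sketch of the proposed behavior, assuming partition parameters are modeled as a plain Map<String, String> and that the stats gathered by the task arrive as another map. The class and method names (ProposedStatsBehavior, updateStats, needsRecompute) are hypothetical; only hive.stats.reliable and areStatsAccurate come from this proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposal: partition parameters are a plain
// Map<String, String>, and hive.stats.reliable is passed in as a boolean.
public class ProposedStatsBehavior {

    /** Property name proposed in this issue. */
    public static final String ARE_STATS_ACCURATE = "areStatsAccurate";

    /**
     * Writes stats for a freshly loaded partition.
     *
     * @return true if the query should succeed, false if it should fail
     */
    public static boolean updateStats(Map<String, String> partitionParams,
                                      Map<String, String> gatheredStats,
                                      boolean statsReliable,
                                      boolean statsComputedCorrectly) {
        if (!statsComputedCorrectly && statsReliable) {
            // Unchanged: the application asked for reliable stats, so fail.
            return false;
        }
        // Write whatever stats were gathered, then record whether they are accurate.
        partitionParams.putAll(gatheredStats);
        partitionParams.put(ARE_STATS_ACCURATE, Boolean.toString(statsComputedCorrectly));
        return true;
    }

    /** True when the flag is missing or "false", i.e. stats should be recomputed. */
    public static boolean needsRecompute(Map<String, String> partitionParams) {
        return !"true".equals(partitionParams.get(ARE_STATS_ACCURATE));
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        Map<String, String> stats = new HashMap<>();
        stats.put("numRows", "42"); // illustrative stat key and value
        boolean ok = updateStats(params, stats, false, false);
        System.out.println(ok + " " + params + " " + needsRecompute(params));
    }
}
```

In this sketch, needsRecompute stands in for how a background job could select partitions whose stats are missing or marked inaccurate, so the query itself never has to fail just because the stats step did.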