Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.11.0
None
Reviewed
Description
Hive supports the ANALYZE command to gather statistics for existing tables and partitions: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
It collects:
1. Number of rows
2. Number of files
3. Size in Bytes
For a large table or partition, this operation can take a long time, because it opens every file and scans all of the data.
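To make the cost concrete, here is a minimal Python sketch of the work a full-scan ANALYZE has to do for one partition. It is not Hive's implementation: a local directory stands in for the partition's directory, each file is assumed (hypothetically) to hold newline-delimited text with one row per line, and the function name `compute_stats_full` and its dictionary keys are illustrative.

```python
import os


def compute_stats_full(partition_dir):
    """Gather row count, file count and total size for one partition directory.

    Hypothetical sketch: assumes every file is newline-delimited text with
    one row per line, so the row count requires reading every file.
    """
    num_files = 0
    total_size = 0
    num_rows = 0
    for root, _dirs, files in os.walk(partition_dir):
        for name in files:
            path = os.path.join(root, name)
            num_files += 1
            total_size += os.path.getsize(path)
            # The expensive part: every byte of every file is read to count rows.
            with open(path, "rb") as f:
                num_rows += sum(1 for _ in f)
    return {"numFiles": num_files, "totalSize": total_size, "numRows": num_rows}
```

Only the row count forces the files to be opened and read; the file count and the sizes are already available from the directory listing.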
It would be useful to support a fast operation that gathers only the statistics that do not require opening every file:
1. Number of files
2. Size in Bytes
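A matching sketch of the proposed fast path, under the same assumptions (a local directory standing in for the partition; the function name `compute_stats_noscan` and its keys are hypothetical). The file count and total size come from file-system metadata via `os.stat`, and no file is opened, so the cost grows with the number of files rather than with the volume of data.

```python
import os


def compute_stats_noscan(partition_dir):
    """Gather file count and total size from file-system metadata only.

    Hypothetical sketch: no file is opened; sizes come from stat() on each
    directory entry, so no row count can be produced.
    """
    num_files = 0
    total_size = 0
    for root, _dirs, files in os.walk(partition_dir):
        for name in files:
            num_files += 1
            # Metadata lookup only; the file's contents are never read.
            total_size += os.stat(os.path.join(root, name)).st_size
    return {"numFiles": num_files, "totalSize": total_size}
```

On a cluster, the same information would come from the distributed file system's metadata rather than from reading the files, which is what makes the operation fast.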
A potential syntax is:
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];
In the future, any statistics that can be gathered without a scan could be retrieved via this optional parameter.
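The optional parts of the proposed syntax can be illustrated with a toy recognizer. This is a hypothetical Python sketch, not Hive's parser: `parse_analyze` is an illustrative name, and the regular expression ignores quoting subtleties such as commas or parentheses inside partition values. It returns the table name, a partition spec in which a column named without `=value` maps to `None`, and whether NOSCAN was given.

```python
import re

# Toy pattern for the proposed statement; keywords match case-insensitively.
_ANALYZE_RE = re.compile(
    r"^\s*ANALYZE\s+TABLE\s+(?P<table>\w+(?:\.\w+)?)"
    r"(?:\s+PARTITION\s*\((?P<parts>[^)]*)\))?"
    r"\s+COMPUTE\s+STATISTICS"
    r"(?P<noscan>\s+NOSCAN)?\s*;?\s*$",
    re.IGNORECASE,
)


def parse_analyze(stmt):
    """Return (table, partition_spec, noscan) or raise ValueError.

    partition_spec maps each partition column to its value, or to None
    when the column is named without a value.
    """
    m = _ANALYZE_RE.match(stmt)
    if m is None:
        raise ValueError("not an ANALYZE ... COMPUTE STATISTICS statement")
    spec = {}
    if m.group("parts"):
        for item in m.group("parts").split(","):
            col, sep, val = item.partition("=")
            # Strip whitespace, then surrounding quotes, from the value.
            spec[col.strip()] = val.strip().strip("'\"") if sep else None
    return m.group("table"), spec, m.group("noscan") is not None
```

For example, `parse_analyze("ANALYZE TABLE page_view PARTITION(ds='2008-04-08', hr) COMPUTE STATISTICS noscan;")` returns `('page_view', {'ds': '2008-04-08', 'hr': None}, True)`, where `page_view` is a hypothetical table.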
Attachments
Issue Links
- is depended upon by
  - HIVE-3958 support partial scan for analyze command - RCFile (Closed)
- is related to
  - HIVE-1361 table/partition level statistics (Closed)
  - HIVE-33 [Hive]: Add optimizer statistics in Hive (Resolved)
- relates to
  - HIVE-12661 StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly (Closed)
  - HIVE-3954 flag indecating statistics is stale (Open)