Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Description
Writing many deltas into a non-partitioned table causes runtime slowdowns once a large number of delta directories are present.
The following code in HiveAlterHandler is invoked for every insert operation. It calls {{updateTableStatsSlow}} on every insert, which lists the table's files each time and causes runtime delays.
{code:java}
if (MetaStoreUtils.requireCalStats(null, null, newt, environmentContext) && !isPartitionedTable) {
  Database db = msdb.getDatabase(catName, newDbName);
  assert(isReplicated == HiveMetaStore.HMSHandler.isDbReplicationTarget(db));
  // Update table stats. For partitioned table, we update stats in alterPartition()
  MetaStoreUtils.updateTableStatsSlow(db, newt, wh, false, true, environmentContext);
}
{code}
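The following rough cost model makes the growth concrete. It is an assumption, not Hive code. Suppose each insert adds one delta directory, no compaction runs in between, and the stats update lists every delta present at that point. Then insert k lists k directories, and N inserts list 1 + 2 + ... + N = N(N+1)/2 directories in total. If only the new delta were listed, the total would be N. The class and method names are hypothetical.

```java
// Hypothetical cost model, not Hive code: counts delta directories listed
// across n inserts, ignoring compaction and the files inside each delta.
final class ListingCost {
    private ListingCost() {}

    /** Full listing: insert k sees (and lists) k delta directories. */
    static long fullListing(long n) {
        long total = 0;
        for (long k = 1; k <= n; k++) {
            total += k;
        }
        return total;
    }

    /** Incremental listing: each insert lists only the delta it just wrote. */
    static long incrementalListing(long n) {
        return n;
    }

    public static void main(String[] args) {
        long n = 20_000; // number of single-row inserts in the repro below
        System.out.println("full listing: " + fullListing(n));        // prints "full listing: 200010000"
        System.out.println("incremental listing: " + incrementalListing(n)); // prints "incremental listing: 20000"
    }
}
```

Under this model, the 20,000-insert repro performs about 200 million directory listings in total rather than 20,000. Real costs also depend on the number of files per delta and on filesystem latency.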
It would be worth exploring whether only the newly added delta directory could be listed when computing stats. This would avoid a large listing call over all existing delta directories during every stats collection.
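Here is a minimal sketch of the incremental idea. It assumes the previously stored stats are accurate, and it lists only the delta directory written by the current insert, adding that directory's file count and size to the existing totals. The helper `IncrementalStats.addDelta` is hypothetical. `java.nio.file` on a local path stands in for Hadoop's `FileSystem`, and a plain map stands in for the table parameters, keyed by the names Hive uses for basic stats (`numFiles`, `totalSize`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Hive's implementation: java.nio on a local path
// stands in for Hadoop's FileSystem, and a Map stands in for table params.
final class IncrementalStats {
    private IncrementalStats() {}

    /**
     * Returns a copy of currentStats with numFiles and totalSize increased by
     * the files found under newDeltaDir. Only newDeltaDir is listed; the rest
     * of the table location is never touched.
     */
    static Map<String, Long> addDelta(Map<String, Long> currentStats, Path newDeltaDir)
            throws IOException {
        List<Path> files;
        // Walk only the new delta directory (e.g. its bucket files).
        try (Stream<Path> paths = Files.walk(newDeltaDir)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long bytes = 0;
        for (Path f : files) {
            bytes += Files.size(f);
        }
        Map<String, Long> updated = new HashMap<>(currentStats);
        // Add the delta's contribution to the previously recorded totals.
        updated.merge("numFiles", (long) files.size(), Long::sum);
        updated.merge("totalSize", bytes, Long::sum);
        return updated;
    }
}
```

This trade-off presumably only holds for pure appends. Operations that remove or rewrite files, such as compaction, deletes, or INSERT OVERWRITE, would still need a full listing or another way to keep the totals correct.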
Example queries to reproduce:
{code:sql}
CREATE TABLE IF NOT EXISTS test (name String, value int);
INSERT INTO test VALUES('K1',1);
INSERT INTO test VALUES('K2',2);
...
INSERT INTO test VALUES('K20000',2);
{code}
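Typing 20,000 inserts by hand is impractical, so a small generator can emit the script for any Hive client to run. This is a hypothetical helper. It sets the `value` column to the row number, which is an assumption because the values in the example above vary. Every statement after the CREATE is a separate single-row INSERT, so each one goes through the stats update path described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical repro generator: one CREATE TABLE followed by n single-row INSERTs.
final class ReproScript {
    private ReproScript() {}

    static List<String> statements(int n) {
        List<String> sql = new ArrayList<>();
        sql.add("CREATE TABLE IF NOT EXISTS test (name String, value int);");
        for (int i = 1; i <= n; i++) {
            // Assumption: value = row number; the original repro's values differ.
            sql.add("INSERT INTO test VALUES('K" + i + "'," + i + ");");
        }
        return sql;
    }

    public static void main(String[] args) {
        // Optional first argument overrides the default of 20,000 inserts.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 20_000;
        statements(n).forEach(System.out::println);
    }
}
```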