Description
If a user runs MSCK REPAIR TABLE on a directory with a large number of untracked partitions, HMS fails with an OutOfMemoryError (OOME). I suspect this is because it attempts to add all of the partitions in one large bulk load in an effort to save time. This can produce a collection so large that HMS eventually runs out of memory.
Instead, I suggest that Hive include a configurable batch size that HMS can use to break up the load.
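To illustrate the suggestion, here is a minimal sketch of batched loading. It is not Hive's code: `batched`, `add_partitions_in_batches`, and the `add_partitions` callback are hypothetical names standing in for whatever call sends partitions to HMS, and treating a non-positive batch size as "no batching" is an assumption made for the example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists holding at most batch_size items each."""
    if batch_size <= 0:
        # Assumption: a non-positive size disables batching (one bulk load).
        everything = list(items)
        if everything:
            yield everything
        return
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            # Also reached when the item count is an exact multiple of
            # batch_size, so no trailing empty batch is emitted.
            return
        yield batch


def add_partitions_in_batches(
    partitions: Iterable[str],
    batch_size: int,
    add_partitions: Callable[[List[str]], None],
) -> int:
    """Send partitions through add_partitions one batch at a time.

    add_partitions is a hypothetical stand-in for the metastore call.
    Returns the number of calls made.
    """
    calls = 0
    for batch in batched(partitions, batch_size):
        add_partitions(batch)  # only one batch is held in memory per call
        calls += 1
    return calls
```

With this shape, the memory held per metastore call is bounded by the batch size rather than by the total number of untracked partitions, at the cost of more round trips.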
Attachments
Issue Links
- Is contained by: HIVE-12859 MSCK Repair table gives error for higher number of partitions (Open)
- is related to: HIVE-14571 Document configuration hive.msck.repair.batch.size (Resolved)
- relates to: HIVE-14693 Some partitions will be left out when partition number is a multiple of the option hive.msck.repair.batch.size (Closed)