Details
Bug
Status: Closed
Major
Resolution: Fixed
2.0.0
Mutation coordinator meta store dependency now optional.
Patch
Description
ekoifman raised a theoretical issue with the streaming mutation API (HIVE-10165) where worker nodes operating in a distributed cluster might overwhelm a meta store while trying to obtain partition locks. Although this does not happen in practice (see HIVE-11228), the API does communicate with the meta store in this manner to obtain partition paths and create new partitions. Therefore the issue described does in fact exist in the current implementation, albeit in a different code path. I’d like to make such communication optional like so:
- When the user chooses not to create partitions on demand, no meta store connection will be created in the MutationCoordinators. Additionally, partition paths will be resolved using org.apache.hadoop.hive.metastore.Warehouse.getPartitionPath(Path, LinkedHashMap<String, String>), which should be suitable so long as standard Hive partition layouts are followed.
- If the user does choose to create partitions on demand, then the system will operate as it does currently, using the meta store both to issue add_partition events and to look up partition meta data.
- The documentation will be updated to describe these behaviours and outline alternative approaches to collecting affected partition names and creating partitions in a less intensive manner.
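To illustrate what resolving a path without the meta store amounts to under a standard layout, here is a rough sketch. It is not the Warehouse implementation: the class PartitionPathSketch, the method partitionPath and the example table location are hypothetical, and the sketch skips the escaping of special characters that a real implementation would need. It only shows that the path can be derived from the table location and the ordered partition spec alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathSketch {

    // Hypothetical helper (not Hive's code): appends one key=value segment
    // per partition column, in the insertion order of the LinkedHashMap.
    // Assumes tableLocation has no trailing slash and that keys and values
    // contain no characters that would need escaping in a path.
    static String partitionPath(String tableLocation,
                                LinkedHashMap<String, String> partitionSpec) {
        StringBuilder path = new StringBuilder(tableLocation);
        for (Map.Entry<String, String> column : partitionSpec.entrySet()) {
            path.append('/')
                .append(column.getKey())
                .append('=')
                .append(column.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Column order matters, hence the LinkedHashMap.
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("continent", "Asia");
        spec.put("country", "India");

        // Example location only; a real one comes from the table definition.
        System.out.println(partitionPath("/warehouse/example.db/events", spec));
        // prints /warehouse/example.db/events/continent=Asia/country=India
    }
}
```

A table whose partitions live at custom locations would not be resolved correctly this way, which is why the proposal is limited to standard layouts.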
Side note for follow up: The parameter names tblName and dbName seem to be the wrong way around on the method org.apache.hadoop.hive.metastore.IMetaStoreClient.getPartition(String, String, List<String>).