Details
-
Task
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
None
-
None
Description
The Utilities class has two methods, getInputSummary and getInputPaths, that use the variable mapred.dfsclient.parallelism.max to get the summary of a list of input locations in parallel. These methods are Hive related, but the variable name does not look it is specific for Hive.
Also, the above variable is not on HiveConf nor used anywhere else. I just found a reference on the Hadoop MR1 code.
I'd like to propose the deprecation of mapred.dfsclient.parallelism.max, and use a different variable name, such as hive.get.input.listing.num.threads, that reflects the intention of the variable. The removal of the old variable might happen on Hive 3.x
Attachments
Attachments
Issue Links
- links to