Hive / HIVE-15881

Use hive.exec.input.listing.max.threads variable name instead of mapred.dfsclient.parallelism.max

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: Query Planning
    • Labels:
      None

      Description

      The Utilities class has two methods, getInputSummary and getInputPaths, that use the variable mapred.dfsclient.parallelism.max to get the summary of a list of input locations in parallel. These methods are Hive-related, but the variable name does not look like it is specific to Hive.
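To illustrate what a thread-count setting of this kind controls, here is a minimal sketch of summarizing several input locations in parallel. It is not the code in Utilities: the class ParallelInputSummary and its methods are hypothetical, and it walks local directories with java.nio as a stand-in for the Hadoop FileSystem calls Hive would make. It also assumes that a value of 1 or less means "list serially".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hive.
final class ParallelInputSummary {

    // Total size in bytes of all regular files under one location.
    static long sizeOf(Path location) {
        try (Stream<Path> files = Files.walk(location)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sums the sizes of all locations, using up to maxThreads worker threads.
    // Assumption: maxThreads <= 1 means no thread pool is created.
    static long totalSize(List<Path> locations, int maxThreads)
            throws InterruptedException, ExecutionException {
        if (maxThreads <= 1) {
            long total = 0;
            for (Path location : locations) {
                total += sizeOf(location);
            }
            return total;
        }
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // Submit one task per input location, then collect the results.
            List<Future<Long>> futures = new ArrayList<>();
            for (Path location : locations) {
                Callable<Long> task = () -> sizeOf(location);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> locations = new ArrayList<>();
        for (String arg : args) {
            locations.add(Paths.get(arg));
        }
        System.out.println(totalSize(locations, 4));
    }
}
```

One task per location keeps the sketch simple. With many small locations the pool size, not the number of locations, bounds the concurrent listing calls, which is why the setting reads as a maximum.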

      Also, the variable is neither defined in HiveConf nor used anywhere else in Hive. The only other reference I found is in the Hadoop MR1 code.

      I'd like to propose deprecating mapred.dfsclient.parallelism.max and using a different variable name, such as hive.get.input.listing.num.threads, that reflects the intent of the variable. The old variable could be removed in Hive 3.x.
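One way to deprecate the old name without breaking existing configurations is to read the new key first and fall back to the old one with a warning. The sketch below shows that pattern using the key named in the issue title, hive.exec.input.listing.max.threads. The class ListingThreadsConfig, the use of java.util.Properties in place of HiveConf, and the precedence order are all assumptions made for illustration, not the contents of the attached patches.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hive; Properties stands in for HiveConf.
final class ListingThreadsConfig {
    static final String NEW_KEY = "hive.exec.input.listing.max.threads";
    static final String DEPRECATED_KEY = "mapred.dfsclient.parallelism.max";

    // Returns the configured thread count.
    // Assumed precedence: new key, then deprecated key, then the supplied default.
    static int resolveListingThreads(Properties conf, int defaultThreads) {
        String value = conf.getProperty(NEW_KEY);
        if (value == null) {
            value = conf.getProperty(DEPRECATED_KEY);
            if (value != null) {
                System.err.println("WARN: " + DEPRECATED_KEY
                        + " is deprecated; use " + NEW_KEY + " instead");
            }
        }
        if (value == null) {
            return defaultThreads;
        }
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DEPRECATED_KEY, "8");
        // Only the deprecated key is set, so it is used and a warning is logged.
        System.out.println(resolveListingThreads(conf, 1));
    }
}
```

Giving the new key precedence means a user who sets both names gets the Hive-specific value, and the fallback branch can be deleted once the old variable is removed.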

        Attachments

        1. HIVE-15881.6.patch
          19 kB
          Sergio Peña
        2. HIVE-15881.5.patch
          20 kB
          Sergio Peña
        3. HIVE-15881.4.patch
          20 kB
          Sergio Peña
        4. HIVE-15881.3.patch
          15 kB
          Sergio Peña
        5. HIVE-15881.2.patch
          15 kB
          Sergio Peña
        6. HIVE-15881.1.patch
          8 kB
          Sergio Peña

              People

              • Assignee:
                spena Sergio Peña
                Reporter:
                spena Sergio Peña
              • Votes:
                0
                Watchers:
                7

                Dates

                • Created:
                  Updated:
                  Resolved: