HIVE-15881

Use hive.exec.input.listing.max.threads variable name instead of mapred.dfsclient.parallelism.max

Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: Query Planning
    • Labels: None

    Description

      The Utilities class has two methods, getInputSummary and getInputPaths, that use the variable mapred.dfsclient.parallelism.max to get the summary of a list of input locations in parallel. These methods are Hive-related, but the variable name does not look like it is specific to Hive.
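
      To illustrate what such a thread-count setting controls, here is a minimal sketch of summing per-location sizes with a bounded thread pool. It is not the actual Utilities code: the class name ParallelInputSummary, the method totalSize, and the sizeOf callback (a stand-in for a real file system call) are all hypothetical, and treating a value of 1 or less as "run serially" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical illustration; not part of Hive.
class ParallelInputSummary {

    /**
     * Sums sizeOf(location) over all locations using at most maxThreads workers.
     * Assumption of this sketch: maxThreads <= 1 means "do the work serially".
     */
    static long totalSize(List<String> locations, int maxThreads,
                          ToLongFunction<String> sizeOf) throws Exception {
        if (maxThreads <= 1) {
            long total = 0;
            for (String location : locations) {
                total += sizeOf.applyAsLong(location);
            }
            return total;
        }

        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // Submit one task per input location.
            List<Future<Long>> pending = new ArrayList<>();
            for (String location : locations) {
                Callable<Long> task = () -> sizeOf.applyAsLong(location);
                pending.add(pool.submit(task));
            }
            // Wait for every task and accumulate the results.
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // String::length stands in for a real "size of this location" lookup.
        List<String> locations = Arrays.asList("/warehouse/t/p=1", "/warehouse/t/p=2");
        System.out.println(totalSize(locations, 4, String::length));
    }
}
```

      The pool size is the only knob here, which is why the name of the property that sets it matters to users tuning Hive.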

      Also, the above variable is neither defined in HiveConf nor used anywhere else in Hive. The only other reference I found is in the Hadoop MR1 code.

      I'd like to propose deprecating mapred.dfsclient.parallelism.max in favor of a different variable name, such as hive.get.input.listing.num.threads, that reflects the intention of the variable. The old variable could then be removed in Hive 3.x.
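
      One way such a deprecation could work is to read the new name first and fall back to the old one with a warning. The sketch below uses a plain Map instead of HiveConf, and the class InputListingThreads and method resolve are hypothetical. The new key is the one from the issue title; returning 0 when neither key is set is an assumption of this sketch, not a documented default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the code from the attached patches.
class InputListingThreads {
    // New name, taken from the issue title.
    static final String NEW_KEY = "hive.exec.input.listing.max.threads";
    // Old name that the issue proposes to deprecate.
    static final String OLD_KEY = "mapred.dfsclient.parallelism.max";

    /**
     * Returns the configured thread count. The new key wins; the deprecated
     * key is consulted only when the new key is absent. Returns 0 when
     * neither key is set (an assumption of this sketch).
     */
    static int resolve(Map<String, String> conf) {
        String value = conf.get(NEW_KEY);
        if (value == null) {
            value = conf.get(OLD_KEY);
            if (value != null) {
                System.err.println("WARN: " + OLD_KEY + " is deprecated; use " + NEW_KEY);
            }
        }
        if (value == null) {
            return 0;
        }
        // Clamp negative values to 0 rather than failing.
        return Math.max(0, Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OLD_KEY, "8");
        System.out.println(resolve(conf)); // old key only: fallback is used
        conf.put(NEW_KEY, "4");
        System.out.println(resolve(conf)); // both keys set: new key wins
    }
}
```

      Keeping the fallback lets existing configurations continue to work until the old name is removed.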

      Attachments

        1. HIVE-15881.1.patch
          8 kB
          Sergio Peña
        2. HIVE-15881.2.patch
          15 kB
          Sergio Peña
        3. HIVE-15881.3.patch
          15 kB
          Sergio Peña
        4. HIVE-15881.4.patch
          20 kB
          Sergio Peña
        5. HIVE-15881.5.patch
          20 kB
          Sergio Peña
        6. HIVE-15881.6.patch
          19 kB
          Sergio Peña

            People

              Assignee: Sergio Peña (spena)
              Reporter: Sergio Peña (spena)
              Votes: 0
              Watchers: 7
