Hive / HIVE-13217

Replication for HoS mapjoin small file needs to respect dfs.replication.max

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.1, 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: Spark
    • Labels:
      None

      Description

      Currently, Hive on Spark map join replicates the small-table file with a hard-coded replication factor of 10; see SparkHashTableSinkOperator.MIN_REPLICATION.

      When dfs.replication.max is less than 10, the HoS query fails. This constant should be capped at dfs.replication.max.

      Normally dfs.replication.max appears to be set to 512.
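      The fix described above amounts to taking the minimum of the hard-coded constant and the cluster's dfs.replication.max before calling setReplication on the small-table file. A minimal sketch of that capping logic (the class and method names below are illustrative, not the actual patch; only the MIN_REPLICATION constant comes from Hive's SparkHashTableSinkOperator):

      ```java
      public class ReplicationCap {
          // Hive's hard-coded minimum replication for the HoS mapjoin small file
          // (SparkHashTableSinkOperator.MIN_REPLICATION).
          static final int MIN_REPLICATION = 10;

          // Cap the requested replication at dfs.replication.max so HDFS does not
          // reject the setReplication() call on clusters where the maximum is
          // configured below 10. Hypothetical helper, not Hive's actual method.
          static short resolveReplication(int dfsReplicationMax) {
              return (short) Math.min(MIN_REPLICATION, dfsReplicationMax);
          }

          public static void main(String[] args) {
              // Cluster with dfs.replication.max = 3: cap at 3 instead of failing.
              System.out.println(resolveReplication(3));
              // Typical default dfs.replication.max = 512: keep the minimum of 10.
              System.out.println(resolveReplication(512));
          }
      }
      ```

      With this cap in place, the query succeeds on restrictive clusters while clusters with the usual dfs.replication.max of 512 still get the intended replication of 10.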

        Attachments

        1. HIVE-13217.1.patch
          2 kB
          Chinna Rao Lalam
        2. HIVE-13217.2.patch
          2 kB
          Chinna Rao Lalam

          Activity

            People

            • Assignee:
              Chinna Rao Lalam
            • Reporter:
              Szehon Ho
            • Votes:
              0
            • Watchers:
              3
