Hive / HIVE-13217

Replication for HoS mapjoin small file needs to respect dfs.replication.max


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.1, 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: Spark
    • Labels: None

    Description

      Currently, Hive on Spark mapjoin replicates the small table file with a hard-coded replication factor of 10 (see SparkHashTableSinkOperator.MIN_REPLICATION).

      When dfs.replication.max is less than 10, the HoS query fails. This constant should be capped at dfs.replication.max.

      dfs.replication.max is normally set to 512.
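      The fix described above can be sketched as follows. This is a minimal illustration, not Hive's actual patch: the class and method names are hypothetical, and the cap value would in practice be read from the HDFS configuration key dfs.replication.max.

```java
// Hypothetical sketch of the capping logic described in this issue:
// never request more replicas than the cluster's dfs.replication.max allows.
public class ReplicationCap {

    // Mirrors the hard-coded SparkHashTableSinkOperator.MIN_REPLICATION value.
    static final int MIN_REPLICATION = 10;

    // dfsMaxReplication would come from conf.getInt("dfs.replication.max", 512)
    // in real Hive code; it is passed in directly here to keep the sketch self-contained.
    static short effectiveReplication(int dfsMaxReplication) {
        return (short) Math.min(MIN_REPLICATION, dfsMaxReplication);
    }
}
```

      With the default dfs.replication.max of 512 the behavior is unchanged (replication stays 10), while on a cluster where dfs.replication.max is, say, 3, the requested replication drops to 3 instead of failing the query.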

      Attachments

        1. HIVE-13217.1.patch
          2 kB
          Chinna Rao Lalam
        2. HIVE-13217.2.patch
          2 kB
          Chinna Rao Lalam

        Activity

          People

            Assignee: chinnalalam Chinna Rao Lalam
            Reporter: szehon Szehon Ho
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: