IMPALA / IMPALA-5442

Allow HDFS replication factor to be set


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:
      None

      Description

      Currently, for small tables that consist of a single HDFS file, the file's blocks are typically stored on only the default 3 DataNodes. If the file content were static, we could run hdfs dfs -setrep to raise the replication factor and then issue a REFRESH on the table or partition so that the catalog picks up the newly replicated block locations. However, because hdfs dfs -setrep is not synchronous, we don't really know when to issue the REFRESH command.
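As a stopgap for the static-file case, the manual workflow can be scripted. The sketch below assumes that the Hadoop release in use supports the `-w` flag of `hdfs dfs -setrep`, which blocks until the requested replication is reached (this can take a long time), and that `impala-shell` is on the PATH. The helper names, the `impalad` default, and the example paths are hypothetical. By default it only prints the command lines so they can be reviewed before running.

```python
import shlex
import subprocess
from typing import List, Optional


def setrep_then_refresh_cmds(
    hdfs_path: str,
    replication: int,
    table: str,
    partition_spec: Optional[str] = None,
    impalad: str = "localhost:21000",  # hypothetical placeholder host:port
) -> List[List[str]]:
    """Build two commands: a blocking setrep, then an Impala REFRESH."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    # -w makes setrep wait until the target replication is reached,
    # so the REFRESH that follows sees the new block locations.
    setrep = ["hdfs", "dfs", "-setrep", "-w", str(replication), hdfs_path]
    refresh_sql = f"REFRESH {table}"
    if partition_spec:
        # e.g. partition_spec="year=2017" refreshes a single partition
        refresh_sql += f" PARTITION ({partition_spec})"
    refresh = ["impala-shell", "-i", impalad, "-q", refresh_sql]
    return [setrep, refresh]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order, stopping on error."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, `run(setrep_then_refresh_cmds("/user/hive/warehouse/db.db/dim", 10, "db.dim"))` prints the two commands. Running them is still a manual step outside the INSERT, which is the gap this issue asks to close.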

      HDFS caching helps somewhat, but it has the same problem: we need to run setrep, issue the caching directive, and then refresh the table to load both the cached and the on-disk block locations into the catalog.
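The caching variant adds one step to the sequence. This is a minimal sketch, assuming an existing cache pool and using the same number for on-disk and cached replicas (a simplifying choice; the cached replica count cannot usefully exceed the on-disk one). The function name and the `impalad` default are hypothetical. Note that the cache directive is itself applied asynchronously, so the REFRESH can still run before all replicas are cached.

```python
from typing import List


def cache_then_refresh_cmds(
    hdfs_path: str,
    replication: int,
    pool: str,
    table: str,
    impalad: str = "localhost:21000",  # hypothetical placeholder host:port
) -> List[List[str]]:
    """Build: blocking setrep, cache directive, Impala REFRESH (in that order)."""
    return [
        # Raise on-disk replication and wait for it to complete.
        ["hdfs", "dfs", "-setrep", "-w", str(replication), hdfs_path],
        # Ask the NameNode to cache `replication` replicas of each block.
        # Caching happens in the background after this command returns.
        ["hdfs", "cacheadmin", "-addDirective", "-path", hdfs_path,
         "-pool", pool, "-replication", str(replication)],
        # Reload block (and cached-replica) locations into the catalog.
        ["impala-shell", "-i", impalad, "-q", f"REFRESH {table}"],
    ]
```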

      A useful feature for small tables would be to allow the HDFS replication factor to be specified as part of the Impala INSERT INTO statement. (It cannot be a CREATE TABLE option, because Hive would not support it.)

              People

              • Assignee:
                Unassigned
              • Reporter:
                myloginid@gmail.com Manish Maheshwari
              • Votes:
                0
              • Watchers:
                1
