IMPALA-5442

Allow HDFS replication factor to be set


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: None
    • Component/s: Frontend

    Description

      Currently, for small tables backed by a single HDFS file, the file's blocks are typically available on only 3 DataNodes (the default replication factor). If the file content were static, we could run hdfs dfs -setrep to change the replication factor and then issue a REFRESH on the table or partition so that Impala caches the new block locations. However, because hdfs dfs -setrep is not synchronous, we don't really know when to issue the REFRESH command.
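
      A minimal Python sketch of the setrep-then-refresh workaround, assuming a hypothetical helper that reports which DataNodes hold each block of the file (for example, one built on Hadoop's FileSystem.getFileBlockLocations). It changes the replication factor, polls until every block has at least the target number of replicas or a timeout expires, and only then issues the refresh. set_replication, get_block_hosts and refresh are hypothetical callables, not Impala or HDFS APIs. For reference, hdfs dfs -setrep also accepts -w to block until replication completes, which can take a long time.

```python
import time
from typing import Callable, Sequence

# Hypothetical callback: returns, for each block of the file, the DataNode
# hosts currently holding a replica. Not a real Impala or HDFS API.
BlockHostsFn = Callable[[], Sequence[Sequence[str]]]


def wait_for_replication(get_block_hosts: BlockHostsFn,
                         target: int,
                         timeout_s: float = 300.0,
                         poll_s: float = 5.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until every block has at least `target` replicas.

    Returns True once the target is reached, False if the timeout expires.
    """
    deadline = clock() + timeout_s
    while True:
        blocks = get_block_hosts()
        # A file with no blocks is vacuously fully replicated.
        if all(len(hosts) >= target for hosts in blocks):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def setrep_then_refresh(set_replication: Callable[[int], None],
                        get_block_hosts: BlockHostsFn,
                        refresh: Callable[[], None],
                        target: int,
                        **wait_kwargs) -> bool:
    """Change replication, wait for it to take effect, then refresh."""
    set_replication(target)  # e.g. run hdfs dfs -setrep <target> <path>
    if not wait_for_replication(get_block_hosts, target, **wait_kwargs):
        return False  # replication not confirmed; skip the refresh
    refresh()  # e.g. issue REFRESH <table> through impala-shell
    return True
```

Injecting sleep and clock keeps the polling loop testable without a cluster, and the timeout prevents the refresh from running before the new replicas exist without blocking indefinitely.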

      HDFS caching helps somewhat, but has the same timing problem: we need to run setrep, add the cache directive, and then refresh the table to load both the cached and the on-disk block locations into the catalog.

      A useful feature for small tables would be to allow the HDFS replication factor to be specified as part of the Impala INSERT INTO statement. (This cannot be a CREATE TABLE option, because Hive would not support it.)


    People

        Assignee: Unassigned
        Reporter: Manish Maheshwari (myloginid@gmail.com)
        Votes: 0
        Watchers: 2
