Description
Currently, for a small table backed by a single HDFS file, the file's blocks are available on at most 3 DataNodes with the default replication factor. If the file content is static, we could run hdfs dfs -setrep and then issue a refresh table/partition command to cache the newly replicated block locations. However, since hdfs dfs -setrep is not synchronous, we do not really know when the new replicas are in place and the refresh can safely be issued.
HDFS caching helps somewhat, but has the same issue: we need to run setrep, issue the caching directive, and then refresh the table to get both the cached and on-disk block locations into the catalog.
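The manual setrep-then-refresh workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed fix: it assumes the `-w` flag of `hdfs dfs -setrep` (which asks the command to wait until the target replication is reached, and may block for a long time) is acceptable, and that `impala-shell -q` is available on the host. The function names, the path, and the table name are hypothetical.

```python
import subprocess


def build_commands(path, table, replication):
    """Return the two commands of the manual workflow, in order."""
    return [
        # -w asks setrep to block until the target replication is reached;
        # this can take a long time on a busy cluster.
        ["hdfs", "dfs", "-setrep", "-w", str(replication), path],
        # Reload block locations into the Impala catalog afterwards.
        ["impala-shell", "-q", "REFRESH {}".format(table)],
    ]


def raise_replication_and_refresh(path, table, replication,
                                  run=subprocess.check_call):
    """Run setrep, then REFRESH. `run` is injectable so the flow can be dry-run."""
    for cmd in build_commands(path, table, replication):
        run(cmd)  # check_call raises CalledProcessError on a non-zero exit


# Hypothetical usage (path and table name are placeholders):
# raise_replication_and_refresh("/warehouse/small_dim", "small_dim", 10)
```

Even in this form the workflow needs two separate tools and a blocking wait outside Impala, which is the gap the request below is meant to close.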
A good feature for small tables would be to allow the HDFS replication factor to be specified as part of the Impala INSERT INTO clause. (This cannot be a CREATE TABLE option, as Hive would not support it.)
Issue Links
- is related to HDFS-199 add replication factor for hdfs directory (Open)