Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Environment: hadoop-2.2.0
Description
HdfsFileWriter doesn't allow us to create files in HDFS with a different replication factor than the configured DFS default because it uses:
FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);
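For context, the write path then hands that server-supplied replication straight to FileSystem.create(), so there is no point where a caller can lower it. A minimal sketch of the pattern (simplified; the class and method names here are mine, not the actual HdfsFileWriter code):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsServerDefaults;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ServerDefaultsSketch {
  static FSDataOutputStream openForWrite(FileSystem fileSystem, Path path)
      throws IOException {
    FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);
    return fileSystem.create(path,
        FsPermission.getFileDefault(),  // default file permissions
        false,                          // don't overwrite existing files
        fsDefaults.getFileBufferSize(),
        fsDefaults.getReplication(),    // <-- always the cluster-wide default
        fsDefaults.getBlockSize(),
        null);                          // no progress callback
  }
}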
Since we have two forms of replication going on when using HDFSDirectoryFactory (Solr's own replicas on top of HDFS block replication), it would be nice to be able to set the HDFS replication factor for the Solr directories to a lower value than the default. I realize this might reduce the chance of data locality, but since Solr cores each have their own path in HDFS, we should give operators the option to reduce it.
My original thinking was to just use Hadoop setrep to customize the replication factor, but that's a one-time shot and doesn't affect newly created files. For instance, I did:
hadoop fs -setrep -R 1 solr49/coll1
My default dfs.replication is set to 3; I'm setting it to 1 above just as an example.
Then I added some more docs to coll1 and did:
hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3
3 <-- should be 1
So it looks like new files don't inherit the replication factor from their parent directory.
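That behavior makes sense once you see that HDFS stores the replication factor per file, fixed at create time; setrep (FileSystem.setReplication() under the hood) only rewrites it on files that already exist. A quick illustration of the two calls (paths made up, and assuming dfs.replication is 3 as above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path existing = new Path("solr49/coll1/old-segment");
    Path created = new Path("solr49/coll1/new-segment");

    // setrep -R 1 boils down to this, applied file by file:
    fs.setReplication(existing, (short) 1);

    // ...but a brand-new file ignores its parent directory entirely and
    // falls back to dfs.replication unless told otherwise at create time:
    fs.create(created).close();
    System.out.println(fs.getFileStatus(created).getReplication()); // 3
  }
}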
Not sure if we need to go as far as allowing a different replication factor per collection, but that should be considered if possible.
I looked at the Hadoop 2.2.0 code to see if there was a way to work around this using the Configuration object, but nothing jumped out at me ... and the implementation of getServerDefaults(path) is just:
public FsServerDefaults getServerDefaults(Path p) throws IOException {
  return getServerDefaults();
}
The Path is ignored.
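Given that the path is thrown away, one way forward would be for the directory factory to read an explicit replication factor from its own configuration and pass that to create() in place of the server default. A rough sketch of that idea; solr.hdfs.replication.factor is a hypothetical property name, not an existing Solr or Hadoop setting:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsServerDefaults;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ReplicationOverride {
  // Hypothetical knob for illustration only.
  private static final String REPLICATION_KEY = "solr.hdfs.replication.factor";

  static FSDataOutputStream create(FileSystem fs, Configuration conf, Path path)
      throws IOException {
    FsServerDefaults defaults = fs.getServerDefaults(path);
    // Fall back to the server default when the knob isn't set.
    short replication =
        (short) conf.getInt(REPLICATION_KEY, defaults.getReplication());
    return fs.create(path, FsPermission.getFileDefault(), false,
        defaults.getFileBufferSize(), replication, defaults.getBlockSize(),
        null);
  }
}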