Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Crunch currently does not allow temporary files and non-temporary files (e.g. the final output data of leaf nodes) to have different replication factors at the same time. A user with a large amount of data to process (say, hundreds of gigabytes) might want a lower replication factor for the large temporary files written between Crunch jobs.
We could make this configurable via a new setting (e.g. crunch.tmp.dir.replication).
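As a rough illustration of the proposed behavior, the sketch below picks a replication factor per path. It is not Crunch's implementation: the class `ReplicationChooser`, the method `replicationFor`, and the plain `Map` standing in for the job configuration are all hypothetical, and the key `crunch.tmp.dir.replication` is only the name suggested above. `dfs.replication` is the standard HDFS default-replication key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Crunch.
public class ReplicationChooser {
    // Key name proposed in this issue; the final name may differ.
    static final String TMP_REPLICATION_KEY = "crunch.tmp.dir.replication";
    // Standard HDFS key for the default replication factor.
    static final String DEFAULT_REPLICATION_KEY = "dfs.replication";

    // Returns the temporary-file replication factor for paths under tmpDir
    // when the setting is present, and the default replication otherwise.
    static int replicationFor(Map<String, String> conf, String path, String tmpDir) {
        int defaultReplication =
            Integer.parseInt(conf.getOrDefault(DEFAULT_REPLICATION_KEY, "3"));
        String tmpPrefix = tmpDir.endsWith("/") ? tmpDir : tmpDir + "/";
        boolean isTemporary = path.equals(tmpDir) || path.startsWith(tmpPrefix);
        if (isTemporary && conf.containsKey(TMP_REPLICATION_KEY)) {
            return Integer.parseInt(conf.get(TMP_REPLICATION_KEY));
        }
        return defaultReplication;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DEFAULT_REPLICATION_KEY, "3");
        conf.put(TMP_REPLICATION_KEY, "1");
        // Intermediate data between jobs uses the lower factor.
        System.out.println(replicationFor(conf, "/tmp/crunch-123/p1", "/tmp/crunch-123"));
        // Final output keeps the default factor.
        System.out.println(replicationFor(conf, "/data/output/part-0", "/tmp/crunch-123"));
    }
}
```

Leaving the new setting unset keeps today's behavior in this sketch, since every path then falls back to the default replication factor.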