Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
DistributedCache does not recursively check whether the contents of a directory have changed when files are added to it with DistributedCache.addCacheFile().
Background
I have an Oozie workflow on HDFS:
example_workflow
├── job.properties
├── lib
│   ├── components
│   │   ├── sub-component.sh
│   │   └── subsub
│   │       └── subsub.sh
│   ├── main.sh
│   └── sub.sh
└── workflow.xml
I executed the workflow, then made some changes to subsub.sh and replaced the file on HDFS. When I re-ran the workflow, DistributedCache did not notice the changes because the timestamp on the components directory had not changed. As a result, the old script was materialized.
This behaviour might be related to determineTimestamps().
In order to use the new script during workflow execution, I had to update the whole components directory.
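The effect can be illustrated in miniature on a local filesystem. The sketch below is not Hadoop or Oozie code, and it assumes that local directory timestamps behave similarly to HDFS ones for this purpose. It pins the modification times of a small components tree, modifies only the nested subsub.sh, and then compares the timestamp of the top-level directory with the newest timestamp found by walking the whole tree. The helper latest_mtime_ns() is a hypothetical name introduced for illustration only; it shows what a recursive check could look like, not how DistributedCache is implemented.

```python
import os
import tempfile


def latest_mtime_ns(root):
    """Return the newest modification time (in ns) of root or anything below it.

    Hypothetical helper: a recursive check of this kind would notice a
    change to a deeply nested file, unlike a look at root alone.
    """
    newest = os.stat(root).st_mtime_ns
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entry = os.path.join(dirpath, name)
            newest = max(newest, os.stat(entry).st_mtime_ns)
    return newest


def demo():
    """Build components/subsub/subsub.sh, edit the script, compare timestamps."""
    old = 1_000_000_000 * 10**9  # arbitrary "first run" time, in ns
    new = 1_000_000_100 * 10**9  # arbitrary "after the edit" time, in ns

    with tempfile.TemporaryDirectory() as tmp:
        components = os.path.join(tmp, "components")
        subsub = os.path.join(components, "subsub")
        script = os.path.join(subsub, "subsub.sh")

        os.makedirs(subsub)
        with open(script, "w") as f:
            f.write("echo old\n")

        # Pin every entry to the same "old" timestamp.
        for path in (script, subsub, components):
            os.utime(path, ns=(old, old))

        # Change only the nested script and give it a newer timestamp.
        with open(script, "w") as f:
            f.write("echo new\n")
        os.utime(script, ns=(new, new))

        shallow = os.stat(components).st_mtime_ns  # top-level directory only
        deep = latest_mtime_ns(components)         # whole tree
        return shallow, deep


if __name__ == "__main__":
    shallow, deep = demo()
    print("components directory timestamp:", shallow)
    print("newest timestamp in the tree:  ", deep)
```

A check that looks only at the components directory sees the old timestamp, while the recursive walk picks up the newer timestamp of subsub.sh.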
Some more info:
In Oozie, DistributedCache.addCacheFile() is used to add files to the distributed cache.