Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.0.0
Fix Version/s: None
Component/s: None
Labels: None
Description
Currently, ListHDFS tracks two properties in state management: "listing.timestamp" and "emitted.timestamp". As of the 1.0.0 release, the directory property supports Expression Language, which means the directory being listed can change dynamically on any execution of the processor, while the two stored timestamps are not tied to any particular directory.
The processor should be changed to store state specific to the directory that was listed, for example "listing.timestamp.dir1" and "emitted.timestamp.dir1".
This would also help in a clustered scenario. Currently, ListHDFS has to be run on the primary node only; otherwise each node will overwrite the other nodes' state and produce unexpected results. With the above improvement, if the directory evaluated to a unique path on each node, the state for each of those paths would be stored separately.
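
A minimal sketch of the proposed per-directory keys, for illustration only. It is not the ListHDFS implementation: the class and method names (DirectoryScopedState, listingKey, emittedKey, readTimestamp, withTimestamps) are hypothetical, and a plain Map<String, String> stands in for the processor's stored state. The key format simply appends the evaluated directory to the existing key names, as in the "listing.timestamp.dir1" example above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: scopes the two timestamps to the directory that was listed.
final class DirectoryScopedState {
    static final String LISTING_PREFIX = "listing.timestamp.";
    static final String EMITTED_PREFIX = "emitted.timestamp.";

    private DirectoryScopedState() {
    }

    // Builds e.g. "listing.timestamp.dir1" from the evaluated directory value.
    static String listingKey(String directory) {
        return LISTING_PREFIX + directory;
    }

    // Builds e.g. "emitted.timestamp.dir1" from the evaluated directory value.
    static String emittedKey(String directory) {
        return EMITTED_PREFIX + directory;
    }

    // Returns the stored timestamp for a key, or -1 if nothing is stored yet.
    static long readTimestamp(Map<String, String> state, String key) {
        String value = state.get(key);
        return value == null ? -1L : Long.parseLong(value);
    }

    // Returns a copy of the state with the timestamps for one directory updated.
    // Entries belonging to other directories are carried over unchanged.
    static Map<String, String> withTimestamps(Map<String, String> state,
                                              String directory,
                                              long listingTimestamp,
                                              long emittedTimestamp) {
        Map<String, String> updated = new HashMap<>(state);
        updated.put(listingKey(directory), Long.toString(listingTimestamp));
        updated.put(emittedKey(directory), Long.toString(emittedTimestamp));
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state = withTimestamps(state, "dir1", 100L, 90L);
        state = withTimestamps(state, "dir2", 200L, 190L);
        // Listing dir2 did not disturb the timestamps recorded for dir1.
        System.out.println(readTimestamp(state, listingKey("dir1")));
        System.out.println(readTimestamp(state, listingKey("dir2")));
    }
}
```

With keys scoped this way, an execution that evaluates the directory to a different path reads and writes only its own pair of entries. One open design question this sketch does not address is cleanup: entries for directories that are no longer listed would accumulate unless something removes them.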