ListHDFS lists files and saves the modification time of the most recently listed files, as `latestTimestampListed` and `latestTimestampEmitted`, in the `StateMap`. It overrides `onPropertyModified` to check whether `Directory` or `File Filter` has been modified. If either has changed, it resets these state variables to `-1L` so that all files are listed from the updated `Directory` or according to the updated `File Filter`. However, this does not work as intended. Steps to reproduce:
- Create two directories in HDFS
- > hdfs dfs -mkdir /test1
- > hdfs dfs -mkdir /test2
- Write files to the above directories in the following order:
- > hdfs dfs -put sample.txt /test1/t1_1.txt
- > hdfs dfs -put sample.txt /test2/t2_1.txt
- > hdfs dfs -put sample.txt /test1/t1_2.txt
- Configure ListHDFS, set `Directory` to /test1, and start the processor. It produces two flowfiles: t1_1.txt and t1_2.txt
- Stop the processor and set `Directory` to /test2. Ideally the state variables (listed and emitted timestamps) should be reset and the processor should list the file t2_1.txt, but it does not.
- Now put one more file into /test2:
- > hdfs dfs -put sample.txt /test2/t2_2.txt
- The processor now lists the file t2_2.txt, but t2_1.txt is missed.
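The outcome of these steps is consistent with a listing that only keeps files newer than a stored timestamp. The following is a minimal sketch of that idea, not the actual ListHDFS code: the class `StaleTimestampDemo`, the method `listNewerThan`, the strict `>` comparison, and the small integer timestamps are all assumptions chosen to mirror the order of the `-put` commands above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of timestamp-based listing; not taken from ListHDFS.
class StaleTimestampDemo {

    // Keep only files whose modification time is strictly newer than the stored timestamp.
    static List<String> listNewerThan(Map<String, Long> modTimeByFile, long latestTimestamp) {
        List<String> listed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : modTimeByFile.entrySet()) {
            if (entry.getValue() > latestTimestamp) {
                listed.add(entry.getKey());
            }
        }
        return listed;
    }

    public static void main(String[] args) {
        // Illustrative modification times: t1_1.txt = 1, t2_1.txt = 2, t1_2.txt = 3, t2_2.txt = 4.
        Map<String, Long> test2 = new LinkedHashMap<>();
        test2.put("t2_1.txt", 2L);
        test2.put("t2_2.txt", 4L);

        long staleTimestamp = 3L;  // modified time of /test1/t1_2.txt left over in state
        long resetTimestamp = -1L; // value expected after the reset

        System.out.println(listNewerThan(test2, staleTimestamp)); // prints [t2_2.txt]
        System.out.println(listNewerThan(test2, resetTimestamp)); // prints [t2_1.txt, t2_2.txt]
    }
}
```

With the stale timestamp, t2_1.txt is older than the stored value and is filtered out, which matches the observed behaviour. With `-1L`, both files in /test2 would be listed.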
A little debugging showed that `onPropertyModified` does work as intended, but somewhere else the code still reads the last saved state, i.e. the modified time of /test1/t1_2.txt.
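One way this kind of behaviour could arise is sketched below. It is only a guess at the mechanism, not the ListHDFS implementation: the class `StateResetSketch`, the plain `Map` standing in for the persisted state, the key names `listed` and `emitted`, and the method `onPropertyModifiedClearingState` are all hypothetical. The sketch assumes the reset touches only in-memory fields while a later step reloads the saved values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the suspected problem; not the NiFi StateMap API.
class StateResetSketch {
    // Stand-in for the persisted state, with made-up key names.
    private final Map<String, String> persistedState = new HashMap<>();
    private long latestTimestampListed = -1L;
    private long latestTimestampEmitted = -1L;

    // A listing run updates the in-memory fields and saves them.
    void recordListing(long newestModTime) {
        latestTimestampListed = newestModTime;
        latestTimestampEmitted = newestModTime;
        persistedState.put("listed", Long.toString(latestTimestampListed));
        persistedState.put("emitted", Long.toString(latestTimestampEmitted));
    }

    // The reset described in the report, assumed here to change only the in-memory fields.
    void onPropertyModified() {
        latestTimestampListed = -1L;
        latestTimestampEmitted = -1L;
    }

    // Suspected issue: the saved value is reloaded and overwrites the in-memory reset.
    long timestampUsedByNextListing() {
        String saved = persistedState.get("listed");
        if (saved != null) {
            latestTimestampListed = Long.parseLong(saved);
        }
        return latestTimestampListed;
    }

    // One possible remedy (an assumption): clear the saved state together with the fields.
    void onPropertyModifiedClearingState() {
        onPropertyModified();
        persistedState.clear();
    }

    public static void main(String[] args) {
        StateResetSketch sketch = new StateResetSketch();
        sketch.recordListing(3L);   // /test1 listed, newest file is t1_2.txt
        sketch.onPropertyModified(); // Directory changed to /test2
        System.out.println(sketch.timestampUsedByNextListing()); // prints 3

        sketch.onPropertyModifiedClearingState();
        System.out.println(sketch.timestampUsedByNextListing()); // prints -1
    }
}
```

In this model the first reset has no lasting effect because the next listing reloads the saved value, while clearing the saved state alongside the fields lets `-1L` take effect.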