Details
- Type: Task
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Several people have asked for this capability. Currently, when we do a file listing, the results are placed into a HashSet, so the files are pulled in no particular order. My proposal is that we instead order the files so that we pull the oldest file first, and keep track of the latest timestamp that we've pulled in. That way, on restart we can resume where we left off.
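The ordering part of this proposal could look roughly like the sketch below. It is only an illustration and not existing NiFi code: the class name OldestFirstListing and the methods snapshot and newerThan are hypothetical. Last-modified times are captured once, up front, so that the sort sees stable values even if a file changes while the listing is being processed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of NiFi.
class OldestFirstListing {

    /** Snapshot of file name -> last-modified time for the regular files in dir. */
    static Map<String, Long> snapshot(File dir) {
        Map<String, Long> modifiedByName = new HashMap<>();
        File[] found = dir.listFiles(File::isFile);
        if (found != null) { // listFiles returns null if dir is missing or unreadable
            for (File f : found) {
                modifiedByName.put(f.getName(), f.lastModified());
            }
        }
        return modifiedByName;
    }

    /** Names of files modified strictly after lastSeenMillis, oldest first. */
    static List<String> newerThan(Map<String, Long> modifiedByName, long lastSeenMillis) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (Map.Entry<String, Long> e : modifiedByName.entrySet()) {
            if (e.getValue() > lastSeenMillis) {
                entries.add(e);
            }
        }
        // Oldest first; ties are broken by name so the order is deterministic.
        entries.sort((a, b) -> {
            int byTime = Long.compare(a.getValue(), b.getValue());
            return byTime != 0 ? byTime : a.getKey().compareTo(b.getKey());
        });
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            names.add(e.getKey());
        }
        return names;
    }
}
```

One caveat with the strict "after" comparison: a file that shares its timestamp with the last file pulled before a restart would be skipped. Using "at or after" instead would avoid that, at the cost of re-pulling some files.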
I would create a FileOutputStream and keep it open, write out the timestamp each time we pull data in, and then periodically flush the data to disk. Perhaps every second or so, though this should probably be configurable. We need a tradeoff between how much duplication is possible after a restart and how much time we spend persisting the timestamp.
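A minimal sketch of that persistence idea follows, under some assumptions that go beyond the description: timestamps are appended as 8-byte longs, the flush interval is a constructor argument, and on restart the last complete record is read back. The class name TimestampCheckpoint and its methods are hypothetical, not NiFi APIs.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of NiFi.
class TimestampCheckpoint implements AutoCloseable {
    private final FileOutputStream fileOut;
    private final DataOutputStream out;
    private final long flushIntervalMillis; // assumed to be the configurable knob
    private long lastFlushMillis;
    private boolean dirty;

    TimestampCheckpoint(File stateFile, long flushIntervalMillis) throws IOException {
        this.fileOut = new FileOutputStream(stateFile, true); // append mode, kept open
        this.out = new DataOutputStream(new BufferedOutputStream(fileOut));
        this.flushIntervalMillis = flushIntervalMillis;
        this.lastFlushMillis = System.currentTimeMillis();
    }

    /** Buffers the timestamp; forces it to disk only when the interval has elapsed. */
    void record(long timestampMillis) throws IOException {
        out.writeLong(timestampMillis);
        dirty = true;
        if (System.currentTimeMillis() - lastFlushMillis >= flushIntervalMillis) {
            flush();
        }
    }

    /** Pushes buffered records to the OS, then asks the OS to sync them to disk. */
    void flush() throws IOException {
        if (!dirty) {
            return;
        }
        out.flush();
        fileOut.getFD().sync();
        dirty = false;
        lastFlushMillis = System.currentTimeMillis();
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }

    /** Last complete timestamp in the file, or defaultValue if there is none. */
    static long readLast(File stateFile, long defaultValue) throws IOException {
        if (!stateFile.exists() || stateFile.length() < Long.BYTES) {
            return defaultValue;
        }
        try (RandomAccessFile raf = new RandomAccessFile(stateFile, "r")) {
            long records = raf.length() / Long.BYTES; // ignores a torn trailing record
            raf.seek((records - 1) * Long.BYTES);
            return raf.readLong();
        }
    }
}
```

A longer interval means fewer syncs but a larger window of files that may be pulled again after a crash. Note that flushing the stream alone only hands the bytes to the operating system; the sync call on the file descriptor is what requests that they reach the disk. Because this sketch appends, the state file grows without bound, so a real implementation would need to truncate or overwrite it.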
Issue Links
- is superseded by NIFI-631 Create ListFile and FetchFile processors (Resolved)