Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Currently, when ListGCSBucket runs for the first time, it lists all the files available in the specified bucket directory. On subsequent runs, it lists only the files created or modified since the last run, using the state stored in the processor.
What is required is that on every run, including those after the first, the processor lists all existing files in the specified GCS directory, not only the files created or modified since the last run.
As I understand it, this is the "No Tracking" listing strategy that is available in most List* processors, but it does not exist in ListGCSBucket.
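To make the difference between the two behaviours concrete, here is a minimal Python sketch. It is a simplified model for illustration only, not NiFi's implementation: `BlobInfo`, `Lister`, and the strategy names `"timestamps"` and `"none"` are hypothetical, and the real processor's state handling is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobInfo:
    """Hypothetical stand-in for one object in the bucket directory."""
    name: str
    updated: int  # last-modified time, epoch milliseconds


@dataclass
class Lister:
    """Toy model of a List* processor (not NiFi code)."""
    strategy: str = "timestamps"  # "timestamps" (tracking) or "none" (no tracking)
    last_seen: int = -1           # stands in for the state stored in the processor

    def run(self, blobs):
        if self.strategy == "none":
            # No tracking: ignore stored state and emit every existing object.
            return sorted(blobs, key=lambda b: b.name)
        # Tracking: emit only objects modified after the stored timestamp.
        fresh = [b for b in blobs if b.updated > self.last_seen]
        if fresh:
            self.last_seen = max(b.updated for b in fresh)
        return sorted(fresh, key=lambda b: b.name)
```

With two unchanged objects in the bucket, the tracking lister returns both on its first run and nothing on the second, while the no-tracking lister returns both on every run, which is the behaviour requested here.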
Without this capability, I had to build a whole extra pipeline (stop the processor --> clear the state --> start the processor) that runs before ListGCSBucket. This is error-prone and adds extra work and maintenance effort.
I would appreciate your help and support.