  1. Apache Gobblin
  2. GOBBLIN-422

FileBasedSource needs fs snapshot update of previously failed workunits with latest snapshot


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: gobblin-core
    • Labels: None
      Suppose the FileBasedSource starts processing 'source_dir' and finds 3 files underneath:

      source_dir/file_1.txt
      source_dir/file_2.txt
      source_dir/file_3.txt

      and we set:

      source.filebased.maxFilesPerRun=2

      If file_1.txt is corrupt and fails on every run, the workunit created for it keeps the 'source.filebased.fs.snapshot' property it was given in the first run: file_1.txt,file_2.txt. As a result, file_3.txt is currently reprocessed on every run for as long as file_1.txt keeps failing. The logic consults the 'source.filebased.fs.snapshot' property of the file_1.txt workunit, which does not list file_3.txt, and so decides that file_3.txt is being processed for the first time.
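
The scenario above can be sketched as a small simulation. This is a simplified, hypothetical model and not Gobblin's actual code: the class `SnapshotSketch`, the methods `filesToPull` and `simulate`, and the `refreshFailedSnapshot` flag are invented for illustration. It assumes that the source decides which files are new by consulting the snapshot stored on the retried file_1.txt workunit, and that "latest snapshot" means the old snapshot plus the files pulled in the current run.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the reported behaviour; not Gobblin's actual code.
class SnapshotSketch {

    /** Files treated as new: present in the directory but absent from the consulted snapshot. */
    static List<String> filesToPull(List<String> currentFiles,
                                    Set<String> consultedSnapshot,
                                    int maxFilesPerRun) {
        List<String> result = new ArrayList<>();
        for (String file : currentFiles) {
            if (!consultedSnapshot.contains(file) && result.size() < maxFilesPerRun) {
                result.add(file);
            }
        }
        return result;
    }

    /**
     * Simulates the runs that follow the first one, while file_1.txt keeps failing.
     * Returns the list of "new" files pulled in each of those runs.
     */
    static List<List<String>> simulate(boolean refreshFailedSnapshot, int runs) {
        List<String> currentFiles = List.of("file_1.txt", "file_2.txt", "file_3.txt");
        int maxFilesPerRun = 2; // mirrors source.filebased.maxFilesPerRun=2

        // Snapshot stored on the file_1.txt workunit after the first run.
        Set<String> failedWorkUnitSnapshot =
                new LinkedHashSet<>(List.of("file_1.txt", "file_2.txt"));

        List<List<String>> pulledPerRun = new ArrayList<>();
        for (int run = 0; run < runs; run++) {
            List<String> pulled =
                    filesToPull(currentFiles, failedWorkUnitSnapshot, maxFilesPerRun);
            pulledPerRun.add(pulled);
            if (refreshFailedSnapshot) {
                // Assumed fix: the retried workunit carries the latest snapshot forward.
                failedWorkUnitSnapshot.addAll(pulled);
            }
            // Otherwise the retried workunit keeps its stale first-run snapshot.
        }
        return pulledPerRun;
    }

    public static void main(String[] args) {
        System.out.println("stale snapshot:     " + simulate(false, 3));
        System.out.println("refreshed snapshot: " + simulate(true, 3));
    }
}
```

With the stale snapshot, file_3.txt is pulled again in every simulated run. With the snapshot refreshed on the retried workunit, it is pulled once and then skipped.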

        People

          Assignee: agepati Raul A
          Reporter: agepati Raul A
          Votes: 0
          Watchers: 2
