  1. Apache NiFi
  2. NIFI-6367

FetchS3Processor responds to md5 error on download by doing download again, again, and again

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.10.0
    • Component/s: Core Framework
    • Labels: None
    • Environment: NiFi (CentOS 7.2) with FetchS3Object running against a non-public S3 environment. The environment / S3 had errors that introduced MD5 errors on fewer than 0.5% of downloads. Downloads with MD5 errors accumulated in the processor's input queue.
    • Flags: Important

    Description

      (This observation is six months old, but I don't see changes in the relevant parts of the code, though I might be mistaken. It may be hard to replicate, so I suggest that someone who knows the code check whether this is still a problem.)

      Case: NiFi running FetchS3Object processor(s) against a non-public S3 environment. Hardware errors in the environment and S3 combined to produce sporadic MD5 errors on the same files, over and over again. Each MD5 error resulted in an unhandled AmazonClientException, and the file was downloaded yet again (it was reverted to the input queue, first in line). In our case this was only identified after a number of days, by which point it had consumed substantial bandwidth. It did not help that FetchS3Object was running with multiple instances, which over those days accumulated the files with bad MD5 checksums and kept downloading them continuously.

      Suggestion: Someone code-savvy should check what happens to files that are downloaded with a bad MD5. If they are reverted to the queue, whether due to an uncaught exception or by other means, then this is still a potential problem.
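
The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not NiFi or AWS SDK code: `Md5RetrySketch`, `ChecksumMismatchException`, `download`, and `run` are hypothetical names, the queue is a plain in-memory deque, and processing is single-threaded (so it ignores concurrent tasks, penalization, and yielding). It compares putting a failed file back at the head of the queue with routing it out of the queue to a failure list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class Md5RetrySketch {
    /** Hypothetical stand-in for the client-side checksum failure (not the AWS SDK type). */
    static class ChecksumMismatchException extends RuntimeException {
        ChecksumMismatchException(String key) { super("MD5 mismatch for " + key); }
    }

    /** Simulated download: keys starting with "bad" fail the MD5 check every time. */
    static void download(String key) {
        if (key.startsWith("bad")) throw new ChecksumMismatchException(key);
    }

    /** Outcome of a run: download attempts made, plus where each key ended up. */
    static class Result {
        int downloads;
        final List<String> succeeded = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    /**
     * Processes the queue for at most maxTriggers attempts.
     * routeToFailure = false models the reported behavior (file returns to the head of the queue);
     * routeToFailure = true models sending the file to a failure destination instead.
     */
    static Result run(List<String> keys, boolean routeToFailure, int maxTriggers) {
        Deque<String> queue = new ArrayDeque<>(keys);
        Result r = new Result();
        for (int i = 0; i < maxTriggers && !queue.isEmpty(); i++) {
            String key = queue.pollFirst();
            r.downloads++;
            try {
                download(key);
                r.succeeded.add(key);
            } catch (ChecksumMismatchException e) {
                if (routeToFailure) {
                    r.failed.add(key);   // leaves the input queue for good
                } else {
                    queue.addFirst(key); // back to the queue, first in line
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("bad-1", "ok-1", "ok-2");
        Result requeued = run(keys, false, 10);
        Result routed = run(keys, true, 10);
        System.out.println("requeue: downloads=" + requeued.downloads
                + " succeeded=" + requeued.succeeded);
        System.out.println("route:   downloads=" + routed.downloads
                + " succeeded=" + routed.succeeded + " failed=" + routed.failed);
    }
}
```

In this model the requeue variant spends every attempt re-downloading the same bad file, while the routing variant downloads each file once and sets the bad one aside. Whether the real processor behaves like either variant depends on how the exception is handled in the actual code.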

            People

              Assignee: Evan Reynolds (evanthx)
              Reporter: Kefevs Pirkibo (kefevs)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m