Flume / FLUME-2445

Duplicate files in s3 (both temp and final file)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v1.5.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      The HDFS sink leaves both the temp file (.tmp) and the final file in the S3 bucket, as shown below:
      rw-rw-rw 1 9558423 2014-08-18 18:01 s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
      rw-rw-rw 1 9558423 2014-08-18 18:01 s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
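
A minimal sketch of how such leftover pairs could be detected, assuming the bucket listing has already been fetched as a plain list of keys. The function name `find_leftover_temp_files` and its `temp_suffix` parameter are illustrative and not part of Flume or Hadoop; the `.tmp` default simply matches the listing above.

```python
def find_leftover_temp_files(keys, temp_suffix=".tmp"):
    """Return (final_key, temp_key) pairs where both keys are present.

    `keys` is any iterable of object keys (assumed to be fetched elsewhere).
    """
    present = set(keys)
    pairs = []
    for key in sorted(present):
        if key.endswith(temp_suffix):
            final_key = key[: -len(temp_suffix)]
            # A temp file is only a "duplicate" if its final file also exists.
            if final_key in present:
                pairs.append((final_key, key))
    return pairs


# Keys taken from the listing in this report (bucket prefix omitted).
listing = [
    "flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz",
    "flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp",
]
duplicates = find_leftover_temp_files(listing)
for final_key, temp_key in duplicates:
    print(f"both present: {final_key} and {temp_key}")
```

A temp key without a matching final key is deliberately not reported, since that may just be a file still being written.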

      I could not find any errors in the agent log. However, when I restarted the agent the next day, it tried to close and rename the temp file again. Even after this second attempt, both files still exist.
      The logs are below. The file was uploaded on Aug 18 and the agent was restarted on Aug 19.

      $ grep actions-i-e9b26de6.1408381201580 logs/flume.log
      18 Aug 2014 17:00:01,591 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261) - Creating s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
      18 Aug 2014 17:00:02,150 INFO [hdfs-s3sink-actions-call-runner-1] (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.<init>:182) - OutputStream for key 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp' writing to tempfile '/var/lib/hadoop-hdfs/cache/ec2-user/s3/output-1521416101446161225.tmp'
      18 Aug 2014 18:01:02,535 INFO [hdfs-s3sink-actions-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:469) - Closing idle bucketWriter s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp at 1408384862535
      18 Aug 2014 18:01:02,535 INFO [hdfs-s3sink-actions-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:409) - Closing s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
      18 Aug 2014 18:01:02,535 INFO [hdfs-s3sink-actions-call-runner-7] (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:217) - OutputStream for key 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp' closed. Now beginning upload
      18 Aug 2014 18:01:08,043 INFO [hdfs-s3sink-actions-call-runner-7] (org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close:229) - OutputStream for key 'flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp' upload complete
      18 Aug 2014 18:01:08,165 INFO [hdfs-s3sink-actions-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter$8.call:669) - Renaming s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp to s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz

      19 Aug 2014 19:55:37,635 INFO [conf-file-poller-0] (org.apache.flume.sink.hdfs.BucketWriter.close:409) - Closing s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
      19 Aug 2014 19:55:37,635 INFO [conf-file-poller-0] (org.apache.flume.sink.hdfs.BucketWriter.close:428) - HDFSWriter is already closed: s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp
      19 Aug 2014 19:55:38,064 INFO [hdfs-s3sink-actions-call-runner-1] (org.apache.flume.sink.hdfs.BucketWriter$8.call:669) - Renaming s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz.tmp to s3n://my-bucket/flume/actions/day=16300/hour=17/actions-i-e9b26de6.1408381201580.json.gz
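
The repeated rename can be picked out of a larger log mechanically. The following is a rough sketch, assuming log lines in the same layout as the excerpt above; the regular expression and the helper name `rename_attempts` are illustrative assumptions, not anything provided by Flume.

```python
import re
from collections import defaultdict

# Matches lines like:
# "18 Aug 2014 18:01:08,165 INFO [...] (...) - Renaming <src> to <dst>"
RENAME_RE = re.compile(
    r"^(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3}) INFO .* - "
    r"Renaming (?P<src>\S+) to (?P<dst>\S+)$"
)


def rename_attempts(lines):
    """Map each rename source path to the timestamps of its rename attempts."""
    attempts = defaultdict(list)
    for line in lines:
        match = RENAME_RE.match(line.strip())
        if match:
            attempts[match.group("src")].append(match.group("ts"))
    return dict(attempts)


FINAL = ("s3n://my-bucket/flume/actions/day=16300/hour=17/"
         "actions-i-e9b26de6.1408381201580.json.gz")
TMP = FINAL + ".tmp"

# The two "Renaming" lines and one "Closing" line from the excerpt above.
log_lines = [
    f"18 Aug 2014 18:01:08,165 INFO [hdfs-s3sink-actions-call-runner-8] "
    f"(org.apache.flume.sink.hdfs.BucketWriter$8.call:669) - Renaming {TMP} to {FINAL}",
    f"19 Aug 2014 19:55:37,635 INFO [conf-file-poller-0] "
    f"(org.apache.flume.sink.hdfs.BucketWriter.close:409) - Closing {TMP}",
    f"19 Aug 2014 19:55:38,064 INFO [hdfs-s3sink-actions-call-runner-1] "
    f"(org.apache.flume.sink.hdfs.BucketWriter$8.call:669) - Renaming {TMP} to {FINAL}",
]

attempts = rename_attempts(log_lines)
for src, timestamps in attempts.items():
    if len(timestamps) > 1:
        print(f"{src} renamed {len(timestamps)} times: {timestamps}")
```

Any source path with more than one timestamp was renamed more than once, as happened here on Aug 18 and again after the restart on Aug 19.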

          People

          • Assignee: Unassigned
          • Reporter: Bijith Kumar
          • Votes: 0
          • Watchers: 3
