Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: NG alpha 1
    • Fix Version/s: notrack
    • Component/s: Sinks+Sources
    • Labels:

      Description

      Based on discussion on our mailing list, an S3 sink is needed in Flume NG and should be implemented. According to user reports, writing to S3 works in Flume 0.9.3 but not in 0.9.4.

        Issue Links

          Activity

          Transition         Time In Source Status   Execution Times   Last Executer             Last Execution Date
          Open → Resolved    1d 23h 16m               1                 Alexander Alten-Lorenz    05/Mar/12 07:20
          Alexander Alten-Lorenz added a comment -

          Eli: Flume uses the HDFS abstraction for writing into S3, so we have not included a separate S3 sink; you use HDFS's S3 URI syntax (http://wiki.apache.org/hadoop/AmazonS3).
          Example: s3://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/ (or similar)
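
          For concreteness, a minimal sketch of what this looks like in a Flume NG agent configuration; the agent/sink/channel names (a1, k1, c1) and the bucket name are placeholders, and s3n:// is one of the schemes described on the wiki page above:

            # Standard HDFS sink pointed at an S3 path instead of hdfs://
            a1.sinks.k1.type = hdfs
            a1.sinks.k1.channel = c1
            a1.sinks.k1.hdfs.path = s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-bucket/flumedata
            a1.sinks.k1.hdfs.fileType = DataStream

          The relevant Hadoop jars (hadoop-core and its S3 dependencies such as jets3t) will generally need to be on the Flume classpath for the s3n:// filesystem to resolve.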

          Eli Finkelshteyn added a comment -

          Why is this marked as resolved? There's still no documentation anywhere on how to get Flume NG to work with S3, and it's been months since this was initially resolved.

          Mike Percy made changes -
          Link This issue relates to FLUME-1228 [ FLUME-1228 ]
          Mike Percy made changes -
          Link This issue is related to FLUME-66 [ FLUME-66 ]
          Prashanth Jonnalagadda added a comment -

          Hello,

          Flume NG (version 1.2.0) fails while writing to an S3 sink because it gets back a 404 response code, although the files with data are created on S3.

          Hadoop version used is 0.20.2-cdh3u4

          I followed all the steps documented in the JIRA - https://issues.cloudera.org/browse/FLUME-66 -
          and I also tried swapping out the hadoop-core.jar that comes with CDH for the emr-hadoop-core-0.20.jar that comes with an EC2 Hadoop cluster instance, as suggested in the following blog post - http://eric.lubow.org/2011/system-administration/distributed-flume-setup-with-an-s3-sink/ - but the issue remains.
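
          One common way those setups supply the S3 credentials is through Hadoop's core-site.xml rather than embedding them in the sink path; a minimal sketch with placeholder values, assuming the s3n:// scheme that appears in the log below (the property names are the standard Hadoop s3n ones):

            <configuration>
              <!-- Placeholder AWS credentials for the s3n:// filesystem -->
              <property>
                <name>fs.s3n.awsAccessKeyId</name>
                <value>ACCESS_KEY_ID</value>
              </property>
              <property>
                <name>fs.s3n.awsSecretAccessKey</name>
                <value>SECRET_ACCESS_KEY</value>
              </property>
            </configuration>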

          The following errors are seen in the log:

          2012-05-25 05:04:28,889 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995.tmp_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:28,964 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-8042215269186280519.tmp'
          2012-05-25 05:04:28,972 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' closed. Now beginning upload
          2012-05-25 05:04:29,044 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' upload complete
          2012-05-25 05:04:29,074 INFO hdfs.BucketWriter: Renaming s3n://flume-ng/flumedata/FlumeData.122585423857995.tmp to s3n://flume-ng/flumedata/FlumeData.122585423857995
          2012-05-25 05:04:29,097 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,120 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,203 WARN httpclient.RestS3Service: Response '/flumedata' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,224 WARN httpclient.RestS3Service: Response '/flumedata_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,608 INFO hdfs.BucketWriter: Creating s3n://flume-ng/flumedata/FlumeData.122585423857996.tmp
          2012-05-25 05:04:29,720 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857996.tmp' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,748 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857996.tmp_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,791 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-2477068572058013384.tmp'
          2012-05-25 05:04:29,793 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' closed. Now beginning upload
          2012-05-25 05:04:29,828 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' upload complete

          Any help in this regard is highly appreciated.

          Arvind Prabhakar made changes -
          Fix Version/s notrack [ 12320245 ]
          Fix Version/s v1.1.0 [ 12319284 ]
          Alexander Alten-Lorenz made changes -
          Assignee Alexander Lorenz-Alten [ alo.alt ]
          Alexander Alten-Lorenz made changes -
          Field         Original Value    New Value
          Status        Open [ 1 ]        Resolved [ 5 ]
          Resolution                      Fixed [ 1 ]
          Alexander Alten-Lorenz added a comment -

          Flume uses the HDFS abstraction, so we don't need a separate configuration for S3. We'll add a notice about this in the upcoming guide.

          Alexander Alten-Lorenz added a comment -

          You're right. I'll add a line into the guide and close the jira.

          E. Sammer added a comment -

          I'm confused. Hadoop's FileSystem abstraction supports writing to S3. Maybe we don't expose direct configuration for it, but it was never a separate sink in 0.9. Should we not use Hadoop's implementation (I'm fine with that)?

          Alexander Alten-Lorenz created issue -

            People

            • Assignee:
              Alexander Alten-Lorenz
              Reporter:
              Alexander Alten-Lorenz
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Development