Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: NG alpha 1
    • Fix Version/s: notrack
    • Component/s: Sinks+Sources
    • Labels:

      Description

      Users on our mailing list have reported a need for an S3 sink in Flume NG; it should be implemented. S3 output works in Flume 0.9.3 but not in 0.9.4, according to some user reports.

        Issue Links

          Activity

          Alexander Alten-Lorenz added a comment -

          Eli: Flume uses the HDFS abstraction for writing to S3, so we have not included a separate S3 sink; you have to use HDFS's S3 syntax (http://wiki.apache.org/hadoop/AmazonS3).
          Example: s3://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/ (or similar)
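
          Put together, a minimal Flume NG agent configuration along these lines might look like the sketch below. The agent/channel/sink names and the bucket are placeholders, and it assumes the Hadoop S3 classes (hadoop-core and jets3t) are on Flume's classpath:

          ```properties
          # Hypothetical agent "a1" writing events to S3 through the standard HDFS sink.
          a1.sources = r1
          a1.channels = c1
          a1.sinks = k1

          a1.sinks.k1.type = hdfs
          # The s3n:// scheme uses Hadoop's NativeS3FileSystem. Credentials can be
          # embedded in the URI, or set via fs.s3n.awsAccessKeyId and
          # fs.s3n.awsSecretAccessKey in core-site.xml instead.
          a1.sinks.k1.hdfs.path = s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-bucket/flumedata
          a1.sinks.k1.hdfs.fileType = DataStream
          a1.sinks.k1.channel = c1
          ```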

          Eli Finkelshteyn added a comment -

          How come this is marked as resolved? There's still 0 documentation on how to get flume NG to work with s3 anywhere, and it's been months since this was initially resolved.

          Mike Percy made changes -
          Link This issue relates to FLUME-1228 [ FLUME-1228 ]
          Mike Percy made changes -
          Link This issue is related to FLUME-66 [ FLUME-66 ]
          Prashanth Jonnalagadda added a comment -

          Hello,

          flume-ng (version 1.2.0) fails while writing to an S3 sink because it gets back a 404 response code. The files containing data are created on S3, though.

          The Hadoop version used is 0.20.2-cdh3u4.

          I followed all the steps documented in the jira - https://issues.cloudera.org/browse/FLUME-66 -
          and I also tried swapping out the hadoop-core.jar that comes with CDH for the emr-hadoop-core-0.20.jar that ships with an EC2 Hadoop cluster instance, as suggested in the following blog post - http://eric.lubow.org/2011/system-administration/distributed-flume-setup-with-an-s3-sink/ - but the issue remains.

          The following errors are seen in the log:

          2012-05-25 05:04:28,889 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995.tmp_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:28,964 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-8042215269186280519.tmp'
          2012-05-25 05:04:28,972 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' closed. Now beginning upload
          2012-05-25 05:04:29,044 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857995.tmp' upload complete
          2012-05-25 05:04:29,074 INFO hdfs.BucketWriter: Renaming s3n://flume-ng/flumedata/FlumeData.122585423857995.tmp to s3n://flume-ng/flumedata/FlumeData.122585423857995
          2012-05-25 05:04:29,097 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,120 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857995_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,203 WARN httpclient.RestS3Service: Response '/flumedata' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,224 WARN httpclient.RestS3Service: Response '/flumedata_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,608 INFO hdfs.BucketWriter: Creating s3n://flume-ng/flumedata/FlumeData.122585423857996.tmp
          2012-05-25 05:04:29,720 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857996.tmp' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,748 WARN httpclient.RestS3Service: Response '/flumedata%2FFlumeData.122585423857996.tmp_%24folder%24' - Unexpected response code 404, expected 200
          2012-05-25 05:04:29,791 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' writing to tempfile '/tmp/hadoop-root/s3/output-2477068572058013384.tmp'
          2012-05-25 05:04:29,793 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' closed. Now beginning upload
          2012-05-25 05:04:29,828 INFO s3native.NativeS3FileSystem: OutputStream for key 'flumedata/FlumeData.122585423857996.tmp' upload complete

          Any help in this regard is highly appreciated.
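
          One way to narrow this down is to exercise the same s3n path with the Hadoop CLI directly, outside Flume, so that a credential or bucket problem shows up without the sink in the way. This is a sketch only; the bucket and key names are placeholders:

          ```shell
          # List the target prefix through NativeS3FileSystem (the same code path
          # the Flume HDFS sink uses). Credentials may also be passed as -D options.
          hadoop fs -Dfs.s3n.awsAccessKeyId=ACCESS_KEY_ID \
                    -Dfs.s3n.awsSecretAccessKey=SECRET_ACCESS_KEY \
                    -ls s3n://my-bucket/flumedata/

          # Round-trip a small file to confirm that writes and renames succeed.
          echo test > /tmp/s3probe.txt
          hadoop fs -put /tmp/s3probe.txt s3n://my-bucket/flumedata/s3probe.txt
          hadoop fs -cat s3n://my-bucket/flumedata/s3probe.txt
          ```

          If these commands succeed while Flume still logs 404s, the warnings are more likely coming from existence probes than from failed writes.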

          Arvind Prabhakar made changes -
          Fix Version/s notrack [ 12320245 ]
          Fix Version/s v1.1.0 [ 12319284 ]
          Alexander Alten-Lorenz made changes -
          Assignee Alexander Lorenz-Alten [ alo.alt ]
          Alexander Alten-Lorenz made changes -
          Field Original Value New Value
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Alexander Alten-Lorenz added a comment -

          Flume uses the HDFS abstraction, so we don't need a separate configuration for S3. We'll add a notice about this in the upcoming guide.

          Alexander Alten-Lorenz added a comment -

          You're right. I'll add a line into the guide and close the jira.

          E. Sammer added a comment -

          I'm confused. Hadoop's FileSystem abstraction supports writing to S3. Maybe we don't expose direct configuration for it, but it was never a separate sink in 0.9. Should we not use Hadoop's implementation (I'm fine with that)?

          Alexander Alten-Lorenz created issue -

            People

            • Assignee:
              Alexander Alten-Lorenz
              Reporter:
              Alexander Alten-Lorenz
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Development