Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None

        Activity

        Disabled imported user added a comment -

        I think jets3t (http://bitbucket.org/jmurty/jets3t/wiki/Home) is the preferred way to interact with S3 from Java, but I could be wrong. jclouds is also nice, but might be too much for just storing data to S3.

        Jonathan Hsieh added a comment -

        I got this working at a hackathon in LA about two months ago, but haven't done much testing to find its limitations. Basically, you need to add some jars (jets3t and some others – need to check licenses), add some S3 keys to the config, and then you can just use s3n://bucket/file or s3://bucket/file with the collectorSink, as sketched below.
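
        For illustration, a dataflow spec along those lines might look like this (the node name, bucket, and prefix are placeholders; 35853 is, I believe, the default collector port):

            collector01 : collectorSource(35853) | collectorSink("s3n://bucket/file", "log-");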

        Disabled imported user added a comment -

        After chatting with Jon offline, it appears that we could write to S3 with zero code changes in Flume via the HDFS integration outlined at http://wiki.apache.org/hadoop/AmazonS3 by simply including some jar files and setting some configuration variables.

        The jars required:

        • commons-codec-1.3.jar
        • commons-httpclient-3.0.1.jar
        • jets3t-0.6.1.jar

        The configuration variables that need to change are indicated on the wiki page linked above; essentially you need to tell Flume how to authenticate with AWS.

        Once the jar files and configuration variables are set, just use collectorSink("s3n://my-bucket/my-dir", "my-file-name-prefix").
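
        For concreteness, a minimal sketch of those credential properties (key names per the wiki page linked above; the values are obviously placeholders) in the site config file your Flume/Hadoop install reads:

            <property>
              <name>fs.s3n.awsAccessKeyId</name>
              <value>YOUR_AWS_ACCESS_KEY_ID</value>
            </property>
            <property>
              <name>fs.s3n.awsSecretAccessKey</name>
              <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
            </property>

        For the s3:// block-store scheme, the corresponding keys are fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey.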

        I'm going to test this one now...

        Disabled imported user added a comment -

        Also for convenience, the jar file locations:

        • http://www.ibiblio.org/maven/commons-codec/jars/commons-codec-1.3.jar
        • http://www.ibiblio.org/maven/commons-httpclient/jars/commons-httpclient-3.0.1.jar
        • http://repo1.maven.org/maven2/net/java/dev/jets3t/jets3t/0.6.1/jets3t-0.6.1.jar

        Disabled imported user added a comment -

        Two caveats when using S3 as a sink:

        • Ensure your bucket name and folders don't use an underscore. S3Credentials.java calls URI.getHost() on the path you specify, and getHost() returns null if the authority contains an underscore (see the sketch after this list).
        • Make sure your configuration key names match your file system scheme. That is, if you use s3n://, use the configuration key fs.s3n.awsAccessKeyId, and if you use s3://, use the configuration key fs.s3.awsAccessKeyId.
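
        To make the underscore caveat concrete, here is a quick standalone check of the java.net.URI behavior involved (the class name is mine, purely illustrative):

            import java.net.URI;

            public class BucketNameCheck {
                public static void main(String[] args) {
                    // Underscores are not legal in hostnames, so getHost() returns
                    // null and the S3 credential lookup fails downstream.
                    System.out.println(URI.create("s3n://my_bucket/logs").getHost()); // null
                    System.out.println(URI.create("s3n://my-bucket/logs").getHost()); // my-bucket
                }
            }
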
        Disabled imported user added a comment -

        Also, by default the collectorSink rolls log files every 30 seconds: see flume.collector.roll.millis in flume-conf.xml.
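
        If 30 seconds produces too many small files in S3, the interval can be raised; a sketch of an override in flume-site.xml (the 300000 ms value is just an example):

            <property>
              <name>flume.collector.roll.millis</name>
              <value>300000</value>
            </property>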

        Disabled imported user added a comment -

        ...and I can confirm that this works! Just add the three jar files listed above and you can have Flume write your logs into S3 for consumption later.

        Jonathan Hsieh added a comment -

        Jeff,

        There is a small section on getting Flume working with S3 in the Troubleshooting section of the user guide. Could you update it with some of the comments you made, and maybe an example setup? Alternatively, we could make it a "recipe" in the cookbook. (FLUME-121)

        Thanks,
        Jon.

        flume_depsypher added a comment -

        Sweet, this works nicely. It's described in the user guide under troubleshooting, but I got a little stuck because the http://wiki.apache.org/hadoop/AmazonS3 link above has some out-of-date info for hadoop-0.20. The config file to add your credentials to in that case is /etc/hadoop/conf/core-site.xml instead of /etc/hadoop/conf/hadoop-site.xml (which is deprecated).

        Jonathan Hsieh added a comment -

        Ray,

        Would you mind spending some time updating the user guide with a small section on what you needed to do to get S3 working happily? It would be really cool if it highlighted some of the gotchas and workarounds!

        Thanks,
        Jon.

        Jonathan Hsieh added a comment -

        When running on ec2 and writing to s3, pting suggests:

        14:08 <pting> EricL, oh geez... i hope it's not serious... ya, you need commons-httpclient, commons-codec and jets3t... and i would use the emr hadoop core jar if you don't have it already
        14:09 <pting> ... i just know the jar provided by cloudera isn't working with s3

        Disabled imported user added a comment - edited

        The jar pting is referring to is the swap of hadoop-core.jar for emr-hadoop-core.jar; that change was necessary. My issue ended up being that I had put commons-codec on the agent and not the collector, and no error or warning was thrown when trying to write to s3n. I redid everything from scratch and realized I was missing commons-codec on the collector. Once I added that, everything started to work. I will write a post on my steps and attach a link to this ticket when complete.

        UPDATE: http://eric.lubow.org/2011/system-administration/distributed-flume-setup-with-an-s3-sink/

        Jonathan Hsieh added a comment -

        I think Eric Lubow's recent blog post nicely explains how to get this functionality. I'm going to resolve this as "Works for me" in a few days (or tweak the JIRA to add a "Workaround" resolution) unless I hear anything that suggests otherwise.

        Jonathan Hsieh added a comment -

        Closing this issue out because there are several blog posts about this now, and because jets3t and the other required jars are now pulled in by Maven.

