Details
Improvement
Status: Open
Major
Resolution: Unresolved
Description
I am trying to write to S3 through the HDFS sink, but I am running into hurdles at every turn. I need to push to S3 without using Amazon access/secret keys, authenticating with S3 through the underlying instance profile instead. I also need to add the AWS encryption header for AES256. I am using a base path of `s3://something.us-east-2.something/else`, but when I try it I get this error: `<Error><Code>AuthorizationHeaderMalformed</Code><Message>The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-east-2'</Message><Region>us-east-2</Region><RequestId>N/A</RequestId><HostId>N/A</HostId></Error>`
Here is my Flume config:
```
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = localhost:2181
tier1.sources.source1.topic = lynch
tier1.sources.source1.groupId = flume
tier1.sources.source1.channels = channel1
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = timestamp
tier1.sources.source1.kafka.consumer.timeout.ms = 100
tier1.channels.channel1.type = memory
#tier1.channels.channel1.capacity = 10000
#tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = s3://something.us-east-2.something/else
tier1.sinks.sink1.hdfs.rollInterval = 5
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.channel = channel1
```
Here is the command to run it:
```
bin/flume-ng agent -c . -f kafka-source.conf -n tier1
```
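For illustration only, here is a sketch of how the three requirements above (instance profile credentials, AES256 encryption, a us-east-2 endpoint) might be expressed as Hadoop S3A settings. It is not a confirmed fix for this setup. It assumes the `hadoop-aws` and AWS SDK jars are on Flume's classpath, that the sink path is changed to the `s3a://` scheme, that a `core-site.xml` in the directory passed with `-c` is picked up by the HDFS sink, and that these property names match the Hadoop version in use.

```xml
<?xml version="1.0"?>
<!-- core-site.xml: a sketch under the assumptions stated above, not a verified configuration -->
<configuration>
  <!-- Assumption: use the EC2 instance profile instead of access/secret keys -->
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
  </property>
  <!-- Assumption: regional endpoint, so requests are signed for us-east-2 rather than us-east-1 -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-east-2.amazonaws.com</value>
  </property>
  <!-- Assumption: request AES256 server-side encryption on every write -->
  <property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>AES256</value>
  </property>
</configuration>
```

None of this addresses the request itself, which is for the `s3://` path and instance profiles to work from the Flume side without this kind of extra setup.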
It should not be this difficult to push to S3. Support for `s3://` addresses and instance profiles needs to be added. I have tried many permutations to get this working, and I would really like to see Flume become friendlier in these situations.