Thanks for the patch, Sam! Overall, it looks great. I have a few comments and questions:
1. In the example README, it would be good to change "REGISTER zookeeper-3.3.3.jar;" to "REGISTER zookeeper-3.3.4.jar;". There are known critical bugs in 3.3.3, and this change will protect users who copy and paste the example script as is.
2. Is there a particular reason for making the sync producer the default? The async producer batches messages before sending them, which makes significantly better use of network bandwidth.
I have a few more questions and comments about the Hadoop-Kafka bridge. Feel free to file another JIRA if you'd rather address them later.
1. It seems that when the broker.list option is selected, only one broker can be specified. That works if the broker is a VIP or hardware load balancer, but otherwise broker.list is a CSV of broker_id:broker_host:broker_port entries. It would be good to support that here.
2. If kafka.output.producer.type=async, there are a few additional config options that should be supported. They are listed in AsyncProducerConfigShared.
3. Does it make sense to also let the user specify a custom Partitioner through the partitioner.class config? If none is specified, it defaults to kafka.producer.DefaultPartitioner.
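For point 1 above, here is a minimal sketch of how the bridge could accept and validate a multi-broker broker.list value in the broker_id:broker_host:broker_port CSV form. It assumes the bridge receives broker.list as a raw string. BrokerListParser and Broker are hypothetical names, not part of the bridge or Kafka. Because the Kafka producer's own broker.list setting already takes this CSV form, the bridge may only need to pass the string through. The sketch just shows how it could fail fast on a malformed value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Kafka or the bridge): parses a broker.list
// value of the form "broker_id:broker_host:broker_port[,...]".
public class BrokerListParser {

    // One broker_id:broker_host:broker_port entry.
    public static final class Broker {
        public final int id;
        public final String host;
        public final int port;

        Broker(int id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return id + ":" + host + ":" + port;
        }
    }

    public static List<Broker> parse(String brokerList) {
        if (brokerList == null || brokerList.trim().isEmpty()) {
            throw new IllegalArgumentException("broker.list must not be empty");
        }
        List<Broker> brokers = new ArrayList<>();
        for (String entry : brokerList.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            String[] parts = trimmed.split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException(
                    "Expected broker_id:broker_host:broker_port but got '" + trimmed + "'");
            }
            // Integer.parseInt throws NumberFormatException for a non-numeric id or port.
            brokers.add(new Broker(
                Integer.parseInt(parts[0].trim()),
                parts[1].trim(),
                Integer.parseInt(parts[2].trim())));
        }
        if (brokers.isEmpty()) {
            throw new IllegalArgumentException("broker.list contains no brokers");
        }
        return brokers;
    }

    public static void main(String[] args) {
        // Prints one line per broker: "0:kafka1:9092", then "1:kafka2:9092".
        for (Broker b : parse("0:kafka1:9092,1:kafka2:9092")) {
            System.out.println(b);
        }
    }
}
```

A single-entry value still parses, so the existing VIP or load-balancer setup would keep working unchanged.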