KAFKA-713

Update Hadoop producer for Kafka 0.8 changes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      With the changes in Kafka 0.8, the Hadoop producer (in contrib) is busted due to changes in the way KeyedMessages are now handled. I will fix.

        Activity

        Sam Shah added a comment -

        Here's a patch for the following changes to the Hadoop producer:

        • Now works in 0.8 (fixed the KeyedMessage wrapping)
        • Adds support for semantic partitioning
        • Provides a much cleaner way of specifying configuration to Kafka
        • Removes legacy ZK-based producer from the docs
        • Removes one byte-buffer copy on each message publish, so it is infinitesimally faster
        • Updates the build dependencies to use Pig 0.10
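The KeyedMessage fix refers to the 0.8 producer API, in which every send is wrapped in a kafka.producer.KeyedMessage. A minimal sketch of that wrapping (this is not the patch itself; it requires the Kafka 0.8 client jar on the classpath, and the broker address and topic name are placeholders):

```java
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KeyedMessageSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // placeholder broker
        props.put("serializer.class", "kafka.serializer.DefaultEncoder");

        Producer<byte[], byte[]> producer =
            new Producer<byte[], byte[]>(new ProducerConfig(props));

        byte[] key = "partition-key".getBytes();  // key drives semantic partitioning
        byte[] value = "payload".getBytes();

        // In 0.8, each message is wrapped in a KeyedMessage before send
        producer.send(new KeyedMessage<byte[], byte[]>("test-topic", key, value));
        producer.close();
    }
}
```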
        Neha Narkhede added a comment -

        Sam, thanks for the patch. I tried applying it on a fresh checkout of 0.8 and also on trunk, but it fails to apply to KafkaOutputFormat -
        Hunk #4 FAILED at 99.
        1 out of 4 hunks FAILED – saving rejects to file contrib/hadoop-producer/src/main/java/kafka/bridge/hadoop/KafkaOutputFormat.java.rej

        I also tried merging the changes, but there are many changes to this file and I'm not sure it would result in the set of changes you intended to make. Could you please rebase?

        Sam Shah added a comment -

        Oops, here's the patch rebased to trunk.

        Neha Narkhede added a comment -

        Thanks for the rebased patch, Sam. A few minor review suggestions -

        1. KafkaOutputFormat

        This will be a good time to add a few useful defaults to the Hadoop producer -
        1.1 Remove max.message.size since it is obsolete in 0.8 as message size checks moved to the server
        1.2 buffer.size is now send.buffer.bytes
        1.3 Add compression.codec (0 = no compression, 1 = GZIP, 2 = Snappy). The Kafka producer defaults to no compression, but for Hadoop->Kafka pushes, compression will be useful to have. Should probably default to gzip/snappy

        2. README
        REGISTER zkclient-20120522.jar; is no longer required
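The property changes in suggestions 1.1-1.3 can be captured in a small defaults sketch (property names are taken from this review; the 102400 buffer value is an illustrative placeholder, not a documented default):

```java
import java.util.Properties;

public class ProducerDefaults {
    // Sketch of the producer defaults suggested in the review.
    public static Properties defaults() {
        Properties props = new Properties();
        // buffer.size was renamed to send.buffer.bytes in 0.8;
        // 102400 here is an illustrative value
        props.setProperty("send.buffer.bytes", "102400");
        // compression.codec: 0 = none, 1 = GZIP, 2 = Snappy;
        // GZIP chosen as the default for Hadoop->Kafka pushes
        props.setProperty("compression.codec", "1");
        // max.message.size is intentionally absent: in 0.8 the
        // message-size check moved to the broker
        return props;
    }

    public static void main(String[] args) {
        Properties p = defaults();
        System.out.println(p.getProperty("compression.codec"));
        System.out.println(p.containsKey("max.message.size"));
    }
}
```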

        Sam Shah added a comment -

        Updated patch with Neha's suggestions. I default to GZip compression with the Hadoop producer.

        Neha Narkhede added a comment -

        +1

        Neha Narkhede added a comment -

        Thanks for the updated patch, Sam. Just checked it in.


          People

          • Assignee:
            Sam Shah
          • Reporter:
            Sam Shah
          • Votes:
            0
          • Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
