Kafka / KAFKA-713

Update Hadoop producer for Kafka 0.8 changes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      With the changes in Kafka 0.8, the Hadoop producer (in contrib) is broken because of changes in how KeyedMessages are handled. I will fix it.

        Activity

        Sam Shah created issue -
        Sam Shah added a comment -

        Here's a patch for the following changes to the Hadoop producer:

        • Now works in 0.8 (fixed the KeyedMessage wrapping)
        • Adds support for semantic partitioning
        • Provides a much cleaner way of specifying configuration to Kafka
        • Removes legacy ZK-based producer from the docs
        • Removes one byte buffer copy on each message publish, so it is infinitesimally faster
        • Updates the build dependencies to use Pig 0.10
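
To illustrate what semantic partitioning means in the list above, here is a minimal sketch of key-based partition selection. It is not the code from the patch and not Kafka's own partitioner: the class `KeyPartitionSketch`, the method `partitionFor`, and the hashing scheme are all assumptions made for illustration. The point is only that messages carrying the same key always land in the same partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyPartitionSketch {

    // Hypothetical helper (not part of Kafka or the patch): maps a message key
    // to a partition index so that equal keys always map to the same partition.
    static int partitionFor(byte[] key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Content-based hash of the key bytes; masking the sign bit keeps the
        // value non-negative so the modulo result is a valid index.
        int hash = Arrays.hashCode(key);
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] first = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] second = "user-42".getBytes(StandardCharsets.UTF_8);

        // Two messages with the same key are routed to the same partition.
        System.out.println(partitionFor(first, 8) == partitionFor(second, 8));
    }
}
```

In the real producer the key travels inside the message wrapper and the partitioner is pluggable; the sketch only shows the routing idea.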
        Sam Shah made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.8 [ 12317244 ]
        Labels hadoop
        Fix Version/s 0.8.1 [ 12322960 ]
        Sam Shah made changes -
        Attachment KAFKA-713.patch [ 12565583 ]
        Neha Narkhede added a comment -

        Sam, thanks for the patch. I tried applying it on a fresh checkout of 0.8 and also on trunk, but it fails to apply to KafkaOutputFormat:
        Hunk #4 FAILED at 99.
        1 out of 4 hunks FAILED – saving rejects to file contrib/hadoop-producer/src/main/java/kafka/bridge/hadoop/KafkaOutputFormat.java.rej

        I also tried merging the changes by hand, but there are many changes to this file and I'm not sure the result would be the set of changes you intended to make. Could you please rebase?

        Sam Shah made changes -
        Attachment KAFKA-713.patch [ 12565583 ]
        Sam Shah added a comment -

        Oops, here's the patch rebased to trunk.

        Sam Shah made changes -
        Attachment KAFKA-713.patch [ 12565663 ]
        Neha Narkhede added a comment -

        Thanks for the rebased patch, Sam. A few minor review suggestions -

        1. KafkaOutputFormat

        This would be a good time to add a few useful defaults to the Hadoop producer -
        1.1 Remove max.message.size, since it is obsolete in 0.8 now that message size checks have moved to the server.
        1.2 buffer.size is now send.buffer.bytes.
        1.3 Add compression.codec (0 = no compression, 1 = GZIP, 2 = Snappy). The Kafka producer defaults to no compression, but for Hadoop->Kafka pushes, compression will be useful to have. It should probably default to gzip or snappy.

        2. README
        REGISTER zkclient-20120522.jar; is no longer required
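
The property changes suggested in 1.1 to 1.3 can be sketched as a small migration over a `java.util.Properties` object. This is an illustration only, not the code in the patch: the class `HadoopProducerConfigSketch` and the method `migrateTo08` are hypothetical names, and picking GZIP (codec value 1) as the default is just one of the two options mentioned above. The property names and codec numbers are taken from the review comments.

```java
import java.util.Properties;

public class HadoopProducerConfigSketch {

    // Hypothetical helper: returns a copy of the given producer properties
    // adjusted along the lines of review items 1.1 to 1.3.
    static Properties migrateTo08(Properties old) {
        Properties updated = new Properties();
        for (String name : old.stringPropertyNames()) {
            String value = old.getProperty(name);
            if (name.equals("max.message.size")) {
                // 1.1: dropped, the size check now happens on the server.
                continue;
            }
            if (name.equals("buffer.size")) {
                // 1.2: renamed property, value carried over unchanged.
                updated.setProperty("send.buffer.bytes", value);
            } else {
                updated.setProperty(name, value);
            }
        }
        // 1.3: default to compression when the job did not choose a codec.
        // Assumption: 1 = GZIP, following the numbering in the review comment.
        if (updated.getProperty("compression.codec") == null) {
            updated.setProperty("compression.codec", "1");
        }
        return updated;
    }

    public static void main(String[] args) {
        Properties old = new Properties();
        old.setProperty("max.message.size", "1000000");
        old.setProperty("buffer.size", "65536");

        Properties updated = migrateTo08(old);
        for (String name : updated.stringPropertyNames()) {
            System.out.println(name + "=" + updated.getProperty(name));
        }
    }
}
```

A codec that the job sets explicitly is left untouched, so the default only applies when nothing was configured.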

        Sam Shah made changes -
        Attachment KAFKA-713.patch [ 12565663 ]
        Sam Shah added a comment -

        Updated patch with Neha's suggestions. I default to GZip compression with the Hadoop producer.

        Sam Shah made changes -
        Attachment KAFKA-713.patch [ 12567181 ]
        Neha Narkhede added a comment -

        +1

        Neha Narkhede added a comment -

        Thanks for the updated patch, Sam. Just checked it in.

        Neha Narkhede made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.8 [ 12317244 ]
        Fix Version/s 0.8.1 [ 12322960 ]
        Resolution Fixed [ 1 ]
        Neha Narkhede made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Tony Stevenson made changes -
        Workflow no-reopen-closed, patch-avail [ 12746766 ] Apache Kafka Workflow [ 13052945 ]
        Tony Stevenson made changes -
        Workflow Apache Kafka Workflow [ 13052945 ] no-reopen-closed, patch-avail [ 13055516 ]
        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open -> Patch Available | 6m 56s | 1 | Sam Shah | 18/Jan/13 23:53
        Patch Available -> Resolved | 13d 18h 53m | 1 | Neha Narkhede | 01/Feb/13 18:47
        Resolved -> Closed | 4m 54s | 1 | Neha Narkhede | 01/Feb/13 18:52

          People

          • Assignee: Sam Shah
          • Reporter: Sam Shah
          • Votes: 0
          • Watchers: 2
