KAFKA-1498: New producer performance and bug improvements

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      We have seen the following issues with the new producer.

      1. The producer request can be significantly larger than the configured batch size.
      2. Mirrormaker becomes a bottleneck when there are keyed messages and compression is turned on.
      3. The selector is woken up on every message in the new producer.
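Issue 1 boils down to a missing size check before appending to a batch. A minimal sketch of the kind of check involved (the method and parameter names are illustrative, not the actual MemoryRecords API):

```java
// Hedged sketch: the size check issue 1 is about. Without it, a single
// append can push a batch well past the configured batch size.
public class BatchAppendSketch {
    // A batch has room only if the next record still fits within the
    // configured batch size.
    static boolean hasRoom(int bytesWritten, int nextRecordSize, int batchSize) {
        return bytesWritten + nextRecordSize <= batchSize;
    }

    public static void main(String[] args) {
        System.out.println(hasRoom(900, 100, 1024));  // true: 1000 <= 1024
        System.out.println(hasRoom(1000, 100, 1024)); // false: 1100 > 1024
    }
}
```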

      1. kafka-1498.patch
        33 kB
        Jun Rao
      2. KAFKA-1498.patch
        35 kB
        Guozhang Wang
      3. KAFKA-1498_2014-06-25_16:44:51.patch
        36 kB
        Guozhang Wang
      4. KAFKA-1498_2014-06-30_10:47:17.patch
        41 kB
        Guozhang Wang
      5. KAFKA-1498_2014-06-30_15:47:56.patch
        41 kB
        Guozhang Wang
      6. KAFKA-1498_2014-07-01_11:12:41.patch
        41 kB
        Guozhang Wang
      7. KAFKA-1498.patch
        0.9 kB
        Guozhang Wang

        Activity

        Transition: Open → Resolved
        Time In Source Status: 75d 1h 34m
        Execution Times: 1
        Last Executer: Guozhang Wang
        Last Execution Date: 03/Sep/14 00:05
        Guozhang Wang made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Guozhang Wang added a comment -

        Created reviewboard https://reviews.apache.org/r/23215/
        against branch origin/trunk

        Guozhang Wang made changes -
        Attachment KAFKA-1498.patch [ 12653474 ]
        Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/22874/
        against branch origin/trunk

        Guozhang Wang made changes -
        Attachment KAFKA-1498_2014-07-01_11:12:41.patch [ 12653432 ]
        Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/22874/
        against branch origin/trunk

        Guozhang Wang made changes -
        Attachment KAFKA-1498_2014-06-30_15:47:56.patch [ 12653257 ]
        Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/22874/
        against branch origin/trunk

        Guozhang Wang made changes -
        Attachment KAFKA-1498_2014-06-30_10:47:17.patch [ 12653189 ]
        Guozhang Wang added a comment -

        Updated reviewboard https://reviews.apache.org/r/22874/
        against branch origin/trunk

        Guozhang Wang made changes -
        Attachment KAFKA-1498_2014-06-25_16:44:51.patch [ 12652522 ]
        Guozhang Wang added a comment -

        Created reviewboard https://reviews.apache.org/r/22874/
        against branch origin/trunk

        Guozhang Wang made changes -
        Attachment KAFKA-1498.patch [ 12652011 ]
        Jun Rao made changes -
        Field Original Value New Value
        Attachment kafka-1498.patch [ 12651500 ]
        Jun Rao added a comment - edited

        Attaching a draft patch of revision dcc88408c98a07cb9a816ab55cd81e55f1d2217d from Jun. 10.

        Included in the patch:
        1. Address the issue that a batch in the producer request can be significantly larger than the configured batch size.
        This is done by patching MemoryRecords.hasRoom() and MemoryRecords.isFull().
        2. Address the bottleneck in mirrormaker when there are keyed messages and compression is turned on.
        Use a data channel per producer thread.
        3. Address the issue that the selector is woken up on every message in the new producer. This is the trickiest part. The fix is the following.
        (a) In KafkaProducer.send(), only wake up the selector if the batch becomes full during append.
        (b) In Metadata.fetch(), force the selector to wake up if metadata is not available.
        (c) In the sender, calculate the select time dynamically in each iteration of the selector.poll() call. The select time is the minimum of the remaining linger time across all partitions and the time until the next metadata request, bounded above by the linger time. This handles the case where the selector is doing a long poll, a new message is produced, and no further messages arrive afterwards. We need to make sure that the message can be processed within the linger time.
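The dynamic select-time calculation in (c) can be sketched as follows (all names are illustrative, not the actual Sender internals):

```java
import java.util.List;

// Hedged sketch: poll for the minimum of each partition's remaining linger
// time and the time until the next metadata refresh, bounded by lingerMs.
public class SelectTimeSketch {
    static long selectTime(long lingerMs, List<Long> remainingLingerMs, long metadataTimeoutMs) {
        long min = metadataTimeoutMs;
        for (long remaining : remainingLingerMs) {
            min = Math.min(min, remaining);
        }
        // Never poll longer than the configured linger time, so a message
        // appended mid-poll still gets sent within its linger window.
        return Math.max(0L, Math.min(min, lingerMs));
    }

    public static void main(String[] args) {
        // linger=100ms, partitions waiting 30ms and 70ms, metadata due in 500ms
        System.out.println(selectTime(100L, List.of(30L, 70L), 500L)); // 30
    }
}
```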

        This covers the following cases well.
        3.1. If linger time is large and there are lots of messages, the selector won't be woken up too frequently.
        3.2. If linger time is small and there are lots of messages, the selector will be busy. However, this is expected.

        This doesn't deal with the following case well.
        3.3 If linger time is small and there are very few messages, the selector will still wake up every linger time. I'm not sure of the best way to deal with this. One idea is to have a min_linger threshold: the selector will use a select time of at least min_linger, say 5ms, if there is nothing to do. In KafkaProducer.send(), if linger is configured to be larger than min_linger, wake up the selector on every message. This way, the selector will only be busy if there are lots of messages.
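A hedged sketch of that min_linger idea (the threshold, its 5ms value, and all names are proposed here, not an actual producer config):

```java
// Hedged sketch of the proposed min_linger floor from case 3.3.
public class MinLingerSketch {
    static final long MIN_LINGER_MS = 5; // hypothetical floor

    // When there is nothing to send, never poll for less than the floor,
    // so a near-zero linger time cannot spin the selector.
    static long idlePollTimeout(long lingerMs) {
        return Math.max(lingerMs, MIN_LINGER_MS);
    }

    // Per the proposal: wake the selector on every send only when the
    // configured linger exceeds the floor.
    static boolean wakeupOnSend(long lingerMs) {
        return lingerMs > MIN_LINGER_MS;
    }

    public static void main(String[] args) {
        System.out.println(idlePollTimeout(1L)); // 5
        System.out.println(wakeupOnSend(1L));    // false
    }
}
```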

        Not sure that I have thought through other potential timing issues.

        4. Added a few missing ingraphs.

        Other todos:
        1. Metadata.needsUpdate should be renamed properly.
        2. Methods with new parameters need new comments accordingly.
        3. Metrics
        3.1 In addition to record-size-max, add record-size-avg.
        3.2 Rename incoming-bytes-rate and outgoing-bytes-rate to network-in-bytes-rate and network-out-bytes-rate

        Jun Rao created issue -

          People

          • Assignee:
            Guozhang Wang
            Reporter:
            Jun Rao
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
