KAFKA-2948

Kafka producer does not cope well with topic deletions

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0.0
    • Fix Version/s: 0.10.1.0
    • Component/s: producer
    • Labels:
      None

      Description

      The Kafka producer fetches metadata for a topic when send is invoked and thereafter attempts to keep that metadata up to date without any explicit requests from the client. This works well in static environments, but when topics are added or deleted, the list of topics in Metadata grows but never shrinks. Apart from being a memory leak, this results in constant metadata requests for deleted topics.

      We are running into this issue with the Confluent REST server, where topic deletions in tests are filling up the logs with warnings about unknown topics. Auto-create is turned off in our Kafka cluster.

      I am happy to provide a fix, but I am not sure what the right fix is. Does it make sense to remove a topic from the metadata list when an UNKNOWN_TOPIC_OR_PARTITION response is received, provided there are no outstanding sends? This does not look very straightforward to do, so any alternative suggestions are welcome.
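The growth described in this report can be modelled with a toy example. This is only a sketch in Python, not the Java producer code; the class name `GrowOnlyMetadata` and its methods are hypothetical and exist purely to illustrate a topic set that is added to on send and never pruned.

```python
class GrowOnlyMetadata:
    """Toy model of the reported behaviour; not the Kafka client implementation."""

    def __init__(self):
        self.topics = set()

    def add(self, topic):
        # Called when a record is sent to `topic`; nothing ever removes it again.
        self.topics.add(topic)

    def topics_to_refresh(self):
        # Every periodic refresh asks about all topics seen so far,
        # including ones that have since been deleted on the cluster.
        return sorted(self.topics)


metadata = GrowOnlyMetadata()
for name in ("test-1", "test-2", "test-3"):
    # Each test produces to its own topic and then deletes it.
    metadata.add(name)
```

In this model every later refresh still asks for all three topics, which corresponds to the repeated unknown-topic warnings mentioned above.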

          Activity

          Jiangjie Qin added a comment -

          Rajini Sivaram As you pointed out, we never remove topics from the topic set in the producer metadata. I am not sure that removing a topic from the set when we see the UNKNOWN_TOPIC_OR_PARTITION error code is the right fix, because UNKNOWN_TOPIC_OR_PARTITION can also occur in other cases, such as partition reassignment, where the producer is supposed to retry.

          Maybe a TTL is a better solution here. For example, if the producer hasn't sent data to a particular topic since the last metadata refresh, we can remove the topic from the metadata topic set on the next metadata refresh.
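The refresh-scoped TTL suggested in this comment could look roughly like the following. It is a minimal Python sketch under stated assumptions, not the producer's Java code: `RefreshScopedMetadata`, `record_send` and `on_refresh` are hypothetical names, and the "TTL" here is simply one refresh interval.

```python
class RefreshScopedMetadata:
    """Toy model: a topic survives a refresh only if it was used since the previous one."""

    def __init__(self):
        self.topics = set()
        self._used_since_refresh = set()

    def record_send(self, topic):
        # A send (re-)adds the topic and marks it as recently used.
        self.topics.add(topic)
        self._used_since_refresh.add(topic)

    def on_refresh(self):
        # Drop every topic that saw no send since the last refresh,
        # then start a new usage window.
        self.topics &= self._used_since_refresh
        self._used_since_refresh = set()
        return sorted(self.topics)
```

A topic that is still being reassigned is unaffected by this scheme as long as the producer keeps sending to it, which is the retry case raised above.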

          Rajini Sivaram added a comment -

          Jiangjie Qin Thank you for your feedback. The fix that we are testing at the moment removes a topic from the metadata set when a response carries an `UNKNOWN_TOPIC_OR_PARTITION` error for it, but re-adds the topic when metadata is requested for it (e.g. when a producer is waiting for metadata to send a message). This ensures that the request is retried when required, but not when the topic is no longer in use.
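A rough model of the remove-on-error, re-add-on-request behaviour described in this paragraph is sketched below. It is written in Python for brevity and is not the fix under test; `ErrorPrunedMetadata`, `request_metadata` and `handle_response` are hypothetical names, and error codes are represented as plain strings.

```python
UNKNOWN_TOPIC_OR_PARTITION = "UNKNOWN_TOPIC_OR_PARTITION"


class ErrorPrunedMetadata:
    """Toy model: prune topics on an unknown-topic error, re-add them on demand."""

    def __init__(self):
        self.topics = set()

    def request_metadata(self, topic):
        # Called when a send is waiting for metadata; this re-adds a pruned topic.
        self.topics.add(topic)

    def handle_response(self, errors):
        # `errors` maps topic name -> error code string, or None on success.
        for topic, error in errors.items():
            if error == UNKNOWN_TOPIC_OR_PARTITION:
                self.topics.discard(topic)
```

With this shape, a deleted topic stops being refreshed after one error response, while a topic that is still in use is requested again by its next send.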

          A TTL sounds like a better option, since it removes not just deleted topics but also any topic that is no longer being used. My only concern is that deleted topics would remain in the list for longer, producing a lot of warnings in the logs as metadata requests are retried. I could combine the current fix with a TTL if required to avoid this, but I will try out the TTL on its own first with the REST service and see how that goes.

          Mayuresh Gharat added a comment -

          Adding a TTL would mean another user-exposed config. Could we instead count the number of times we get "UNKNOWN_TOPIC_OR_PARTITION" for a topic and remove the topic once that count reaches some limit?
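The counting idea raised here might be sketched as follows. This is an illustrative Python model only: the threshold `MAX_UNKNOWN_RESPONSES`, the class `CountingMetadata`, and the choice to reset the count after a successful response are all assumptions, none of which come from the Kafka code base or from this thread.

```python
UNKNOWN_TOPIC_OR_PARTITION = "UNKNOWN_TOPIC_OR_PARTITION"
MAX_UNKNOWN_RESPONSES = 3  # hypothetical threshold, chosen only for illustration


class CountingMetadata:
    """Toy model: drop a topic after N consecutive unknown-topic responses."""

    def __init__(self, max_unknown=MAX_UNKNOWN_RESPONSES):
        self.max_unknown = max_unknown
        self.unknown_counts = {}  # topic -> consecutive unknown-topic responses

    def add(self, topic):
        self.unknown_counts.setdefault(topic, 0)

    def handle_response(self, errors):
        # `errors` maps topic name -> error code string, or None on success.
        for topic, error in errors.items():
            if topic not in self.unknown_counts:
                continue
            if error == UNKNOWN_TOPIC_OR_PARTITION:
                self.unknown_counts[topic] += 1
                if self.unknown_counts[topic] >= self.max_unknown:
                    del self.unknown_counts[topic]
            else:
                # Assumption: a successful response resets the counter.
                self.unknown_counts[topic] = 0

    def topics(self):
        return sorted(self.unknown_counts)
```

A counter like this tolerates a few transient errors, for example during partition reassignment, before giving up on a topic.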

          Jiangjie Qin added a comment -

          Mayuresh Gharat I think the TTL should not be a config but simply an internal mechanism. Users should not have to care about this at all.

          Rajini Sivaram added a comment -

          Mayuresh Gharat Jiangjie Qin The code that I am testing at the moment uses the existing config `metadata.max.age.ms`. If no messages are sent to a topic for this interval, the topic is removed from the metadata set. A subsequent send will add it back to the set. I am also marking a topic for removal if a send fails because no metadata was available for the topic, to limit the number of retries for deleted topics. I will submit a PR later today for review.
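The expiry scheme described in this comment can be approximated with the sketch below. It is a Python model under stated assumptions rather than the code in the PR: `ExpiringTopicSet` and its methods are hypothetical names, time is passed in explicitly as milliseconds, and the exact boundary comparison (`>=`) is a guess.

```python
class ExpiringTopicSet:
    """Toy model: topics expire after `metadata_max_age_ms` without a send."""

    def __init__(self, metadata_max_age_ms):
        self.metadata_max_age_ms = metadata_max_age_ms
        self._last_used_ms = {}  # topic -> time of the most recent send

    def record_send(self, topic, now_ms):
        # A send adds the topic, or re-adds it after expiry, and refreshes its age.
        self._last_used_ms[topic] = now_ms

    def mark_for_removal(self, topic):
        # Used when a send fails because no metadata was available:
        # force the topic to expire at the next refresh.
        if topic in self._last_used_ms:
            self._last_used_ms[topic] = float("-inf")

    def refresh(self, now_ms):
        # Drop topics that have been idle for at least the max age.
        expired = [t for t, last in self._last_used_ms.items()
                   if now_ms - last >= self.metadata_max_age_ms]
        for topic in expired:
            del self._last_used_ms[topic]
        return sorted(self._last_used_ms)
```

Reusing the existing interval keeps the mechanism internal, in line with the earlier point that users should not need a separate TTL setting.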

          ASF GitHub Bot added a comment -

          GitHub user rajinisivaram opened a pull request:

          https://github.com/apache/kafka/pull/645

          KAFKA-2948: Remove unused topics from producer metadata set

          If no messages are sent to a topic during the last refresh interval or if UNKNOWN_TOPIC_OR_PARTITION error is received, remove the topic from the metadata list. Topics are added to the list on the next attempt to send a message to the topic.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/rajinisivaram/kafka KAFKA-2948

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/645.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #645


          commit f7e40e5ce515d700e8cc7ab02a0f16141fa14f67
          Author: rsivaram <rsivaram@uk.ibm.com>
          Date: 2015-12-09T00:16:18Z

          KAFKA-2948: Remove unused topics from producer metadata set


          Guozhang Wang added a comment -

          Assigning to Ewen Cheslack-Postava to review.

          Ismael Juma added a comment - edited

          Setting the target version to 0.9.1.0, since 0.9.0.1 will be released very soon and we want to be careful about last-minute regressions.

          ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/kafka/pull/645


            People

            • Assignee:
              Rajini Sivaram
              Reporter:
              Rajini Sivaram
              Reviewer:
              Ewen Cheslack-Postava
            • Votes:
              2
              Watchers:
              8
