Kafka
  1. Kafka
  2. KAFKA-196

Topic creation fails on large values

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      Since topic logs are stored in a directory holding the topic's name, creation of the directory might fail for large strings.
      This is not a problem per-se but the exception thrown is rather cryptic and hard to figure out for operations.

      I propose fixing this temporarily with a hard limit of 200 chars for topic names, it would also be possible to hash the topic name.

      Another concern is that the exception raised stops the broker, effectively creating a simple DoS vector, I'm concerned about how tests or wrong client library usage can take down the whole broker.

        Activity

        Hide
        Taylor Gautier added a comment -

        There are a lot of things that do this. In particular lots of different characters can cause problems but especially filesystem ones like '/' and ' ' etc.

        Show
        Taylor Gautier added a comment - There are a lot of things that do this. In particular lots of different characters can cause problems but especially filesystem ones like '/' and ' ' etc.
        Hide
        Pierre-Yves Ritschard added a comment -

        Yep, the thing is, I think the message makes it hard to figure out what just happened, at lest I thought so.

        Show
        Pierre-Yves Ritschard added a comment - Yep, the thing is, I think the message makes it hard to figure out what just happened, at lest I thought so.
        Hide
        Jun Rao added a comment -

        Thanks for the patch. Where is the limit 200 coming from? Is that the file name limit in most file systems?

        Show
        Jun Rao added a comment - Thanks for the patch. Where is the limit 200 coming from? Is that the file name limit in most file systems?
        Hide
        Pierre-Yves Ritschard added a comment -

        I tried coming up with a sensible and not limitative number which will
        not clash with MAXPATHLEN when prefixing with the log directory. Some
        old systems are rumored to have it as low as 256, but I don't see many
        use cases where very large topic strings are relevant. It could safely
        be bumped to 1000 (which would leave 23 chars for the prefix path).

        Java has no way of accessing MAXPATHLEN unfortunately, since it is OS-specific

        On Mon, Nov 7, 2011 at 5:12 PM, Jun Rao (Commented) (JIRA)

        Show
        Pierre-Yves Ritschard added a comment - I tried coming up with a sensible and not limitative number which will not clash with MAXPATHLEN when prefixing with the log directory. Some old systems are rumored to have it as low as 256, but I don't see many use cases where very large topic strings are relevant. It could safely be bumped to 1000 (which would leave 23 chars for the prefix path). Java has no way of accessing MAXPATHLEN unfortunately, since it is OS-specific On Mon, Nov 7, 2011 at 5:12 PM, Jun Rao (Commented) (JIRA)
        Hide
        Jun Rao added a comment -

        Do you think we can make it configurable and default it to 200? Also, can we create a method like verifyTopicName and put all the checkings there?

        Show
        Jun Rao added a comment - Do you think we can make it configurable and default it to 200? Also, can we create a method like verifyTopicName and put all the checkings there?
        Hide
        Pierre-Yves Ritschard added a comment -

        You're taking me away from one-liner territory, but i'll take a look
        at it. as for configurable, a property in server.properties is enough
        ?

        On Mon, Nov 7, 2011 at 5:28 PM, Jun Rao (Commented) (JIRA)

        Show
        Pierre-Yves Ritschard added a comment - You're taking me away from one-liner territory, but i'll take a look at it. as for configurable, a property in server.properties is enough ? On Mon, Nov 7, 2011 at 5:28 PM, Jun Rao (Commented) (JIRA)
        Hide
        Jun Rao added a comment -

        Yes, a new server property should work.

        Show
        Jun Rao added a comment - Yes, a new server property should work.
        Hide
        Taylor Gautier added a comment -

        Not sure if you will try to address my comment, but verifyTopicName wouldn't suffice - topic names should either be encoded to protect against special characters causing problems, or hashed as Ritschard suggests.

        Show
        Taylor Gautier added a comment - Not sure if you will try to address my comment, but verifyTopicName wouldn't suffice - topic names should either be encoded to protect against special characters causing problems, or hashed as Ritschard suggests.
        Hide
        Jun Rao added a comment -

        Yes, we can use verifyTopicName to capture all constraints on topic names. We probably don't want to make it too complicated. How big is the character list that we should disallow?

        Show
        Jun Rao added a comment - Yes, we can use verifyTopicName to capture all constraints on topic names. We probably don't want to make it too complicated. How big is the character list that we should disallow?
        Hide
        Jun Rao added a comment -

        Not sure why the server would hang when it couldn't create a log directory. Our socket server processors capture all throwable. So this shouldn't kill any of the processors. Could you take a thread dump and see why the server hangs?

        Show
        Jun Rao added a comment - Not sure why the server would hang when it couldn't create a log directory. Our socket server processors capture all throwable. So this shouldn't kill any of the processors. Could you take a thread dump and see why the server hangs?
        Hide
        Jay Kreps added a comment -

        I think it would be good just to fully think through escaping and validating topic names. Currently we do essentially nothing, and we could potentially file infinite number of bugs against all the individual corner cases. If that is out of scope for what you are trying to do Pierre-Yves we can open a separate ticket, but I think we need to define the acceptable set of strings in a topic name and check that. For example, should we allow spaces? slashes? semicolons? etc. We need to do this escaping against both the unix fs and zookeeper. We are super permissive now, which leads to all kinds of corner cases. The larger solution might just be to make a KafkaTopic class that has the validation logic in the constructor and includes an escapedForFs() and escapedForZk() methods. We probably won't use this everywhere up front, but at least it begins to centralize the logic and makes it easy to reason if a name has already been escaped or not.

        Show
        Jay Kreps added a comment - I think it would be good just to fully think through escaping and validating topic names. Currently we do essentially nothing, and we could potentially file infinite number of bugs against all the individual corner cases. If that is out of scope for what you are trying to do Pierre-Yves we can open a separate ticket, but I think we need to define the acceptable set of strings in a topic name and check that. For example, should we allow spaces? slashes? semicolons? etc. We need to do this escaping against both the unix fs and zookeeper. We are super permissive now, which leads to all kinds of corner cases. The larger solution might just be to make a KafkaTopic class that has the validation logic in the constructor and includes an escapedForFs() and escapedForZk() methods. We probably won't use this everywhere up front, but at least it begins to centralize the logic and makes it easy to reason if a name has already been escaped or not.
        Hide
        Chris Burroughs added a comment -

        '/' is particularly complicated since we want to eventually have support for hierarchical topics, in which case '/' (or whatever we choose) will have special meaning to us, ZK, and the local filesystem. I'd also prefer to have one way to represent topics as strings and not have separate ZK and local fs escaping schemes.

        That said, unless Pierre-Yves feels like biting off a big patch lets keep this one for a configurable max topic length so that the problem users are running into now is fixed.

        Show
        Chris Burroughs added a comment - '/' is particularly complicated since we want to eventually have support for hierarchical topics, in which case '/' (or whatever we choose) will have special meaning to us, ZK, and the local filesystem. I'd also prefer to have one way to represent topics as strings and not have separate ZK and local fs escaping schemes. That said, unless Pierre-Yves feels like biting off a big patch lets keep this one for a configurable max topic length so that the problem users are running into now is fixed.

          People

          • Assignee:
            Unassigned
            Reporter:
            Pierre-Yves Ritschard
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Development