Kafka / KAFKA-689

Can't append to a topic/partition that does not already exist

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: clients
    • Labels: None

      Description

      With a totally fresh Kafka (empty logs dir and empty ZK), if I send a ProduceRequest for a new topic, Kafka responds with "kafka.common.UnknownTopicOrPartitionException: Topic test partition 0 doesn't exist on 0". This is when sending a ProduceRequest over the network (from Python, in this case).

      If I use the console producer it works fine (topic and partition get created). If I then send the same payload from before over the network, it works.

      1. produce-payload.bin (0.1 kB, David Arthur)
      2. kafka.log (53 kB, David Arthur)

        Activity

        David Arthur created issue -
        David Arthur added a comment -

        Attaching a sample payload generated by my Python client as well as some Kafka logs

        David Arthur made changes -
        Field Original Value New Value
        Attachment produce-payload.bin [ 12563835 ]
        Attachment kafka.log [ 12563836 ]
        David Arthur added a comment -

        Here is the same payload base64 encoded:

        AAAAWwAAAAAAAAAAAAxrYWZrYS1weXRob24AAQAAA+gAAAABAAR0ZXN0AAAAAQAAAAAAAAApAAAA
        AAAAAAAAAAAdemDkywIAAAAAA2ZvbwAAAAx0ZXN0IG1lc3NhZ2U=
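
For anyone who wants to inspect this payload, here is a minimal Python sketch that decodes the base64 above and reads its leading fields. It assumes the 0.8 ProduceRequest layout (size, API key, API version, correlation ID, client ID, required acks, timeout, then the topic and partition arrays). The helper names `read_string` and `parse_produce_header` are made up for illustration, and only the first topic and first partition are read.

```python
import base64
import struct

PAYLOAD_B64 = (
    "AAAAWwAAAAAAAAAAAAxrYWZrYS1weXRob24AAQAAA+gAAAABAAR0ZXN0AAAAAQAAAAAAAAApAAAA"
    "AAAAAAAAAAAdemDkywIAAAAAA2ZvbwAAAAx0ZXN0IG1lc3NhZ2U="
)


def read_string(buf, pos):
    """Read an int16-length-prefixed string; return (text, next_position)."""
    (length,) = struct.unpack_from(">h", buf, pos)
    start = pos + 2
    return buf[start:start + length].decode("utf-8"), start + length


def parse_produce_header(buf):
    """Parse the request header and the first topic/partition entry.

    Assumption: the buffer follows the 0.8 ProduceRequest layout.
    """
    size, api_key, api_version, correlation_id = struct.unpack_from(">ihhi", buf, 0)
    pos = 12  # 4 (size) + 2 (api key) + 2 (api version) + 4 (correlation id)
    client_id, pos = read_string(buf, pos)
    required_acks, timeout_ms, topic_count = struct.unpack_from(">hii", buf, pos)
    pos += 10  # 2 (acks) + 4 (timeout) + 4 (topic count)
    topic, pos = read_string(buf, pos)
    partition_count, partition, message_set_size = struct.unpack_from(">iii", buf, pos)
    return {
        "size": size,
        "api_key": api_key,
        "api_version": api_version,
        "correlation_id": correlation_id,
        "client_id": client_id,
        "required_acks": required_acks,
        "timeout_ms": timeout_ms,
        "topic_count": topic_count,
        "topic": topic,
        "partition_count": partition_count,
        "partition": partition,
        "message_set_size": message_set_size,
    }


header = parse_produce_header(base64.b64decode(PAYLOAD_B64))
for field, value in header.items():
    print(field, "=", value)
```

Under that assumed layout, the request targets topic "test", partition 0, which matches the topic and partition named in the exception.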

        Jun Rao added a comment -

        The auto topic creation logic on the broker is only triggered on getMetadataRequest. So, you will need to either create the topic manually or issue a getMetadataRequest. In general, your Python producer will need the metadata anyway to be able to send the data to the right broker (i.e., the leader of a partition).
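
A rough sketch of what issuing that metadata request first could look like from a Python client. It only builds the request bytes and assumes the 0.8 wire format: a size-prefixed request with API key 3 for topic metadata, an int16-length-prefixed client ID, and an int32-counted array of topic names. The function names are hypothetical, and sending the bytes and parsing the response are left out.

```python
import struct

METADATA_API_KEY = 3  # assumption: topic metadata API key in the 0.8 protocol


def encode_string(text):
    """Encode a string with an int16 length prefix."""
    raw = text.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


def encode_metadata_request(topics, correlation_id=0, client_id="kafka-python"):
    """Build a size-prefixed metadata request for the given topic names."""
    body = struct.pack(">hhi", METADATA_API_KEY, 0, correlation_id)
    body += encode_string(client_id)
    body += struct.pack(">i", len(topics))
    for topic in topics:
        body += encode_string(topic)
    # The int32 size prefix counts the bytes that follow it.
    return struct.pack(">i", len(body)) + body


request = encode_metadata_request(["test"])
print(len(request), request.hex())
```

A client would send these bytes to any broker before its first ProduceRequest for the topic, then use the response to find the partition leader.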

        David Arthur added a comment -

        Thanks, I was hoping it was something simple like this. Feel free to
        invalidate this bug

        -David

        Jun Rao made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Not A Problem [ 8 ]
        Jun Rao made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Jay Kreps added a comment -

        This is pretty hacky though, no? Fetching metadata should not create topics--that is like a getter subtly changing values underneath you. I think this is more evidence for needing to expose a proper create_topic api.

        David Arthur added a comment -

        Could the metadata API be modified with an "auto-create" flag?

        Jay Kreps added a comment -

        Well I guess what I am saying is that getting metadata is not intuitively related to creating topics at all. I had noticed this code before but hadn't really thought about it. I assume the reason is that to make a correct produce request you have to know the host, so the old strategy of doing auto-create on produce doesn't work in 0.8.

        I think there are two sensible strategies for auto-create:
        1. Auto create on produce. This is tricky because you have to somehow ensure that the local node would hold the partitions used (and how did the client come up with those partitions anyway?)
        2. Add a public api for creating topics and make the client implement auto create client-side

        I would favor (2).
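
To make option (2) concrete, here is a small illustrative sketch of client-side auto-create. Everything in it is hypothetical: `InMemoryCluster` stands in for a broker, and `get_metadata` / `create_topic` are placeholder names, not an existing Kafka API. The point is only that the metadata lookup stays read-only and creation is a separate, explicit call.

```python
class InMemoryCluster:
    """Hypothetical stand-in for a broker; not a real Kafka client."""

    def __init__(self):
        self._topics = {}

    def get_metadata(self, topic):
        # Pure read: returns None for an unknown topic and never creates it.
        return self._topics.get(topic)

    def create_topic(self, topic, partitions=1):
        # Explicit, separate call that changes cluster state.
        self._topics.setdefault(topic, list(range(partitions)))
        return self._topics[topic]


def ensure_topic(cluster, topic, partitions=1):
    """Client-side auto-create: look up first, create only when missing."""
    existing = cluster.get_metadata(topic)
    if existing is not None:
        return existing
    return cluster.create_topic(topic, partitions)


cluster = InMemoryCluster()
print(ensure_topic(cluster, "test", partitions=2))
```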

        There is no harm in the current scheme as long as people are warned that we intend to change it.

        David Arthur added a comment -

        I don't disagree really, but what I'm saying is: why not piggyback topic creation onto the metadata API?

        Instead of:

        MetadataRequest => Topic
        Topic => Name [Topic]

        you could have:

        MetadataRequest => Topic
        Topic => Name CreateIfNotExist [Topic]

        It would be a lot less code than a whole new API.
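
One possible reading of the proposed format, sketched in Python. This is purely hypothetical: the flag was only suggested in this thread, and encoding it as a single int8 after each topic name is an assumption made here for illustration.

```python
import struct


def encode_topic_entry(name, create_if_not_exist):
    """Encode Name (int16-prefixed string) followed by a one-byte flag.

    The one-byte CreateIfNotExist flag is a hypothetical encoding.
    """
    raw = name.encode("utf-8")
    flag = 1 if create_if_not_exist else 0
    return struct.pack(">h", len(raw)) + raw + struct.pack(">b", flag)


def encode_topics(entries):
    """Encode an int32-counted array of (name, create_if_not_exist) pairs."""
    out = struct.pack(">i", len(entries))
    for name, create_if_not_exist in entries:
        out += encode_topic_entry(name, create_if_not_exist)
    return out


print(encode_topics([("test", True)]).hex())
```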

        Jay Kreps added a comment -

        Well I think there are actually two problems.

        The first is that it is mixing concerns in kind of a messy way. In some sense it would be the "get metadata or maybe create a new topic" api, so I would rather not enshrine that in the public API. I think the kind of hacky way it works now is okay, and in the future we can add an api.

        The second problem is that we need to move the per-topic config into zookeeper so that you can dynamically add topics with their own settings (flush interval, retention period, # of partitions, etc...there are like ten of these settings). This is discussed in KAFKA-554. Currently these are in the broker config, but that means bouncing the broker all the time which is a hassle. So we will need to include these in the api that creates topics. That makes it messier still since we would have all these properties mixed into the get_metadata call.

        I agree that right now it is a lot of overhead to add an api, but I think we could fix that directly by making it easier to add apis. KAFKA-643 had one proposal for this. Since you just went through that, it would be good to get your feedback on that proposal.

        Jun Rao added a comment -

        Currently, auto-creation is a broker-side flag. Basically, the broker controls whether a topic can be created automatically or not. This is likely useful for admins. The getMetadata API implies auto-creation, subject to the server-side config. This is probably a bit hacky, but it does save one extra RPC. We can think a bit more about whether adding a separate create topic API is a better strategy.

        ben fleis added a comment -

        Although it's not precisely the same, perhaps thinking about topic|partition as a remote file open is a useful metaphor. An open() call is where you would set normal open params (flush interval, O_CREAT, etc.), and stat() is where you get broker and other real time updates. Of course, if create is explicit, where does delete come into play?

        @Jun - I don't see where in server.properties anything about topic creation exists? And further, does an extra RPC matter if it's only during setup/periodic?

        Jun Rao added a comment -

        In KafkaConfig, we have a property auto.create.topics. We probably need to keep this feature so that an admin can choose to only allow topics created through admin tools.

        The extra RPC is not a big deal.


          People

          • Assignee: Unassigned
          • Reporter: David Arthur
          • Votes: 0
          • Watchers: 4
