  1. Kafka
  2. KAFKA-642

Protocol tweaks for 0.8

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      There are a couple of things in the protocol that are not ideal. It would be good to tweak these for 0.8 so we start clean.

      Here is a set of problems and proposals:

      Problems:
      1. Correlation id is not used across all the requests. I don't think it can work as intended because of this.
      2. On reflection I am not sure that we need a correlation id field. I think that since we need to guarantee that processing is sequential on any particular socket we can correlate with a simple queue. (e.g. as the client sends messages it adds them to a queue and as it receives responses it just correlates to whatever is at the head of the queue).
      3. The metadata response seems to have a number of problems. Among them is that it weirdly repeats all the broker information many times. The response includes the ISR, leader (maybe), and the replicas. Each of these repeats all the broker information. This is super weird. I think what we should be doing here is including the broker information once for all brokers and then just having the appropriate ids for the isr, leader, and replicas.
      4. For topic discovery I think we need to support the case where no topics are specified in the metadata request and for this return information about all topics. I don't think we do this now.
      5. I don't understand what the creator id is.
      6. The offset request and response are not fully thought through and should be generalized.
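
The queue-based correlation described in problem 2 can be sketched as follows. This is only an illustration under the assumption stated there (responses on one socket arrive in the order the requests were sent); the class and method names are hypothetical and are not part of Kafka.

```python
from collections import deque


class InFlightRequests:
    """Hypothetical helper: pairs responses with requests by arrival order on one socket."""

    def __init__(self):
        self._pending = deque()

    def sent(self, request):
        # Record the request at the tail of the queue as it is written to the socket.
        self._pending.append(request)

    def received(self, response):
        # The response must belong to the oldest outstanding request.
        if not self._pending:
            raise RuntimeError("response arrived with no request in flight")
        return self._pending.popleft(), response

    def __len__(self):
        return len(self._pending)


in_flight = InFlightRequests()
in_flight.sent("produce-1")
in_flight.sent("fetch-1")
pairs = [in_flight.received("produce-ack"), in_flight.received("fetch-data")]
```

Without a correlation id the client has no way to detect a server that answers out of order, which is one reason the id can still be useful for debugging.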

      Proposals:
      1, 2. Correlation id. This is not strictly speaking needed, but it is maybe useful for debugging to be able to trace a particular request from client to server. So we will extend this across all the requests.
      3. For the metadata response I will try to fix this up by normalizing out the broker list and having the isr, replicas, and leader fields carry just the node id.
      4. This should be uncontroversial and easy to add.
      5. Let's remove creator id, it isn't used.
      6. Let's generalize offset request. My proposal is below:

      Rename TopicMetadata API to ClusterMetadata, as this will contain all the data that is known cluster-wide. Then let's generalize the offset request to be PartitionMetadata--namely stuff about a particular partition on a particular server.
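
As a rough illustration of the normalization in proposal 3, the sketch below turns a response that embeds full broker records in every leader, replicas, and isr field into one broker list plus node ids. The dictionary layout, field names, and sample brokers are assumptions made for the example; the issue does not define them.

```python
def normalize_metadata(partitions):
    """Collect each broker once and replace embedded broker records with node ids."""
    brokers = {}

    def node_id(broker):
        brokers[broker["id"]] = broker
        return broker["id"]

    normalized = []
    for p in partitions:
        normalized.append({
            "topic": p["topic"],
            "partition": p["partition"],
            # The leader may be absent, so keep None as-is.
            "leader": node_id(p["leader"]) if p["leader"] is not None else None,
            "replicas": [node_id(b) for b in p["replicas"]],
            "isr": [node_id(b) for b in p["isr"]],
        })
    return {
        "brokers": sorted(brokers.values(), key=lambda b: b["id"]),
        "partitions": normalized,
    }


# Hypothetical sample data: two brokers, two partitions of one topic.
b1 = {"id": 1, "host": "host1", "port": 9092}
b2 = {"id": 2, "host": "host2", "port": 9092}
denormalized = [
    {"topic": "events", "partition": 0, "leader": b1, "replicas": [b1, b2], "isr": [b1]},
    {"topic": "events", "partition": 1, "leader": None, "replicas": [b2, b1], "isr": []},
]
cluster = normalize_metadata(denormalized)
```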

      The format of PartitionMetadata would be the following:

      PartitionMetadataRequest => [TopicName [PartitionId MinSegmentTime MaxSegmentInfos]]
      TopicName => string
      PartitionId => uint32
      MinSegmentTime => uint64
      MaxSegmentInfos => int32

      PartitionMetadataResponse => [TopicName [PartitionMetadata]]
      TopicName => string
      PartitionMetadata => PartitionId LogSize NumberOfSegments LogEndOffset HighwaterMark [SegmentData]
      SegmentData => StartOffset LastModifiedTime
      LogSize => uint64
      NumberOfSegments => int32
      LogEndOffset => int64
      HighwaterMark => int64

      This would be general enough that we could continue to add to it for any new pieces of data we need.
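
To make the proposed request layout concrete, here is a minimal encoding sketch. The grammar above gives only field order and integer widths, so the framing is assumed for illustration: big-endian integers, an int32 element count before each `[...]` array, and an int16 length prefix before each string. The function names are hypothetical.

```python
import struct


def encode_string(s):
    # Assumed framing: int16 byte length followed by UTF-8 bytes.
    data = s.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_partition_metadata_request(topics):
    """topics: list of (topic_name, [(partition_id, min_segment_time, max_segment_infos), ...])."""
    out = [struct.pack(">i", len(topics))]  # assumed int32 array count
    for name, partitions in topics:
        out.append(encode_string(name))
        out.append(struct.pack(">i", len(partitions)))
        for partition_id, min_segment_time, max_segment_infos in partitions:
            # PartitionId => uint32, MinSegmentTime => uint64, MaxSegmentInfos => int32
            out.append(struct.pack(">IQi", partition_id, min_segment_time, max_segment_infos))
    return b"".join(out)


payload = encode_partition_metadata_request([("events", [(0, 0, 10), (1, 0, 10)])])
```

Each partition entry occupies a fixed 16 bytes under these assumptions, so adding a new per-partition field later changes only the inner `struct.pack` format.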

      Attachments

        1. KAFKA-642-v6.patch
          60 kB
          Jay Kreps
        2. KAFKA-642-v4.patch
          65 kB
          Jay Kreps
        3. KAFKA-642-v3.patch
          58 kB
          Jay Kreps
        4. KAFKA-642-v2.patch
          58 kB
          Jay Kreps
        5. KAFKA-642-v1.patch
          35 kB
          Jay Kreps
        6. KAFKA-642-remove-response-versions.patch
          26 kB
          Jay Kreps

            People

              Assignee: Jay Kreps (jkreps)
              Reporter: Jay Kreps (jkreps)
              Votes: 0
              Watchers: 4