Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-12370

Refactor KafkaStreams exposed metadata hierarchy

    XMLWordPrintableJSON

Details

    Description

      Currently in KafkaStreams we have two groups of metadata getter:

      1.

      allMetadata
      allMetadataForStore
      

      Return collection of StreamsMetadata, which only contains the partitions as active/standby, plus the hostInfo, but not exposing any task info.

      2.

      queryMetadataForKey
      

      Returns KeyQueryMetadata that includes the hostInfos of active and standbys, plus the partition id.

      3.

      localThreadsMetadata
      

      Returns ThreadMetadata, that includes a collection of TaskMetadata for active and standby tasks.

      All the above functions are used for interactive queries, but their exposed metadata are very different, and some use cases would need to have all client, thread, and task metadata to fulfill the feature development. At the same time, we may have a more dynamic "task -> thread" mapping in the future and also the embedded clients like consumers would not be per thread, but per client.

      ---------------

      Rethinking about the metadata, I feel we can have a more consistent hierarchy as the following:

      • StreamsMetadata represent the metadata for the client, which includes the set of ThreadMetadata for its existing thread and the set of TaskMetadata for active and standby tasks assigned to this client, plus client metadata including hostInfo, embedded client ids.
      • ThreadMetadata includes name, state, the set of TaskMetadata for currently assigned tasks. Also after we removed the deprecated EOSv1, it should always return a single producer client id since each thread would only have one client.
      • TaskMetadata includes the name (including the sub-topology id and the partition id), the state, the corresponding sub-topology description (including the state store names, source topic names).
      • allMetadata, allMetadataForStore, allMetadataForKey (renamed from queryMetadataForKey) returns the set of StreamsMetadata, and localMetadata (renamed from localThreadMetadata) returns a single StreamsMetadata.
      • KeyQueryMetadata Class would be deprecated and replaced by TaskMetadata.

      To illustrate as an example, to find out who are the current active host / standby hosts of a specific store, we would call allMetadataForStore, and for each returned StreamsMetadata we loop over their contained TaskMetadata for active / standby, and filter by its corresponding sub-topology's description's contained store name.

      Attachments

        Activity

          People

            jlprat Josep Prat
            guozhang Guozhang Wang
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: