Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-7994

Improve Partition-Time for rebalances and restarts

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.4.0
    • streams
    • None

    Description

      We compute a per-partition partition-time as the maximum timestamp over all records processed so far. Before 2.3 this was used to determine the logical stream-time used to make decisions about processing out-of-order records or drop them if they are late (ie, timestamp < stream-time - grace-period). Preserving the stream-time is necessary to ensure deterministic results (see KAFKA-9368), and although the processor-time is now used instead of partition-time, preserving the partition-time is a first step towards improving the overall stream-time semantics.

      The partition-time is also used by the TimestampExtractor. It gets passed in to #extract and can be used to determine a rough timestamp estimate if the actual timestamp is missing, corrupt, etc. This means in the corner case where the next record to be processed after a rebalance/restart cannot have its actual timestamp determined, we have no idea way of coming up with a reasonable guess and the record will likely have to be dropped.

       

      A potential fix would be, to store latest observed partition-time in the metadata of committed offsets. This way, on restart/rebalance we can re-initialize partition-time correctly.

      Attachments

        1. possible-patch.diff
          5 kB
          Richard Yu

        Issue Links

          Activity

            People

              Yohan123 Richard Yu
              mjsax Matthias J. Sax
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: