Kafka / KAFKA-15678

[Tiered Storage] Stall remote reads with long-spanning transactions


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.6.0
    • Fix Version/s: None
    • Component/s: Tiered-Storage

    Description

      I am facing an issue on the remote data path for uncommitted reads.

      As mentioned in the original PR, if a transaction spans a long sequence of segments, the time taken to retrieve the producer snapshots from the remote storage can, in the worst case, become prohibitive and block the reads if it consistently exceeds the deadline of fetch requests (fetch.max.wait.ms).

      Essentially, the method used to compute the uncommitted records to return with the fetch response has an asymptotic complexity proportional to the number of segments in the log. This is not a problem with local storage since the constant factor to traverse the producer snapshot files is small enough, but that is not the case with a remote storage which exhibits higher read latency.
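      To make the cost model concrete, here is a minimal sketch (not Kafka code; the latency figures and helper are hypothetical assumptions) of why a per-segment producer-snapshot lookup that is harmless on local disk can blow past the fetch deadline on a remote store:

```java
// Hypothetical sketch: models the O(segments) cost of resolving aborted
// transactions when one producer snapshot must be read per segment spanned.
public class RemoteFetchDeadlineSketch {

    // Assumed per-lookup latencies; real values depend on the storage backend.
    static final long LOCAL_SNAPSHOT_READ_MS = 1;    // local disk / page cache
    static final long REMOTE_SNAPSHOT_READ_MS = 50;  // object-store round trip

    // Total time to walk the producer snapshots of every segment a
    // transaction spans: linear in the number of segments.
    static long timeToResolveAbortedTxns(int segmentsSpanned, long perSegmentReadMs) {
        return segmentsSpanned * perSegmentReadMs;
    }

    public static void main(String[] args) {
        long fetchMaxWaitMs = 500; // default fetch.max.wait.ms
        int segments = 100;        // a long-spanning transaction

        long local = timeToResolveAbortedTxns(segments, LOCAL_SNAPSHOT_READ_MS);
        long remote = timeToResolveAbortedTxns(segments, REMOTE_SNAPSHOT_READ_MS);

        // local:  100 ms  -> within the deadline
        // remote: 5000 ms -> the fetch request times out
        System.out.println("local:  " + local + " ms, within deadline: " + (local <= fetchMaxWaitMs));
        System.out.println("remote: " + remote + " ms, within deadline: " + (remote <= fetchMaxWaitMs));
    }
}
```

      With the same number of segments, only the constant factor changes between the two cases, which is why the issue surfaces on the remote path while remaining invisible locally.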

      An aggravating factor was the lock contention in the remote index cache, which has since been mitigated by KAFKA-15084. Unfortunately, despite the improvements observed once that contention was removed, the algorithmic complexity of the current method used to compute uncommitted records can always defeat any optimisation made on the remote read path.

      Maybe we could start thinking (if not already) about a different construct which would reduce that complexity to O(1) - i.e. make the computation independent of the number of segments and of the span of transactions.


            People

              Assignee: Unassigned
              Reporter: Alexandre Dupriez (adupriez)
              Votes: 0
              Watchers: 4
