Samza / SAMZA-964

Improve the performance of the continuous OFFSET checkpointing for logged stores

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.1
    • Component/s: None
    • Labels:
      None

      Description

      SAMZA-905 added the capability to write the OFFSET file on every commit().

      Unfortunately, this hurt performance for one of our larger jobs at LinkedIn. The job has 10 stores, each with hundreds of partitions in its changelog topic. The performance problem came from the KafkaSystemAdmin.getSystemStreamMetadata() method, which:
      1. Periodically refetches the topic metadata
      2. Always fetches offsets twice (oldest, upcoming) for every partition

      Calling this method to fetch the offsets for just a couple of tasks is wasteful. Metadata should only be refetched when there is a problem; refetching it periodically doesn't help. Each of the T tasks calls the method for each of the S stores, and every call fetches 2 offsets for all T partitions, so the total number of offset fetches is S*2*T^2, where S is the number of stores and T is the number of tasks (equivalently, changelog partitions). Since each task only needs the newest offset of its own partition, S*T offset requests should suffice. Ideally, we'd also parallelize these requests, but that is an exercise for another time.
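To make the arithmetic above concrete, here is a small Java calculation. The store count of 10 comes from the job described above; T = 200 is a hypothetical value, since the issue only says "hundreds of partitions". The class and method names are illustrative only.

```java
/**
 * Back-of-the-envelope count of offset requests per commit round.
 * S = number of stores, T = number of tasks (= changelog partitions per store).
 */
public class OffsetFetchMath {

    // Before the fix: each of the T tasks calls getSystemStreamMetadata() for each
    // of the S stores, and every call fetches 2 offsets (oldest, upcoming) for
    // all T partitions of that store's changelog.
    static long naiveFetches(long stores, long tasks) {
        return stores * tasks * 2 * tasks;
    }

    // What is actually needed: one "newest offset" lookup per store per task.
    static long neededFetches(long stores, long tasks) {
        return stores * tasks;
    }

    public static void main(String[] args) {
        long s = 10;   // 10 stores, as in the job described in this issue
        long t = 200;  // hypothetical: the issue only says "hundreds of partitions"
        System.out.println("naive:  " + naiveFetches(s, t));   // 800000
        System.out.println("needed: " + neededFetches(s, t));  // 2000
    }
}
```

The ratio between the two counts is 2*T, so the waste grows linearly with the partition count, for every store.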

      The fix has 3 components:
      1. Cache metadata more aggressively. Only expire the cached metadata when Kafka returns a NotLeaderForPartitionException.
      2. Reduce excessive offset fetching.
      3. Do not allow unbounded exponential backoff for offset checkpointing; if the offset cannot be fetched, just skip writing the OFFSET file. Exponential backoff can balloon the commit time and stall the event loop, so we retry at most 3 times, for a maximum delay of 400ms.
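The following Java sketch illustrates components 1 and 3 together. It is not the actual patch: the class `NewestOffsetFetcher`, the exception `NotLeaderException` (a stand-in for Kafka's `NotLeaderForPartitionException`), and the injected lookup functions are all hypothetical, and the backoff schedule (starting at 100 ms, doubling, each delay capped at 400 ms) is only one reading of "a max delay of 400ms". Leader metadata is cached per partition and evicted only on a not-leader error; any other failure is retried with the cached metadata, and after 3 retries the caller gets an empty result and skips the OFFSET file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.LongConsumer;

/** Hypothetical stand-in for Kafka's NotLeaderForPartitionException. */
class NotLeaderException extends RuntimeException {
    NotLeaderException(String msg) { super(msg); }
}

/** Hypothetical sketch: sticky metadata cache (component 1) + bounded retry (component 3). */
class NewestOffsetFetcher {
    static final int MAX_RETRIES = 3;           // from the issue: retry up to 3 times
    static final long INITIAL_BACKOFF_MS = 100; // assumption
    static final long MAX_BACKOFF_MS = 400;     // assumption: cap on each individual delay

    private final Map<String, String> leaderCache = new HashMap<>(); // partition -> leader
    private final Function<String, String> metadataLookup;           // expensive metadata request
    private final BiFunction<String, String, Long> offsetLookup;     // (leader, partition) -> newest offset
    private final LongConsumer sleeper;                              // injected so tests need not sleep
    int metadataRequests = 0;                                        // counts real metadata fetches

    NewestOffsetFetcher(Function<String, String> metadataLookup,
                        BiFunction<String, String, Long> offsetLookup,
                        LongConsumer sleeper) {
        this.metadataLookup = metadataLookup;
        this.offsetLookup = offsetLookup;
        this.sleeper = sleeper;
    }

    // Metadata is fetched only on a cache miss; there is no periodic refresh.
    private String leaderFor(String partition) {
        return leaderCache.computeIfAbsent(partition, p -> {
            metadataRequests++;
            return metadataLookup.apply(p);
        });
    }

    /** Returns the newest offset, or empty after MAX_RETRIES failed retries (skip the OFFSET file). */
    Optional<Long> newestOffset(String partition) {
        long backoff = INITIAL_BACKOFF_MS;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            try {
                return Optional.of(offsetLookup.apply(leaderFor(partition), partition));
            } catch (NotLeaderException e) {
                leaderCache.remove(partition); // only a not-leader error invalidates the cache
            } catch (RuntimeException e) {
                // any other failure: keep the cached metadata and just retry
            }
            if (attempt < MAX_RETRIES) {
                sleeper.accept(backoff);
                backoff = Math.min(backoff * 2, MAX_BACKOFF_MS);
            }
        }
        return Optional.empty();
    }
}
```

Injecting the sleep function keeps the sketch testable without real delays; with the assumed schedule, a persistently failing fetch waits 100, 200 and 400 ms before giving up.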

      1. SAMZA-964_1.patch
        33 kB
        Jake Maes
      2. SAMZA-964_2.patch
        34 kB
        Jake Maes
      3. SAMZA-964_3.patch
        34 kB
        Jake Maes
      4. SAMZA-964_4.patch
        34 kB
        Jake Maes
      5. SAMZA-964_5.patch
        34 kB
        Jake Maes

        Activity

        jmakes Jake Maes added a comment - https://reviews.apache.org/r/48459/
        navina Navina Ramesh added a comment -

        +1 Committed. Thanks!

          People

          • Assignee:
            jmakes Jake Maes
            Reporter:
            jmakes Jake Maes
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
