  1. Samza
  2. SAMZA-964

Improve the performance of the continuous OFFSET checkpointing for logged stores

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.1
    • Component/s: None
    • Labels:
      None

      Description

      SAMZA-905 added the capability to write the OFFSET file on every commit().

      Unfortunately, the performance was a hindrance for one of our larger jobs at LinkedIn. The job has 10 stores, each with hundreds of partitions in its changelog topic. The performance problem came from the KafkaSystemAdmin.getSystemStreamMetadata() method, which:
      1. Periodically refetches the topic metadata
      2. Always fetches offsets twice (oldest, upcoming) for every partition

      Calling this method to fetch the offsets for just a couple of tasks is wasteful. Metadata should only be refetched when there is a problem; refetching it periodically doesn't help. The total number of offset fetches is S*2*T^2, where S is the number of stores and T is the number of tasks/changelog partitions. Since we only need the newest offset, this should require only S*T offset requests. Ideally, we'd also parallelize these requests, but that will be an exercise for another time.
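
      To put the two request counts side by side, here is a rough back-of-the-envelope sketch. The value T = 200 is an assumed stand-in for "hundreds of partitions", and the function names are hypothetical. The T^2 term is read here as each of the T tasks triggering a fetch of two offsets for all T partitions of a store.

      ```python
      def offset_fetches_before(stores: int, tasks: int) -> int:
          """S * 2 * T^2: every task fetches 2 offsets (oldest, upcoming)
          for every partition of every store's changelog."""
          return stores * 2 * tasks * tasks


      def offset_fetches_after(stores: int, tasks: int) -> int:
          """S * T: one newest-offset request per store per task partition."""
          return stores * tasks


      # S = 10 comes from the description; T = 200 is an assumed example value.
      S, T = 10, 200
      before = offset_fetches_before(S, T)
      after = offset_fetches_after(S, T)
      print(f"before: {before}, after: {after}, reduction: {before // after}x")
      ```

      Under these assumed numbers the count drops from 800,000 to 2,000 requests. The reduction factor is 2*T, so it grows with the number of partitions.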

      The fix has 3 components:
      1. Cache metadata more aggressively. Only expire metadata if we get a Kafka NotLeaderForPartitionException.
      2. Reduce excessive offset fetching.
      3. Do not allow unbounded exponential backoff for offset checkpointing; if the fetch keeps failing, just skip the OFFSET file. Exponential backoff can balloon the commit time and stall the event loop, so we will only retry up to 3 times for a max delay of 400ms.
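
      Components 1 and 3 could be sketched roughly as follows. This is an illustration of the idea, not the code in the attached patches. All class, function, and parameter names are hypothetical, and the 100ms initial delay with doubling is an assumption; the description only states "up to 3 times" and "a max delay of 400ms".

      ```python
      import time
      from typing import Any, Callable, Dict, Optional


      class NotLeaderForPartitionError(Exception):
          """Hypothetical stand-in for Kafka's NotLeaderForPartitionException."""


      class MetadataCache:
          """Caches topic metadata until it is explicitly invalidated."""

          def __init__(self, fetch_metadata: Callable[[str], Any]) -> None:
              self._fetch_metadata = fetch_metadata
              self._cache: Dict[str, Any] = {}

          def get(self, topic: str) -> Any:
              # Fetch only on a cache miss; there is no periodic refresh.
              if topic not in self._cache:
                  self._cache[topic] = self._fetch_metadata(topic)
              return self._cache[topic]

          def invalidate(self, topic: str) -> None:
              self._cache.pop(topic, None)


      def fetch_newest_offset(
          cache: MetadataCache,
          topic: str,
          fetch_offset: Callable[[Any], int],
          max_retries: int = 3,         # from the description
          initial_delay_ms: int = 100,  # assumption
          max_delay_ms: int = 400,      # from the description
          sleep_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000.0),
      ) -> Optional[int]:
          """Returns the newest offset, or None if the caller should skip
          writing the OFFSET file for this commit."""
          delay_ms = initial_delay_ms
          for attempt in range(max_retries + 1):
              try:
                  return fetch_offset(cache.get(topic))
              except NotLeaderForPartitionError:
                  # Leadership moved: the cached metadata is stale, so drop it.
                  cache.invalidate(topic)
              if attempt < max_retries:
                  sleep_ms(delay_ms)
                  delay_ms = min(delay_ms * 2, max_delay_ms)  # bounded backoff
          return None
      ```

      Returning None instead of retrying indefinitely keeps the worst-case time added to commit() bounded. The next commit simply tries again.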

        Attachments

        1. SAMZA-964_5.patch
          34 kB
          Jake Maes
        2. SAMZA-964_4.patch
          34 kB
          Jake Maes
        3. SAMZA-964_3.patch
          34 kB
          Jake Maes
        4. SAMZA-964_2.patch
          34 kB
          Jake Maes
        5. SAMZA-964_1.patch
          33 kB
          Jake Maes

          Activity

            People

            • Assignee:
              jmakes Jake Maes
              Reporter:
              jmakes Jake Maes