Samza / SAMZA-1670

When fetching a newest offset for a partition, also prefetch and cache the newest offsets for other partitions on the container

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels:

      Description

      ExtendedSystemAdmin.getNewestOffset currently works on only one system-stream-partition at a time. As an optimization, when one system-stream-partition needs a newest offset, a batch call can be used to also fetch and cache the newest offsets for the other partitions on the same container.

      This can help to reduce the call volume to system admins to get newest offset metadata. This can also help reduce contention on system admins when metadata is needed by multiple threads at the same time.

      Proposed approach:

      Add a new getNewestOffset API to StreamMetadataCache. Have the cache keep track of all system-stream-partitions that have asked for newest offsets before, and when a system-stream-partition needs newest offset metadata, check if there are any other stale entries and fetch those as well. This also requires adding a getNewestOffsets batch call to ExtendedSystemAdmin. The benefit here is that StreamMetadataCache is already reused by multiple tasks, but the disadvantage is that it has to keep track of new state.
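The proposed approach can be sketched roughly as follows. This is a minimal illustration, not the code that was committed for this issue: the class names, the `BatchOffsetFetcher` interface (standing in for the proposed getNewestOffsets batch call), the TTL-based staleness rule, and the injectable clock are all assumptions, and the generic key type `K` stands in for a system-stream-partition.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

/** Sketch only; none of these names are Samza APIs. */
final class NewestOffsetCacheSketch {

    /** Hypothetical batch call, modeled on the proposed getNewestOffsets. */
    interface BatchOffsetFetcher<K> {
        Map<K, String> getNewestOffsets(Set<K> partitions);
    }

    static final class Cache<K> {
        private static final class Entry {
            final String offset;
            final long fetchedAtMs;

            Entry(String offset, long fetchedAtMs) {
                this.offset = offset;
                this.fetchedAtMs = fetchedAtMs;
            }
        }

        private final BatchOffsetFetcher<K> fetcher;
        private final long ttlMs;          // assumed staleness threshold
        private final LongSupplier clock;  // injectable so the sketch is testable
        private final Set<K> known = new HashSet<>();          // every partition requested so far
        private final Map<K, Entry> entries = new HashMap<>(); // cached newest offsets

        Cache(BatchOffsetFetcher<K> fetcher, long ttlMs, LongSupplier clock) {
            this.fetcher = fetcher;
            this.ttlMs = ttlMs;
            this.clock = clock;
        }

        synchronized String getNewestOffset(K partition) {
            known.add(partition);
            long now = clock.getAsLong();
            Entry cached = entries.get(partition);
            if (isFresh(cached, now)) {
                return cached.offset;
            }
            // The requested entry is missing or stale: refresh it together with
            // every other known partition whose entry is also missing or stale.
            Set<K> stale = new HashSet<>();
            for (K k : known) {
                if (!isFresh(entries.get(k), now)) {
                    stale.add(k);
                }
            }
            Map<K, String> fetched = fetcher.getNewestOffsets(stale);
            for (K k : stale) {
                entries.put(k, new Entry(fetched.get(k), now));
            }
            return entries.get(partition).offset;
        }

        private boolean isFresh(Entry e, long now) {
            return e != null && now - e.fetchedAtMs < ttlMs;
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        long[] now = {0L};
        Cache<String> cache = new Cache<String>(partitions -> {
            calls[0]++;
            Map<String, String> out = new HashMap<>();
            for (String p : partitions) {
                out.put(p, p + "@" + now[0]);
            }
            return out;
        }, 100L, () -> now[0]);

        cache.getNewestOffset("p0"); // batch call 1: fetches p0
        cache.getNewestOffset("p1"); // batch call 2: fetches p1 only, p0 is still fresh
        now[0] = 200L;               // both entries are now stale
        cache.getNewestOffset("p0"); // batch call 3: refreshes p0 and p1 together
        cache.getNewestOffset("p1"); // served from the cache
        System.out.println("batch calls: " + calls[0]); // prints "batch calls: 3"
    }
}
```

The `known` set is the "new state" mentioned above: the cache only learns which partitions matter as they are requested, so the first request for each partition still triggers its own fetch.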

      Alternative approach:

      Collect all system-stream-partitions that will need newest offset metadata at setup, and then make the batch call whenever any of those partitions needs metadata and cache the metadata. The benefit for this approach is that no state needs to be built up, as it is known at setup, but it might be unclean to do the initial collection and keep track of it. For example, it might be necessary to store container-granular information inside partition-granular objects (e.g. TaskStorageManager).
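For comparison, the alternative approach might look something like the sketch below. It is an illustration under assumptions, not Samza code: the class name, the `BatchOffsetFetcher` interface, and the `invalidate` hook are hypothetical, and the description above does not say when cached metadata should be refreshed. The key difference is that the full partition set is passed in once at setup, so the first request from any partition fetches offsets for all of them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch only; K stands in for a system-stream-partition. */
final class SetupTimePrefetchSketch<K> {

    /** Hypothetical batch call, modeled on the proposed getNewestOffsets. */
    interface BatchOffsetFetcher<P> {
        Map<P, String> getNewestOffsets(Set<P> partitions);
    }

    private final Set<K> containerPartitions; // fixed at setup, never grows
    private final BatchOffsetFetcher<K> fetcher;
    private Map<K, String> cached;            // null until first request or after invalidate()

    SetupTimePrefetchSketch(Set<K> containerPartitions, BatchOffsetFetcher<K> fetcher) {
        this.containerPartitions = Collections.unmodifiableSet(new HashSet<>(containerPartitions));
        this.fetcher = fetcher;
    }

    synchronized String getNewestOffset(K partition) {
        if (!containerPartitions.contains(partition)) {
            throw new IllegalArgumentException("Not registered at setup: " + partition);
        }
        if (cached == null) {
            // One batch call covers every partition collected at setup.
            cached = new HashMap<>(fetcher.getNewestOffsets(containerPartitions));
        }
        return cached.get(partition);
    }

    /** Assumed refresh hook: drops the cached offsets so the next request refetches. */
    synchronized void invalidate() {
        cached = null;
    }

    public static void main(String[] args) {
        Set<String> partitions = new HashSet<>(Arrays.asList("p0", "p1", "p2"));
        int[] calls = {0};
        SetupTimePrefetchSketch<String> prefetch =
            new SetupTimePrefetchSketch<String>(partitions, requested -> {
                calls[0]++;
                Map<String, String> out = new HashMap<>();
                for (String p : requested) {
                    out.put(p, "offset-of-" + p);
                }
                return out;
            });

        prefetch.getNewestOffset("p1"); // triggers the single batch call
        prefetch.getNewestOffset("p2"); // served from the cache
        System.out.println("batch calls: " + calls[0]); // prints "batch calls: 1"
    }
}
```

No tracking state accumulates here, but something at container scope has to build the partition set and hand this object to each per-partition consumer, which is the awkwardness the description points out.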

            People

            • Assignee: cameronlee314 Cameron Lee
            • Reporter: cameronlee314 Cameron Lee