Apache Hudi / HUDI-292

Consumes more entries from Kafka than the specified sourceLimit.

Details

    Description

      When CheckpointUtils#computeOffsetRanges computes the offset ranges for consuming Kafka messages, the ranges it returns can cover more events than requested.

      Given
      topic = "test",
      fromOffsets (partition -> offset pairs) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> 0), (4 -> 0),
      toOffsets (partition -> offset pairs) = (0 -> 100), (1 -> 1000), (2 -> 1000), (3 -> 1000), (4 -> 1000),
      numEvents = 1001.

      The output of CheckpointUtils#computeOffsetRanges is

      OffsetRange(topic: 'test', partition: 0, range: [0 -> 100])
      OffsetRange(topic: 'test', partition: 1, range: [0 -> 226])
      OffsetRange(topic: 'test', partition: 2, range: [0 -> 226])
      OffsetRange(topic: 'test', partition: 3, range: [0 -> 226])
      OffsetRange(topic: 'test', partition: 4, range: [0 -> 226])

      The total count is 1004 (100 + 226 * 4), which exceeds the requested 1001, so more entries are consumed from Kafka than the specified limit.
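
      The arithmetic can be reproduced with a small sketch. It assumes a round-robin allocator that splits the remaining budget evenly across non-exhausted partitions using ceiling division; this is an illustrative guess at the behaviour, not the actual Hudi code. The class OffsetAllocationSketch, the method allocate, and the capToBudget flag are hypothetical names. With the cap disabled, the sketch yields the ranges reported here; with the cap enabled, each grant is also limited to the events still unallocated, so the total stays at numEvents.

```java
import java.util.Arrays;

public class OffsetAllocationSketch {

    // Hypothetical allocator (not the Hudi implementation): in each round, the
    // remaining budget is divided evenly, rounding up, across partitions that
    // still have unread messages. When capToBudget is true, every grant is
    // additionally limited to the number of events not yet allocated.
    static long[] allocate(long[] from, long[] to, long numEvents, boolean capToBudget) {
        long[] until = Arrays.copyOf(from, from.length);
        long allocated = 0;
        while (allocated < numEvents) {
            int open = 0;
            for (int i = 0; i < until.length; i++) {
                if (until[i] < to[i]) {
                    open++;
                }
            }
            if (open == 0) {
                break; // every partition is exhausted
            }
            long remaining = numEvents - allocated;
            long perPartition = (remaining + open - 1) / open; // ceiling division
            for (int i = 0; i < until.length; i++) {
                long grant = Math.min(to[i] - until[i], perPartition);
                if (capToBudget) {
                    grant = Math.min(grant, numEvents - allocated);
                }
                until[i] += grant;
                allocated += grant;
            }
        }
        return until;
    }

    // Sums the number of events covered by the ranges [from[i], until[i]).
    static long total(long[] from, long[] until) {
        long sum = 0;
        for (int i = 0; i < from.length; i++) {
            sum += until[i] - from[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] from = {0, 0, 0, 0, 0};
        long[] to = {100, 1000, 1000, 1000, 1000};

        long[] uncapped = allocate(from, to, 1001, false);
        long[] capped = allocate(from, to, 1001, true);

        // prints: [100, 226, 226, 226, 226] total=1004
        System.out.println(Arrays.toString(uncapped) + " total=" + total(from, uncapped));
        // prints: [100, 226, 226, 226, 223] total=1001
        System.out.println(Arrays.toString(capped) + " total=" + total(from, capped));
    }
}
```

      In the uncapped run, the first round grants ceil(1001 / 5) = 201 events per partition, of which partition 0 can take only 100, leaving 97 events. The second round grants ceil(97 / 4) = 25 to each of the four remaining partitions, which is 100 events in total and overshoots the budget by 3.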

      CC vinoth

            People

              xleesf leesf
              xleesf leesf

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m