Apache Hudi / HUDI-292

Consumes more entries from Kafka than the specified sourceLimit.


    Details

      Description

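      When CheckpointUtils#computeOffsetRanges is used to compute the offset ranges for consuming Kafka messages, the returned ranges can cover more events than the requested numEvents.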

      Given
      topic = "test",
      fromOffsets (partition -> offset) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> 0), (4 -> 0),
      toOffsets (partition -> offset) = (0 -> 100), (1 -> 1000), (2 -> 1000), (3 -> 1000), (4 -> 1000),
      numEvents = 1001.

      The output of CheckpointUtils#computeOffsetRanges is:

      OffsetRange(topic: 'test', partition: 0, range: [0 -> 100])
      OffsetRange(topic: 'test', partition: 1, range: [0 -> 226])
      OffsetRange(topic: 'test', partition: 2, range: [0 -> 226])
      OffsetRange(topic: 'test', partition: 3, range: [0 -> 226])
      OffsetRange(topic: 'test', partition: 4, range: [0 -> 226])

      The total count is 1004 (100 + 226 * 4), which exceeds numEvents = 1001, so more entries are consumed from Kafka than the specified limit.
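
The numbers above can be reproduced with a small standalone sketch. It is not the Hudi implementation: it assumes the ranges are filled in rounds, each round splitting the remaining events evenly across the non-exhausted partitions with ceiling division. The class OffsetAllocationSketch, the method allocate, and the capToBudget flag are hypothetical names used only for illustration. With capToBudget = false the sketch overshoots to 1004 as reported; with capToBudget = true each grant is also limited by the remaining budget, which is one possible way to stay within numEvents.

```java
import java.util.Arrays;

public class OffsetAllocationSketch {

    /**
     * Returns the "until" offset for each partition (arrays are indexed by partition id).
     * Hypothetical helper: a round-based even split, NOT Hudi's actual code.
     */
    static long[] allocate(long[] fromOffsets, long[] toOffsets, long numEvents, boolean capToBudget) {
        int n = toOffsets.length;
        long[] until = Arrays.copyOf(fromOffsets, n);
        boolean[] exhausted = new boolean[n];
        int exhaustedCount = 0;
        long allocated = 0;

        // Keep going while there is budget left and some partition still has data.
        while (allocated < numEvents && exhaustedCount < n) {
            long remaining = numEvents - allocated;
            int open = n - exhaustedCount;
            // Ceiling division: rounding up is what allows the overshoot.
            long perPartition = (remaining + open - 1) / open;

            for (int p = 0; p < n; p++) {
                if (exhausted[p]) {
                    continue;
                }
                // Never read past the latest available offset of the partition.
                long grant = Math.min(perPartition, toOffsets[p] - until[p]);
                if (capToBudget) {
                    // Assumed remedy: also never hand out more than the remaining budget.
                    grant = Math.min(grant, numEvents - allocated);
                }
                until[p] += grant;
                allocated += grant;
                if (until[p] == toOffsets[p]) {
                    exhausted[p] = true;
                    exhaustedCount++;
                }
            }
        }
        return until;
    }

    /** Sums the number of events covered by the ranges [from, until). */
    static long total(long[] fromOffsets, long[] until) {
        long sum = 0;
        for (int p = 0; p < until.length; p++) {
            sum += until[p] - fromOffsets[p];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] from = {0, 0, 0, 0, 0};
        long[] to = {100, 1000, 1000, 1000, 1000};

        long[] uncapped = allocate(from, to, 1001, false);
        // prints [100, 226, 226, 226, 226] total=1004
        System.out.println(Arrays.toString(uncapped) + " total=" + total(from, uncapped));

        long[] capped = allocate(from, to, 1001, true);
        // prints [100, 226, 226, 226, 223] total=1001
        System.out.println(Arrays.toString(capped) + " total=" + total(from, capped));
    }
}
```

In the uncapped run, the first round grants ceil(1001 / 5) = 201 events per partition (partition 0 only has 100), for 904 in total; the second round grants ceil(97 / 4) = 25 to each of the four remaining partitions, which adds 100 events although only 97 were left.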

      CC Vinoth Chandar

              People

              • Assignee:
                xleesf leesf
                Reporter:
                xleesf leesf
              • Votes:
                0
                Watchers:
                2

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m