  1. Apache Hudi
  2. HUDI-4575

Initial Kafka Global Offsets in Hudi Kafka Sink Connector


Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • 1.1.0
    • kafka-connect
    • None

    Description

      Hi team,
      I am trying to run the Hudi Sink Connector with Kafka Connect. When the connector starts, it starts the transaction coordinator, which initialises the global committed offsets from the Hudi commit file. On a first-time run there is no commit file, so it logs:
      [2022-08-08 19:58:20,529] INFO Hoodie Extra Metadata from latest commit is absent (org.apache.hudi.connect.writers.KafkaConnectTransactionServices:147)
      But if, on that first run, the earliest Kafka offset is not 0, the process just keeps on running commit timelines. Ideally, on the first run the global offsets should be set to the earliest Kafka offset.
      As per the current implementation, the participant compares its local offset with the coordinator's offset and, on a mismatch, sets it to 0. This breaks on a fresh run when the global Kafka committed offset is not 0.
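
      To make the request concrete, here is a minimal sketch of the fallback I have in mind. It is not Hudi code: the class `InitialOffsets` and the method `resolveInitialOffset` are hypothetical names, and plain maps stand in for the committed offsets read from the Hudi commit metadata and for the earliest offsets reported by Kafka (which, in a real connector, could presumably be fetched with the Kafka consumer's `beginningOffsets` call).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: illustrates the proposed first-run behaviour only.
class InitialOffsets {

    /**
     * Picks the offset a partition should start from.
     *
     * @param committedOffsets offsets recovered from the latest Hudi commit (empty on a first run)
     * @param earliestOffsets  earliest offsets currently available in Kafka, per partition
     * @param partition        Kafka partition id
     * @return the committed offset if one exists, otherwise the earliest Kafka offset,
     *         otherwise 0 when nothing is known about the partition
     */
    static long resolveInitialOffset(Map<Integer, Long> committedOffsets,
                                     Map<Integer, Long> earliestOffsets,
                                     int partition) {
        Long committed = committedOffsets.get(partition);
        if (committed != null) {
            // A previous Hudi commit exists: resume from what it recorded.
            return committed;
        }
        // Fresh run: fall back to the earliest offset Kafka still holds, not a hard-coded 0.
        return earliestOffsets.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        Map<Integer, Long> committed = new HashMap<>(); // first run: no commit file yet
        Map<Integer, Long> earliest = new HashMap<>();
        earliest.put(0, 1500L);                         // assumed value, for illustration

        System.out.println("start offset for partition 0: "
                + resolveInitialOffset(committed, earliest, 0));
    }
}
```

      With this fallback, a fresh run against a topic whose earliest offset is not 0 would start from that earliest offset, while a restart after a successful commit would still resume from the committed offset.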

      Attachments

        Activity

          People

            Unassigned
            vishalag Vishal Agarwal
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: