Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Description
Hi team,
I am trying to run the Hudi Sink Connector with Kafka Connect. When the connector starts, it starts the transaction coordinator, which initialises the global committed offsets from the Hudi commit file. On a first-time run there is no commit file, so it outputs:
[2022-08-08 19:58:20,529] INFO Hoodie Extra Metadata from latest commit is absent (org.apache.hudi.connect.writers.KafkaConnectTransactionServices:147)
However, if on the first run the earliest Kafka offset is not 0, the process keeps cycling through commits on the timeline. Ideally, on the first run, the global offsets should be initialised to the earliest available Kafka offset.
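A minimal sketch of that initialisation, written in Java because the connector is a Java project. The class and method names are hypothetical, not Hudi APIs. The earliest offsets are assumed to be fetched beforehand (for example with Kafka's `KafkaConsumer#beginningOffsets`). Partitions that already have an offset in the last commit's metadata keep it; every other partition starts from the earliest offset Kafka still retains.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not Hudi APIs.
final class GlobalOffsetInit {

    private GlobalOffsetInit() {}

    /**
     * Builds the coordinator's per-partition committed offsets.
     *
     * @param committedFromMetadata partition -> offset recovered from the latest Hudi
     *                              commit's extra metadata, or null when that metadata
     *                              is absent (first run)
     * @param earliestKafkaOffsets  partition -> earliest offset still retained in Kafka
     * @return partition -> offset the coordinator should treat as committed
     */
    static Map<Integer, Long> initGlobalOffsets(
            Map<Integer, Long> committedFromMetadata,
            Map<Integer, Long> earliestKafkaOffsets) {
        Map<Integer, Long> result = new HashMap<>();
        // Iterate over the partitions Kafka currently reports for the topic.
        for (Map.Entry<Integer, Long> e : earliestKafkaOffsets.entrySet()) {
            Long committed = committedFromMetadata == null
                    ? null
                    : committedFromMetadata.get(e.getKey());
            // Prefer the offset recorded in the last Hudi commit; otherwise start
            // from the earliest retained Kafka offset instead of assuming 0.
            result.put(e.getKey(), committed != null ? committed : e.getValue());
        }
        return result;
    }
}
```

On a first run (`committedFromMetadata == null`) with earliest offsets `{0=100, 1=250}`, the result is `{0=100, 1=250}` rather than all zeros.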
In the current implementation, the participant compares its local offset with the coordinator's offset and, on a mismatch, resets it to 0. This breaks on a fresh run when the global committed Kafka offset is not 0.
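The reconciliation described above can be modelled roughly as follows. This is a simplified, hypothetical model built only from the behaviour stated in this report, not the actual participant code: `syncAsReported` mirrors the reset to 0 on a mismatch, and `syncProposed` sketches an alternative that adopts the coordinator's offset and, when nothing has been committed yet, falls back to the earliest retained Kafka offset.

```java
// Hypothetical model of the participant's offset check; names are illustrative,
// not Hudi APIs.
final class ParticipantOffsetSync {

    private ParticipantOffsetSync() {}

    /**
     * Behaviour as described in this report: if the coordinator has no offset,
     * or its offset differs from the local one, the local offset falls back to 0.
     */
    static long syncAsReported(long localOffset, Long coordinatorOffset) {
        if (coordinatorOffset == null || coordinatorOffset != localOffset) {
            return 0L;
        }
        return localOffset;
    }

    /**
     * Possible alternative: treat the coordinator's committed offset as the
     * source of truth; on a fresh run (nothing committed yet) start from the
     * earliest offset Kafka still retains instead of 0.
     */
    static long syncProposed(Long coordinatorOffset, long earliestKafkaOffset) {
        if (coordinatorOffset != null) {
            return coordinatorOffset;
        }
        return earliestKafkaOffset;
    }
}
```

With a topic whose earliest retained offset is 100, `syncAsReported(100L, null)` yields 0, an offset that no longer exists in the partition, whereas `syncProposed(null, 100L)` yields 100.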