Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
The first read result is incorrect when the Flink upsert-kafka connector is used with Hudi.
ETL path: Flink upsert-kafka connector -> Hudi table (MOR table, queried in streaming mode)
Here is the case:
1. First query: write two records with the same primary key to Kafka and insert them into the Hudi table. The query should return three changelog rows: +I first record, -U first record, +U second record. But the first time I queried the Hudi table, every row was a +I (+I first record, +I first record, +I second record) and there was no update operation.
The three +I rows break Hudi's downstream ETL: groupBy results become inaccurate.
2. Second query: after stopping the first query and restarting the query job on the Hudi table, the results are correct: +I first record, -U first record, +U second record.
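To illustrate why the three +I rows from the first query skew a downstream groupBy, here is a minimal Python sketch. It is not Flink or Hudi code: the function `count_by_key`, the key `id1`, and the tuple layout are hypothetical, and the fold only mimics how a streaming COUNT(*) grouped by key would treat each row kind.

```python
from collections import defaultdict


def count_by_key(changelog):
    """Fold changelog rows into per-key row counts (illustrative only)."""
    counts = defaultdict(int)
    for kind, key, _value in changelog:
        if kind in ("+I", "+U"):
            counts[key] += 1  # insert or update-after adds a row
        elif kind in ("-U", "-D"):
            counts[key] -= 1  # update-before or delete retracts a row
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return dict(counts)


# Changelog the query should emit for two writes to the same primary key.
expected = [
    ("+I", "id1", "first record"),
    ("-U", "id1", "first record"),
    ("+U", "id1", "second record"),
]

# Changelog actually observed on the first query: every row is +I.
observed = [
    ("+I", "id1", "first record"),
    ("+I", "id1", "first record"),
    ("+I", "id1", "second record"),
]

print(count_by_key(expected))  # {'id1': 1}
print(count_by_key(observed))  # {'id1': 3}
```

With the correct changelog the retraction cancels the first insert, so the key counts once; with three +I rows nothing is retracted and the same key is counted three times.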
Reason:
There is a bug in the program: when no data log file has been generated, the schema does not include the column '_hoodie_operation'. Please refer to the following link for details:
https://www.jianshu.com/p/29f9ec5e606e