Apache Hudi / HUDI-4119

The first read result is incorrect when the Flink upsert-kafka connector is used with Hudi

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.1
    • Component/s: None

    Description

       The first read result is incorrect when the Flink upsert-kafka connector is used with Hudi.
       
       ETL path: Flink upsert-kafka connector -> Hudi table (MOR table, streaming query)
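To make the expected behaviour concrete, here is a minimal Python model of how a keyed upsert stream is conceptually expanded into a changelog. It is an illustrative assumption, not Flink or Hudi code: the first value for a key becomes +I, and each later value for the same key becomes a -U of the previous value followed by a +U of the new one. The function name upserts_to_changelog and the tombstone handling (a None value producing -D) are hypothetical simplifications.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical model: each Kafka message is (key, value); a None value is a tombstone.
Message = Tuple[str, Optional[str]]
Change = Tuple[str, str, str]  # (row_kind, key, value)


def upserts_to_changelog(messages: Iterable[Message]) -> List[Change]:
    """Expand keyed upserts into a changelog: the first value per key is +I,
    later values emit -U (old value) then +U (new value), tombstones emit -D."""
    latest: Dict[str, str] = {}
    changes: List[Change] = []
    for key, value in messages:
        previous = latest.get(key)
        if value is None:
            # Tombstone: retract the last known value for this key, if any.
            if previous is not None:
                changes.append(("-D", key, previous))
                del latest[key]
        elif previous is None:
            # First value seen for this key.
            changes.append(("+I", key, value))
            latest[key] = value
        else:
            # Update: retract the old value, then emit the new one.
            changes.append(("-U", key, previous))
            changes.append(("+U", key, value))
            latest[key] = value
    return changes


print(upserts_to_changelog([("k1", "first"), ("k1", "second")]))
# [('+I', 'k1', 'first'), ('-U', 'k1', 'first'), ('+U', 'k1', 'second')]
```

Two records with the same primary key therefore yield exactly the three changelog rows the case below expects.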
       
      Here is the case:
       
      1. First run: write two records with the same primary key into Kafka and insert them into the Hudi table. The query should return three changelog records: +I first record, -U first record, +U second record. However, the first time I queried the Hudi table, every operation was +I (+I first record, +I first record, +I second record) and there was no update operation.
       The three +I records break Hudi's downstream ETL: the results of the groupBy are inaccurate.
      2. Second run: stop the first query and restart the query job on the Hudi table. The results are now correct: +I first record, -U first record, +U second record.
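The groupBy problem in step 1 can be reproduced with a small, hypothetical Python model of a COUNT(*) ... GROUP BY key aggregate that consumes a changelog, where +I and +U add a row and -U and -D retract one. The record values mirror the case above; count_per_key is an illustrative name, not part of Flink or Hudi.

```python
from collections import Counter
from typing import Iterable, Tuple

Change = Tuple[str, str, str]  # (row_kind, key, value)


def count_per_key(changes: Iterable[Change]) -> Counter:
    """Maintain COUNT(*) GROUP BY key over a changelog:
    +I/+U add one row, -U/-D retract one row."""
    counts: Counter = Counter()
    for kind, key, _ in changes:
        if kind in ("+I", "+U"):
            counts[key] += 1
        elif kind in ("-U", "-D"):
            counts[key] -= 1
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return counts


# Changelog the query should produce (second run).
correct = [("+I", "k1", "first"), ("-U", "k1", "first"), ("+U", "k1", "second")]
# Changelog observed on the first run: every row arrives as +I.
observed = [("+I", "k1", "first"), ("+I", "k1", "first"), ("+I", "k1", "second")]

print(count_per_key(correct)["k1"])   # 1
print(count_per_key(observed)["k1"])  # 3
```

Because the retraction (-U) is lost, the aggregate counts one logical row three times.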
       
      Reason: There is a bug in the program. When no data log file has been generated, the schema does not include the column _hoodie_operation. For details, see:
      https://www.jianshu.com/p/29f9ec5e606e
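The reported cause can be illustrated with a simplified, hypothetical Python model of a reader that takes each row's kind from the _hoodie_operation column only when that column is part of the read schema, and otherwise defaults to +I. This is not Hudi's reader: the function read_changelog, the row layout, and the assumption that the column stores Flink-style row-kind strings are all illustrative.

```python
from typing import Dict, List, Sequence, Tuple

OPERATION_FIELD = "_hoodie_operation"  # metadata column named in the issue


def read_changelog(schema: Sequence[str], rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical reader model: use the operation column as the row kind
    when the read schema contains it, otherwise fall back to +I."""
    has_operation = OPERATION_FIELD in schema
    result: List[Tuple[str, str]] = []
    for row in rows:
        kind = row[OPERATION_FIELD] if has_operation else "+I"
        result.append((kind, row["value"]))
    return result


# Assumed stored rows, with row-kind strings in the operation column.
stored = [
    {OPERATION_FIELD: "+I", "value": "first"},
    {OPERATION_FIELD: "-U", "value": "first"},
    {OPERATION_FIELD: "+U", "value": "second"},
]

# Schema without the operation column (the buggy first read).
print(read_changelog(["value"], stored))
# [('+I', 'first'), ('+I', 'first'), ('+I', 'second')]

# Schema with the operation column.
print(read_changelog([OPERATION_FIELD, "value"], stored))
# [('+I', 'first'), ('-U', 'first'), ('+U', 'second')]
```

The two outputs match the first-run and second-run results described in the case.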


          People

            Assignee: Unassigned
            Reporter: yanxiang (aliceyyan)
            Votes: 0
            Watchers: 1
