  1. Apache Hudi
  2. HUDI-5155

Hive read of MOR RT table returns duplicate records


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.13.0
    • Component/s: hive
    • Labels: None

    Description

      Reading a MOR (merge-on-read) RT table through Hive returns duplicate records in the following case:

      1. Use the bucket index type.
      2. Assume primary keys 1 - 100 and set the bucket number to 1.
      3. Insert records 1 - 100 and compact; one parquet file is generated.
      4. Insert records 1 - 100 once again, but don't compact, so the data consists of 1 parquet file + 1 log file.
      5. Run select * from table where key=1; 2 records are returned.
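The write-side settings for steps 1 to 4 can be sketched as a PySpark-style options dictionary. This is only an illustration: the issue does not say which engine wrote the table, and the table name, field names, and path below are hypothetical placeholders. The three config keys that matter for the reproduction are the table type, the index type, and the bucket count.

```python
# Hypothetical names for illustration only; the issue does not give them.
TABLE_NAME = "hudi_5155_repro"
BASE_PATH = "/tmp/hudi_5155_repro"

hudi_options = {
    "hoodie.table.name": TABLE_NAME,
    # Merge-on-read table, so uncompacted writes land in log files.
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # Assumed field names for the record key and ordering field.
    "hoodie.datasource.write.recordkey.field": "key",
    "hoodie.datasource.write.precombine.field": "ts",
    # Step 1: bucket index. Step 2: a single bucket.
    "hoodie.index.type": "BUCKET",
    "hoodie.bucket.index.num.buckets": "1",
}

# Steps 3 and 4 write the same 100 keys twice (compacting only after the first write).
records = [{"key": k, "ts": 1, "value": f"v-{k}"} for k in range(1, 101)]

# With a SparkSession available, a write would look roughly like:
#   spark.createDataFrame(records).write.format("hudi") \
#       .options(**hudi_options).mode("append").save(BASE_PATH)
```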

      The cause:

        In HoodieMergeOnReadTableInputFormat, isSplitable returns true, so in this case two map tasks are generated. Each task includes the log file, so each task returns one record.
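The effect can be illustrated with a toy model. This is not Hudi's reader code; it is a simplified sketch that assumes each map task merges its slice of the base file with the full log file and treats log records whose key is missing from that slice as new inserts. The function names and the 50/50 split boundary are invented for the example.

```python
def read_split(base_slice, log_records):
    """Model one map task: merge a slice of the base file with the whole log."""
    merged = {}
    # Base records in this slice, overridden by the log version when present.
    for key, value in base_slice.items():
        merged[key] = log_records.get(key, value)
    # Log records with no match in this slice are emitted as if newly inserted.
    for key, value in log_records.items():
        if key not in base_slice:
            merged[key] = value
    return merged


def query(splits, log_records, wanted_key):
    """Model `select * from table where key = wanted_key` across all map tasks."""
    rows = []
    for base_slice in splits:
        task_output = read_split(base_slice, log_records)
        if wanted_key in task_output:
            rows.append(task_output[wanted_key])
    return rows


base = {k: f"v1-{k}" for k in range(1, 101)}  # compacted parquet file
log = {k: f"v2-{k}" for k in range(1, 101)}   # uncompacted second write

# Splittable base file: two tasks, each carrying the full log file.
split_a = {k: v for k, v in base.items() if k <= 50}
split_b = {k: v for k, v in base.items() if k > 50}
rows_when_split = query([split_a, split_b], log, 1)

# Non-splittable base file: a single task sees the whole base file.
rows_when_whole = query([base], log, 1)

print(len(rows_when_split), len(rows_when_whole))
```

In this model the second task never sees key 1 in its base slice, so it emits the log record for key 1 as a fresh row, which is the duplicate described above. Reading the base file as a single split removes the duplicate.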

      Please refer to:

      https://github.com/apache/hudi/issues/4618

          People

            Assignee: Unassigned
            Reporter: wangwenli
            Votes: 0
            Watchers: 2
