Apache Hudi / HUDI-2780

MOR log file read skips a complete block as a bad block, resulting in data loss


Details

    • 0.5

    Description

      Debugging the data in the middle of the bad block shows that the lost records lie within the offset range of that block: because the read hits an EOF and skips the block, the compaction merge never writes those records to Parquet, even though the deltacommit for that instant succeeded. In the middle of the bad block there are two consecutive HUDI magic markers. After consuming a magic, the reader interprets the following bytes as the block size, but those bytes actually come from #HUDI# magic content and decode to 1227030528; since that "size" exceeds the file length, an EOF exception is thrown and the whole block is skipped as corrupt.
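      A quick way to see that the bogus size really is magic bytes: 1227030528 is 0x49230000 in hex, i.e. the bytes 'I' and '#' (the tail of a #HUDI# marker) followed by two zero bytes, decoded as a big-endian int (the byte order DataInputStream.readInt uses). The snippet below only illustrates that decoding; it is not Hudi reader code:

          import java.nio.ByteBuffer;

          public class BogusBlockSize {
              public static void main(String[] args) {
                  // Bytes consumed as the 4-byte "block size": the tail of a
                  // "#HUDI#" magic ('I', '#') followed by two zero bytes.
                  byte[] misread = {'I', '#', 0x00, 0x00};

                  // ByteBuffer defaults to big-endian, matching DataInputStream.readInt.
                  int bogusSize = ByteBuffer.wrap(misread).getInt();

                  System.out.println(bogusSize); // 1227030528, the value from this issue
                  // A ~1.2 GB block size exceeds the log file length, so seeking
                  // past the block raises EOFException and it is treated as corrupt.
              }
          }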

      Proposed fix: when detecting the position of the next block in order to skip the bad block, the scan should start not from the position after the block size was read, but from the position before the block size was read.
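      A minimal sketch of that scan-start correction. This is not the actual HoodieLogFileReader code; the names here (SeekableInput, findNextBlockStart, scanForNextMagic, BLOCK_SIZE_LENGTH) are illustrative assumptions:

          import java.io.IOException;

          // Illustrative sketch of the corrected corrupt-block skip logic.
          class CorruptBlockSkipper {
              private static final int BLOCK_SIZE_LENGTH = 4; // block size is a 4-byte int

              // Hypothetical seekable-stream abstraction for this sketch.
              interface SeekableInput {
                  long getPos() throws IOException;
                  void seek(long pos) throws IOException;
              }

              // Called once the block-size read threw EOF (or failed validation).
              long findNextBlockStart(SeekableInput in) throws IOException {
                  long posAfterSizeRead = in.getPos();

                  // Buggy behavior: scanning from posAfterSizeRead can jump over a
                  // valid magic whose bytes were just consumed as the bogus size.
                  // Fix per this issue: rewind past the 4 size bytes so a magic
                  // beginning there is still found and the next block is not lost.
                  long scanStart = posAfterSizeRead - BLOCK_SIZE_LENGTH;
                  in.seek(scanStart);
                  return scanForNextMagic(in, scanStart);
              }

              private long scanForNextMagic(SeekableInput in, long from) throws IOException {
                  // ... byte-by-byte search for the next "#HUDI#" marker ...
                  return from; // placeholder
              }
          }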


            People

              hj324545 jing
              Alexey Kudinkin, sivabalan narayanan
