Apache Hudi / HUDI-3279

Metadata table stores incorrect file sizes after Restore

Details

    • Type: Task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • None
    • 0.11.0
    • None
    • None
    • 2

    Description

      While working on https://github.com/apache/hudi/pull/4556, I stumbled upon the LogBlock Scanner hitting EOF on log files in tests after a Restore operation.

      The root cause turned out to be the Metadata Table (MT) storing incorrect file sizes after Restore: the sizes in the MT are essentially 2x what is on the file system (see the attached screenshots).

       

      This seems to occur for the following reasons:

      1. The Metadata Table treats new records for the same file as "deltas", adding the new file size to the size it already has on record (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java#L227).
      2. Upon Restore (which is handled simply as a collection of Rollbacks), we pick the max of the file sizes before and after the operation, regardless of which state we are actually rolling back to (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java#L254).
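
      The interaction of the two points above can be illustrated with a small, self-contained sketch. This is a simplified model and not Hudi's actual code: the class RestoreSizeSketch, its methods, and the file name "log.1" are hypothetical, and it assumes the file size on storage is the same (100 bytes) before and after the rollback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in the ticket.
class RestoreSizeSketch {
    // Stand-in for the metadata table's file listing: file name -> recorded size.
    private final Map<String, Long> recordedSizes = new HashMap<>();

    // Point 1: a new record for a known file is treated as a delta and summed
    // with the size already on record.
    public void applyRecord(String file, long size) {
        recordedSizes.merge(file, size, Long::sum);
    }

    // Point 2: the rollback reports the max of the sizes before and after.
    public static long sizeReportedByRollback(long sizeBefore, long sizeAfter) {
        return Math.max(sizeBefore, sizeAfter);
    }

    public long recordedSize(String file) {
        return recordedSizes.getOrDefault(file, 0L);
    }

    public static void main(String[] args) {
        RestoreSizeSketch mt = new RestoreSizeSketch();
        long sizeOnFs = 100L;

        // The original write records the real size.
        mt.applyRecord("log.1", sizeOnFs);

        // Restore emits a rollback record carrying max(before, after) = 100,
        // which is then summed with the size already recorded.
        mt.applyRecord("log.1", sizeReportedByRollback(sizeOnFs, sizeOnFs));

        System.out.println(mt.recordedSize("log.1")); // prints 200
    }
}
```

      Under these assumptions the recorded size ends up at twice the size on storage, which matches the 2x discrepancy described above.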

       

      Proposal

      Instead of always picking the max size, we should pick the size of the file as it was right before the operation being rolled back.
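
      A minimal sketch of the difference between the two choices follows. The ticket does not say how the earlier size would be obtained, so the per-file size history here is purely an assumption; SizeBeforeSketch and its methods are hypothetical names, and instant times are modelled as strings that sort in commit order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: choosing a file size on rollback.
class SizeBeforeSketch {
    // Assumed per-file history: instant time -> file size after that instant.
    private final TreeMap<String, Long> sizeByInstant = new TreeMap<>();

    public void recordSize(String instant, long size) {
        sizeByInstant.put(instant, size);
    }

    // Behaviour described in the ticket: always take the largest size seen.
    public long maxSize() {
        return sizeByInstant.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
    }

    // Proposed behaviour: take the size as of the latest instant strictly
    // before the instant being rolled back (0 if there is none).
    public long sizeRightBefore(String rolledBackInstant) {
        Map.Entry<String, Long> entry = sizeByInstant.lowerEntry(rolledBackInstant);
        return entry == null ? 0L : entry.getValue();
    }

    public static void main(String[] args) {
        SizeBeforeSketch history = new SizeBeforeSketch();
        history.recordSize("001", 100L); // size after the first write
        history.recordSize("002", 150L); // size after an append that is later rolled back

        // Rolling back instant "002" should leave the file at 100, not 150.
        System.out.println("max=" + history.maxSize()
                + " before=" + history.sizeRightBefore("002")); // prints max=150 before=100
    }
}
```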

       

      Attachments

        1. Screen Shot 2022-01-19 at 7.56.37 PM.png
          139 kB
          Alexey Kudinkin
        2. Screen Shot 2022-01-19 at 12.18.27 PM.png
          199 kB
          Alexey Kudinkin
        3. Screen Shot 2022-01-19 at 12.17.21 PM.png
          316 kB
          Alexey Kudinkin

            People

              Alexey Kudinkin (alexey.kudinkin)
              Votes: 0
              Watchers: 2