  1. Apache Hudi
  2. HUDI-8717 Reader feature standardization - Phase 0
  3. HUDI-8654

Support correct merging results with record positions in log blocks generated during pending compaction


Details

    • Sub-task
    • Status: Patch Available
    • Blocker
    • Resolution: Unresolved
    • None
    • 1.0.1
    • None

    Description

      When a compaction is pending, the new base files that the compaction will generate are not yet available during the current transaction. Because the log files written to a MOR table in this transaction can be attached to the base file generated by the compaction in the latest file slice, accurate record positions may not be derivable for them. However, log files written in later delta commits, after the compaction has completed, have accurate positions.

      Similarly, with NBCC (non-blocking concurrency control), a compaction can be scheduled during an inflight deltacommit. In this case, the log file generated by the inflight deltacommit is associated with the new base file from the compaction, whose record positions may differ because of deletes.

      We need to make sure that the file group reader with position-based merging generates correct results for such a mix of log blocks.
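
      The problem can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hudi's implementation and not necessarily what the patch does: records are plain (key, value) tuples, a position is a list index into the base file, and `LogBlock.base_instant` is a hypothetical field recording which base file the positions were computed against. The fallback to key-based merging for stale blocks is one possible way to get correct results.

```python
from dataclasses import dataclass


@dataclass
class LogBlock:
    # Hypothetical field: instant of the base file the positions were computed against.
    base_instant: str
    # Maps a record position in that base file to the updated (key, value) record.
    updates: dict


def compact(base, deleted_keys):
    """Rewrite the base file without deleted records; later positions shift down."""
    return [(key, value) for key, value in base if key not in deleted_keys]


def merge_by_position(base, block):
    """Position-based merge that trusts the positions unconditionally."""
    merged = list(base)
    for pos, record in block.updates.items():
        merged[pos] = record
    return merged


def merge_with_fallback(base, base_instant, block):
    """Use positions only when they refer to this base file; otherwise merge by key."""
    if block.base_instant == base_instant:
        return merge_by_position(base, block)
    merged = list(base)
    index = {key: i for i, (key, _) in enumerate(merged)}
    for key, value in block.updates.values():
        if key in index:
            merged[index[key]] = (key, value)
        else:
            index[key] = len(merged)
            merged.append((key, value))
    return merged


# Base file at instant "001"; compaction at "002" drops the deleted key "a".
old_base = [("a", 1), ("b", 2), ("c", 3)]
new_base = compact(old_base, deleted_keys={"a"})

# Written while compaction "002" was pending: position 1 refers to "b" in the old base.
stale_block = LogBlock(base_instant="001", updates={1: ("b", 20)})
# Written after compaction completed: position 1 refers to "c" in the new base.
fresh_block = LogBlock(base_instant="002", updates={1: ("c", 30)})

naive = merge_by_position(new_base, stale_block)
safe = merge_with_fallback(new_base, "002", stale_block)
later = merge_with_fallback(new_base, "002", fresh_block)
```

      In this toy run, `naive` overwrites the record for key `c` with the update meant for `b`, because the delete of `a` shifted every later position down by one. `safe` detects the instant mismatch and updates `b` by key instead, and `later` keeps using positions because they were computed against the new base file.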

      Attachments

        Activity


          People

            yihua Y Ethan Guo
            yihua Y Ethan Guo
            sivabalan narayanan

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 20h
                Remaining: 16h
                Logged: 4h

                Agile

                  Active Sprint:
                  Hudi 1.0.1 Sprint #2 (Jan) ends 14/Jan/25
                  Completed Sprint:
                  Hudi 1.0.1 Sprint #1 ended 07/Jan/25
